NVIDIA-Certified Associate: Generative AI LLMs cheat sheet
NVIDIA
Free to share. Examworthy is not affiliated with or endorsed by NVIDIA; NCA-GENL and related marks belong to their respective owners.
At a glance
Format: Multiple choice, online proctored
Domain weight map
Heaviest first - spend your time here
How this exam thinks
NCA-GENL is broad and conceptual: expect to reason about LLM techniques, from the transformer to serving a model responsibly, rather than work through deep maths.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
Multi-head attention reduces the model's parameter count by splitting the attention matrix into independent segments.
Why it fails
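Splitting sounds like compression, but multi-head attention does not reduce parameters. Each head has its own learned query, key, and value projections. In the standard design, where each head's width is the model dimension divided by the number of heads, the total matches a single full-width attention layer; if head width is kept fixed instead, the count grows with the number of heads.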
Core Machine Learning and AI Knowledge
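The parameter claim in the multi-head attention trap above can be checked with a quick count. This is a minimal sketch, assuming the common convention where each head has width d_model / num_heads and biases are ignored; the function name and the sizes 512 and 8 are illustrative, not taken from the exam material.

```python
def attention_projection_params(d_model: int, num_heads: int) -> int:
    """Count the weights (biases ignored) in one attention layer's projections."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads  # assumption: head width shrinks as heads are added
    # Each head owns a query, a key, and a value projection of shape (d_model, d_head).
    per_head_qkv = 3 * d_model * d_head
    # The concatenated heads (num_heads * d_head == d_model) pass through one output projection.
    output_projection = d_model * d_model
    return num_heads * per_head_qkv + output_projection


single_head = attention_projection_params(d_model=512, num_heads=1)
eight_heads = attention_projection_params(d_model=512, num_heads=8)
print(single_head, eight_heads)  # both counts are 4 * 512 * 512
```

Under this convention, splitting into heads rearranges the same weights rather than removing any.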
Tempting but wrong
Replacing feed-forward layers with mixture-of-experts (MoE) blocks fixes the quadratic self-attention bottleneck.
Why it fails
MoE cuts feed-forward cost by routing tokens to specialist sub-networks, but the self-attention bottleneck is a separate mechanism. MoE does not change how attention scores are computed across the sequence length, so it leaves the O(n^2) scaling intact.
Software Development
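To see why the MoE trap above fails, it can help to count the two costs separately. The sketch below is a rough cost model under stated assumptions: a two-layer feed-forward block, a router that activates a fixed number of experts per token, and illustrative sizes (512, 2048, 2 experts used). The function names are hypothetical.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Entries in the attention score matrices: one per (query, key) pair per head."""
    return num_heads * seq_len * seq_len


def ffn_multiply_adds_per_token(d_model: int, d_ff: int, experts_used: int = 1) -> int:
    """Multiply-adds for the feed-forward step of a single token.

    A dense block runs one two-layer MLP (experts_used=1). A MoE block runs only
    the experts the router selects, so the cost depends on experts_used, not on
    the total number of experts in the layer.
    """
    return experts_used * 2 * d_model * d_ff


for seq_len in (1024, 2048, 4096):
    attention_cost = attention_score_entries(seq_len)
    moe_cost = seq_len * ffn_multiply_adds_per_token(512, 2048, experts_used=2)
    print(seq_len, attention_cost, moe_cost)
```

Doubling the sequence length doubles the feed-forward total but quadruples the attention score count, whichever feed-forward design is used.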
Tempting but wrong
Increasing batch size substantially fixes GPU idle time between batches by amortising transfer overhead.
Why it fails
A larger batch size reduces the number of host-to-device transfers per epoch but introduces no prefetching. The root cause is that data loading is sequential and blocking, which batch size does not change. It may also cause out-of-memory errors or destabilise training.
Experimentation
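The fix the batch-size trap above points to is prefetching: loading the next batch while the current one is being processed. Below is a minimal sketch using only the Python standard library; `prefetch` and `slow_loader` are hypothetical names, and the sleep stands in for disk or network latency. Frameworks usually provide this for you (for example, PyTorch's `DataLoader` exposes `num_workers` and `prefetch_factor`), so treat this as an illustration of the idea rather than production code.

```python
import queue
import threading
import time


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks once buffer_size batches are waiting
        finally:
            buffer.put(done)  # always signal completion, even if loading fails

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


def slow_loader(num_batches, delay=0.01):
    """Hypothetical loader: each batch takes `delay` seconds to read."""
    for index in range(num_batches):
        time.sleep(delay)
        yield [index] * 4


for batch in prefetch(slow_loader(3)):
    # The consumer works on this batch while the worker is already loading the next.
    print(batch)
```

Batch order is preserved because a single worker feeds a first-in, first-out queue.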
Tempting but wrong
Randomly shuffling sentence order within each document is a safe, label-preserving augmentation for text classification.
Why it fails
Shuffling sentences creates new orderings but disrupts coherence and often changes meaning enough to corrupt the original label, making it unreliable for classification. A meaning-preserving method like synonym replacement keeps the label intact instead.
Data Analysis and Visualization
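Synonym replacement, the meaning-preserving alternative named in the augmentation trap above, can be sketched in a few lines. The synonym table here is a tiny hypothetical example; a real pipeline would draw on a proper lexical resource and would still need spot checks that labels survive.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}


def synonym_replace(text, synonyms, rng, probability=0.5):
    """Swap some words for listed synonyms, keeping word order unchanged."""
    augmented = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < probability:
            augmented.append(rng.choice(options))
        else:
            augmented.append(word)
    return " ".join(augmented)


rng = random.Random(0)  # seeded so runs are repeatable
print(synonym_replace("a good movie with a boring ending", SYNONYMS, rng))
```

Because word order and sentence order are untouched, a sentiment label such as "mixed" or "negative" is far more likely to stay valid than under sentence shuffling.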
Tempting but wrong
Using retrieval-augmented generation so the model draws only from pre-approved documents removes any need for output scanning of sensitive content.
Why it fails
RAG limits the knowledge base used during generation and reduces hallucination, but it does not guarantee the model will never produce sensitive content or regulatory advice. A model can still compose harmful output from retrieved context, so RAG alone is insufficient for output-stage enforcement.
Trustworthy AI
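The RAG trap above turns on the difference between restricting inputs and checking outputs. The sketch below shows an output-stage check in its simplest form, assuming two illustrative regular-expression patterns and a hypothetical `generate` callable standing in for the model. Real deployments typically use dedicated guardrail or classification tooling rather than a pair of regexes.

```python
import re

# Illustrative patterns only; a real policy would be broader and tested.
BLOCK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "id_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_output(text):
    """Return the names of all patterns found in the generated text."""
    return [name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(text)]


def guarded_answer(generate, prompt):
    """Run the model, then withhold the draft if the scan flags anything."""
    draft = generate(prompt)
    hits = scan_output(draft)
    if hits:
        return "[response withheld: " + ", ".join(hits) + "]"
    return draft


def fake_generate(prompt):
    # Hypothetical stand-in for a RAG pipeline that echoes retrieved context.
    return "You can reach the account owner at alice@example.com."


print(guarded_answer(fake_generate, "Who owns this account?"))
```

The scan runs on what the model actually produced, so it still applies when the sensitive text came from an approved document.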
Tempting but wrong
Transformers process sequences left-to-right one token at a time, giving better gradient flow than RNNs.
Why it fails
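Transformers process the entire sequence in parallel, not sequentially. Claiming they are sequential conflates the architecture with autoregressive generation, which emits one token at a time at inference. Their gradient-flow advantage over RNNs comes from self-attention linking any two positions directly, so the claimed cause is wrong even where the conclusion sounds right.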
Core Machine Learning and AI Knowledge
Tempting but wrong
All model inputs must be cast to float32 at ingestion because networks require one unified numeric precision throughout.
Why it fails
Mixed precision is standard. Token IDs stay as integer tensors until an embedding lookup converts them to float. Casting token IDs directly to float32 before embedding is semantically incorrect and bypasses the embedding table entirely.
Software Development
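The float32 trap above is easier to spot once the embedding lookup is written out: token IDs are indices into a table, so they have to remain integers. This is a minimal pure-Python sketch with a hypothetical five-token vocabulary and random weights; in a framework the table would be a learned embedding layer.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a table with one float vector per token ID (random, for illustration)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, table):
    """Look up one row per token ID. IDs are indices, so they must be integers."""
    for token_id in token_ids:
        if not isinstance(token_id, int):
            raise TypeError(f"token ID {token_id!r} is not an integer index")
    return [table[token_id] for token_id in token_ids]


table = make_embedding_table(vocab_size=5, dim=3)
vectors = embed([2, 0, 4], table)  # integer IDs in, float vectors out
print(len(vectors), len(vectors[0]))

try:
    embed([2.0, 0.0], table)  # IDs cast to float can no longer index the table
except TypeError as error:
    print("rejected:", error)
```

The floats the network computes with are the looked-up rows, not the IDs themselves.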
Tempting but wrong
BLEU-4 against the source document is the right primary metric for abstractive summarisation.
Why it fails
BLEU was designed for machine translation and scores the precision of short n-grams against a reference translation, not a lengthy source document. Scoring against the source rather than a reference summary rewards wording copied from the source, not fidelity to human-preferred content.
Experimentation
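The BLEU trap above can be made concrete with the n-gram precision at the heart of the metric. The sketch below is a simplification: it computes clipped precision for a single n-gram order and omits BLEU's brevity penalty and averaging across orders. The three sentences are made-up examples.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, comparison, n):
    """Share of candidate n-grams that also appear in the comparison text."""
    candidate_counts = ngram_counts(candidate.split(), n)
    comparison_counts = ngram_counts(comparison.split(), n)
    total = sum(candidate_counts.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the comparison text.
    overlap = sum(min(count, comparison_counts[gram]) for gram, count in candidate_counts.items())
    return overlap / total


source = "the cat sat on the mat and then the cat slept"
reference_summary = "a cat napped on a mat"
candidate = "the cat sat on the mat"  # copied straight from the source

print(clipped_precision(candidate, source, 2))             # 1.0
print(clipped_precision(candidate, reference_summary, 2))  # 0.0
```

A summary that simply copies the opening of the source scores perfectly against the source, which is why that comparison says little about summary quality.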
Key terms
Exam-day rules
- Read the last line of the question first. It tells you what is actually being asked, so you can read the scenario looking for the answer rather than memorising detail.
- Choose the most appropriate option, not merely a correct one. Several options are often true; the exam wants the best fit for the stated requirement.
- When a scenario names a tool problem, match it to the NVIDIA tool: serving and batching point to Triton, large GPU dataframes point to cuDF, scaling work points to Dask.
- Watch for absolutes such as always, never, and guarantees. In generative AI scenarios they are usually the wrong answer because models are probabilistic.
- Flag and move on. Do not lose time on one hard item when easier marks are waiting; with 60 minutes for the paper, covering every question first matters.
Revision schedule
- Day 1: Map the blueprint and set a date
- Week 1: Lock the core ML and AI knowledge
- Weeks 1-2: Learn the build and serve path
- Weeks 2-3: Practise experimentation and data preparation
- Week 4: Cover trustworthy AI