Examworthy · examworthy.com

NVIDIA-Certified Associate: Generative AI LLMs cheat sheet

NVIDIA

Exam version 2026 · Reviewed 2026-05-30

Free to share. Examworthy is not affiliated with or endorsed by NVIDIA; NCA-GENL and related marks belong to their respective owners.

At a glance

Questions: 50 to 60
Time allowed: 60 min
Cost (USD): $125

Format: Multiple choice, online proctored

Domain weight map

Heaviest first - spend your time here
Core Machine Learning and AI Knowledge: 30% · 93 Q
Software Development: 24% · 83 Q
Experimentation: 22% · 77 Q
Data Analysis and Visualization: 14% · 70 Q
Trustworthy AI: 10% · 33 Q

How this exam thinks

NCA-GENL is broad and conceptual: expect to reason about LLM techniques, from the transformer architecture through to serving a model responsibly, rather than to work through deep maths.

Spot the trap

Tempting wrong answers, and why they fail

Tempting but wrong

Multi-head attention reduces the model's parameter count by splitting the attention matrix into independent segments.

Why it fails

Splitting sounds like compression, but multi-head attention does not reduce parameters. Each head has its own learned query, key and value projections. In the standard design each head projects to d_model / h dimensions, so the total matches a single full-width attention layer; giving each head a wider projection increases it.

Core Machine Learning and AI Knowledge
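
The parameter-count claim in the card above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the standard design in which each head projects to d_model / num_heads dimensions and biases are ignored; the function name and the sizes 512 and 8 are illustrative, not taken from the exam.

```python
def attention_params(d_model: int, num_heads: int) -> int:
    """Count the weights in one attention layer (biases ignored)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads  # assumption: standard per-head width
    # Each head has its own query, key and value projection: d_model -> d_head.
    per_head = 3 * d_model * d_head
    # One output projection maps the concatenated heads back to d_model.
    output_proj = (num_heads * d_head) * d_model
    return num_heads * per_head + output_proj


single_head = attention_params(512, 1)
eight_heads = attention_params(512, 8)
print(single_head, eight_heads)
```

Under these assumptions both calls give the same count, so adding heads re-partitions the projections rather than compressing them.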

Tempting but wrong

Replacing feed-forward layers with mixture-of-experts (MoE) blocks fixes the quadratic self-attention bottleneck.

Why it fails

MoE cuts feed-forward cost by routing tokens to specialist sub-networks, but the self-attention bottleneck is a separate mechanism. MoE does not change how attention scores are computed across the sequence length, so it leaves the O(n^2) scaling intact.

Software Development
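
The scaling argument in the card above can be illustrated with a toy cost model. This is a sketch only: it counts attention score entries and expert calls rather than real FLOPs, and `experts_per_token` is a hypothetical routing parameter introduced for illustration.

```python
def attention_score_count(seq_len: int) -> int:
    # Every token is scored against every token, so the count is n * n.
    return seq_len * seq_len


def moe_expert_calls(seq_len: int, experts_per_token: int) -> int:
    # Each token is routed to a fixed number of experts, so the count is linear in n.
    return seq_len * experts_per_token


for n in (1024, 2048):
    print(n, attention_score_count(n), moe_expert_calls(n, experts_per_token=2))
```

Doubling the sequence length doubles the expert calls but quadruples the attention scores, whichever routing setting is chosen.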

Tempting but wrong

Increasing batch size substantially fixes GPU idle time between batches by amortising transfer overhead.

Why it fails

A larger batch size reduces the number of host-to-device transfers per epoch but introduces no prefetching. The root cause is that data loading is sequential and blocking, which batch size does not change. It may also cause out-of-memory errors or destabilise training.

Experimentation
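
The root cause named in the card above, sequential and blocking data loading, is usually addressed by loading ahead in the background. The following is a minimal standard-library sketch of that idea, not a production loader; `prefetch`, `load_batches` and `buffer_size` are hypothetical names, and the sketch does not surface loader errors to the consumer.

```python
import queue
import threading


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always signal the end, even after an error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch


def load_batches(count):
    # Hypothetical stand-in for slow disk reads and preprocessing.
    for i in range(count):
        yield [i, i + 1]


loaded = list(prefetch(load_batches(3)))
print(loaded)
```

In practice a framework loader would normally fill this role; PyTorch's `DataLoader`, for example, exposes `num_workers` and `prefetch_factor` for the same purpose.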

Tempting but wrong

Randomly shuffling sentence order within each document is a safe, label-preserving augmentation for text classification.

Why it fails

Shuffling sentences creates new orderings but disrupts coherence and often changes meaning enough to corrupt the original label, making it unreliable for classification. A meaning-preserving method like synonym replacement keeps the label intact instead.

Data Analysis and Visualization
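
Synonym replacement, the meaning-preserving method named in the card above, can be sketched in a few lines. The synonym table here is a tiny hypothetical example; a real pipeline would draw on a proper lexical resource and check that the label still holds.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"great": ["excellent", "superb"], "slow": ["sluggish"]}


def synonym_replace(text, rng):
    """Swap known words for a synonym, keeping word order and punctuation."""
    out = []
    for word in text.split():
        core = word.rstrip(".,!?")
        tail = word[len(core):]  # trailing punctuation, if any
        options = SYNONYMS.get(core.lower())
        out.append((rng.choice(options) if options else core) + tail)
    return " ".join(out)


rng = random.Random(0)  # seeded so runs are repeatable
review = "The screen is great. The delivery was slow."
augmented = synonym_replace(review, rng)
print(augmented)
```

Word order and sentence boundaries are untouched, which is why the original label is far more likely to survive than under sentence shuffling.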

Tempting but wrong

Using retrieval-augmented generation so the model draws only from pre-approved documents removes any need for output scanning of sensitive content.

Why it fails

RAG limits the knowledge base used during generation and reduces hallucination, but it does not guarantee the model will never produce sensitive content or regulatory advice. A model can still compose harmful output from retrieved context, so RAG alone is insufficient for output-stage enforcement.

Trustworthy AI
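
Output-stage enforcement, the step the card above says RAG cannot replace, can be pictured as a scan that runs on every generated reply before it is returned. This is a deliberately simple sketch: the two regular expressions and the names `scan_output` and `guarded_reply` are hypothetical, and real deployments typically use dedicated guardrail tooling rather than a pair of patterns.

```python
import re

# Hypothetical rules, for illustration only.
BLOCKED_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_output(text):
    """Return the names of every rule the generated text triggers."""
    return [name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(text)]


def guarded_reply(generated):
    """Withhold a reply that trips any rule; otherwise pass it through."""
    findings = scan_output(generated)
    if findings:
        return "[response withheld: " + ", ".join(findings) + "]"
    return generated


print(guarded_reply("Refunds take five days."))
print(guarded_reply("The SSN on file is 123-45-6789."))
```

The scan sees only the final text, so it applies equally whether the sensitive string came from the retrieved documents or from the model itself.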

Tempting but wrong

Transformers process sequences left-to-right one token at a time, giving better gradient flow than RNNs.

Why it fails

Transformers process the entire sequence in parallel during training, not sequentially. Claiming they are sequential conflates the architecture with autoregressive generation, which does emit one token at a time at inference. Their gradient-flow advantage comes from self-attention linking any two positions directly, not from step-by-step processing.

Core Machine Learning and AI Knowledge
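
The difference between the two architectures in the card above comes down to data dependencies. The toy sketch below, using scalar values and a single head, assumes a causal mask and made-up scores: each RNN state needs the previous state, while each attention row depends only on the inputs, so the rows can be computed in any order or all at once.

```python
import math


def rnn_states(inputs, weight=0.5):
    # Each hidden state needs the previous one, so the steps form a chain.
    hidden, states = 0.0, []
    for x in inputs:
        hidden = math.tanh(weight * hidden + x)
        states.append(hidden)
    return states


def attention_row(score_row, values, position):
    # Row `position` depends only on the inputs, never on another row's output.
    visible = range(position + 1)  # causal mask: positions 0..position
    exps = [math.exp(score_row[j]) for j in visible]
    total = sum(exps)
    return sum(e / total * values[j] for e, j in zip(exps, visible))


# Hypothetical scores and values for a three-token sequence.
scores = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
values = [1.0, 2.0, 3.0]

in_order = [attention_row(scores[i], values, i) for i in range(3)]
reordered = {i: attention_row(scores[i], values, i) for i in (2, 0, 1)}
print(in_order == [reordered[i] for i in range(3)])
```

The Python loop here still runs one row at a time; the point is only that nothing forces that order, which is what lets hardware evaluate all rows together.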

Tempting but wrong

All model inputs must be cast to float32 at ingestion because networks require one unified numeric precision throughout.

Why it fails

Mixed precision is standard. Token IDs stay as integer tensors until an embedding lookup converts them to floating-point vectors. Casting the IDs themselves to float32 treats arbitrary vocabulary indices as magnitudes and skips the embedding table entirely.

Software Development
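
The embedding lookup described in the card above is an indexing operation, which is why the IDs have to stay integers. A minimal sketch, assuming a hypothetical four-token vocabulary with three-dimensional vectors:

```python
# Hypothetical embedding table: 4 tokens, 3 dimensions each.
EMBEDDING_TABLE = [
    [0.1, -0.2, 0.3],
    [0.0, 0.5, -0.5],
    [0.9, 0.1, 0.4],
    [-0.3, 0.2, 0.7],
]


def embed(token_ids):
    """Map integer token IDs to their float vectors by table lookup."""
    if not all(isinstance(t, int) for t in token_ids):
        raise TypeError("token IDs are table indices and must be integers")
    return [EMBEDDING_TABLE[t] for t in token_ids]


vectors = embed([2, 0, 3])
print(vectors)
```

The floats the network works with come out of the table; the ID only selects a row, so ID 3 is not "larger" than ID 2 in any meaningful sense.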

Tempting but wrong

BLEU-4 against the source document is the right primary metric for abstractive summarisation.

Why it fails

BLEU was designed for machine translation and emphasises precision of short n-grams against a reference translation, not a lengthy source document. Scoring a summary against the source rather than a reference summary rewards copying from the source, not fidelity to human-preferred content.

Experimentation
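
The problem in the card above shows up with clipped n-gram precision, the building block that BLEU combines across n-gram sizes. This sketch uses bigrams only and omits the brevity penalty, so it is not a full BLEU implementation; the texts are invented for illustration.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of `candidate` against `reference`."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate count by how often the n-gram occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


source = ("the council met on monday and voted to approve "
          "the new budget after a long debate")
copied = "the council met on monday and voted"
abstractive = "councillors approved the budget following lengthy discussion"
reference = "councillors approved the new budget after lengthy debate"

print(ngram_precision(copied, source))          # verbatim copy of the source
print(ngram_precision(abstractive, source))     # genuine rewording
print(ngram_precision(abstractive, reference))  # scored against a reference summary
```

Against the source, the verbatim extract scores perfectly and the reworded summary scores nothing; only the comparison with a reference summary gives the abstractive output any credit.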

Key terms

Deep learning, Training, Transformers, LLMs, NLP, BERT, Megatron, Self-supervision, XGBoost, Graph algorithms, Transfer learning, Triton Inference Server, LangChain, Data augmentation, cuDF, Dask

Exam-day rules

  • Read the last line of the question first. It tells you what is actually being asked, so you can read the scenario looking for the answer rather than memorising detail.
  • Choose the most appropriate option, not merely a correct one. Several options are often true; the exam wants the best fit for the stated requirement.
  • When a scenario names a tool problem, match it to the NVIDIA tool: serving and batching point to Triton, large GPU dataframes point to cuDF, scaling work points to Dask.
  • Watch for absolutes such as always, never, and guarantees. In generative AI scenarios they are usually the wrong answer because models are probabilistic.
  • Flag and move on. Do not lose time on one hard item when easier marks are waiting; with 60 minutes for the paper, covering every question first matters.
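
The time pressure behind the last rule is simple arithmetic on the figures given at the top of this sheet (60 minutes, 50 to 60 questions). A quick sketch, with a hypothetical helper name:

```python
def seconds_per_question(total_minutes, questions):
    """Average time available per question, in seconds."""
    return total_minutes * 60 / questions


tightest = seconds_per_question(60, 60)  # paper with 60 questions
loosest = seconds_per_question(60, 50)   # paper with 50 questions
print(tightest, loosest)
```

That leaves roughly 60 to 72 seconds per question on average, before any time is set aside for revisiting flagged items.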

Revision schedule

  1. Day 1
    Map the blueprint and set a date
  2. Week 1
    Lock the core ML and AI knowledge
  3. Weeks 1-2
    Learn the build and serve path
  4. Weeks 2-3
    Practise experimentation and data preparation
  5. Week 4
    Cover trustworthy AI

Practise NCA-GENL free

Every question has a worked explanation and a per-distractor rationale. No sign-up.

638 audited flashcards in this deck.
