NVIDIA-Certified Associate: Generative AI Multimodal cheat sheet
NVIDIA
Free to share. Examworthy is not affiliated with or endorsed by NVIDIA; NCA-GENM and related marks belong to their respective owners.
At a glance
Format: Multiple choice, online proctored
Domain weight map
Heaviest first - spend your time here
How this exam thinks
NCA-GENM tests reasoning about generative AI across text, image, and audio together, and the tooling that builds and serves it.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
Self-attention inside the U-Net residual blocks injects the CLIP text embedding into the spatial features.
Why it fails
Self-attention relates spatial positions to each other within the feature map, not to an external conditioning signal such as a text embedding, so it cannot directly inject the CLIP representation. Cross-attention is the mechanism that brings token-level text into the spatial features.
Experimentation
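The difference between the two attention types comes down to where the keys and values come from. The sketch below is a toy NumPy illustration, not the Stable Diffusion implementation: the learned query, key, and value projections are left out, and the shapes and random arrays (`spatial`, `text`) are assumptions standing in for U-Net features and CLIP token embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention: each query row becomes a weighted mix of value rows."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
d = 8
spatial = rng.normal(size=(16, d))  # toy 4x4 feature map flattened to 16 positions
text = rng.normal(size=(5, d))      # toy stand-in for 5 text-token embeddings

# Self-attention: queries, keys, and values all come from the spatial features,
# so the text never enters the computation.
self_out, self_w = attention(spatial, spatial, spatial)

# Cross-attention: queries come from the image, keys and values from the text,
# so every spatial position is rewritten as a mix of text-token vectors.
cross_out, cross_w = attention(spatial, text, text)

print(self_w.shape)   # one weight per pair of spatial positions
print(cross_w.shape)  # one weight per (spatial position, text token) pair
```

The weight matrices make the point: self-attention produces a position-by-position map, while cross-attention produces a position-by-token map, which is the only place the text can influence the features.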
Tempting but wrong
Dividing pixel values by the maximum gives zero-mean, unit-variance standardised inputs for each channel.
Why it fails
Dividing by the maximum value scales to the unit interval but does not produce zero-mean output. Standardisation requires subtracting the dataset mean and dividing by its standard deviation, which is a different operation from scaling to 0 to 1.
Core Machine Learning and AI Knowledge
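A quick numerical check of the two operations, as a minimal sketch: the random batch, its shape, and the use of 255 as the maximum 8-bit value are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy batch of 8-bit RGB images with layout (N, H, W, C).
images = rng.integers(0, 256, size=(32, 8, 8, 3)).astype(np.float64)

# Scaling: divide by the maximum value. The range becomes [0, 1],
# but the mean stays wherever the data put it.
scaled = images / 255.0

# Standardisation: subtract the per-channel mean, divide by the per-channel std.
mean = images.mean(axis=(0, 1, 2))
std = images.std(axis=(0, 1, 2))
standardised = (images - mean) / std

print("scaled mean per channel:      ", scaled.mean(axis=(0, 1, 2)))
print("standardised mean per channel:", standardised.mean(axis=(0, 1, 2)))
print("standardised std per channel: ", standardised.std(axis=(0, 1, 2)))
```

Only the standardised array has a per-channel mean of about zero and a standard deviation of about one; the scaled array is merely bounded.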
Tempting but wrong
The vocoder controls pronunciation fidelity of technical terms in ASR, so target it to fix recognition errors.
Why it fails
Vocoders are a TTS component that converts mel spectrograms into audio waveforms; they have no role in ASR decoding or vocabulary coverage. The language model is what ranks word-sequence hypotheses during recognition.
Multimodal Data
Tempting but wrong
An AI Blueprint packages all AI logic as a single compiled binary, so you swap the embedding model by recompiling with a different library flag.
Why it fails
Blueprints are not compiled monoliths; that misrepresents the composable microservice architecture and confuses it with traditional software packaging. They wire together discrete NIM microservices, so the embedding NIM is replaced as an independent unit, not via recompilation.
Software Development
Tempting but wrong
Few-shot prompting with plain-text examples separated by delimiters is enough to guarantee a consistent machine-parseable output format.
Why it fails
Few-shot examples guide style but do not enforce a schema. Without an explicit format constraint the model may vary its output structure across samples, breaking automated parsing. Use structured-output prompting with a declared schema instead.
Data Analysis and Visualization
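One way to picture a declared schema, sketched with the standard library only: the schema is stated in the prompt and every reply is checked against it before use. The field names, `SCHEMA`, `build_prompt`, and `parse_reply` are hypothetical, and the validation step is an illustrative addition; it detects format drift rather than preventing it.

```python
import json

# Hypothetical schema: field name -> required Python type after JSON decoding.
SCHEMA = {"caption": str, "objects": list, "confidence": float}

def build_prompt(task: str) -> str:
    """State the output contract explicitly instead of relying on examples alone."""
    fields = ", ".join(f'"{name}" ({typ.__name__})' for name, typ in SCHEMA.items())
    return (
        f"{task}\n"
        "Reply with a single JSON object and nothing else. "
        f"Required fields: {fields}. Do not add other fields."
    )

def parse_reply(reply: str) -> dict:
    """Parse a model reply and raise ValueError if it does not match SCHEMA."""
    data = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = SCHEMA.keys() - data.keys()
    extra = data.keys() - SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, typ in SCHEMA.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}")
    return data

good = '{"caption": "a cat on a sofa", "objects": ["cat", "sofa"], "confidence": 0.9}'
print(parse_reply(good)["objects"])
```

A reply that drifts, such as free text or a renamed field, raises `ValueError` at the boundary instead of breaking a downstream parser.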
Tempting but wrong
Setting only resources.requests to nvidia.com/gpu: 1 and leaving limits unset lets the pod burst beyond one GPU.
Why it fails
Extended resources such as nvidia.com/gpu cannot be overcommitted, so there is no burst behaviour. Kubernetes accepts a limit on its own (the request then defaults to it) and requires the two to be equal when both are set; a request without a matching limit fails API validation and the pod is rejected.
Performance Optimization
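The request and limit rule for extended resources can be restated as a small check. This is a toy sketch of the documented rule, not Kubernetes code: `check_extended_resource` and its return strings are hypothetical, and the dictionaries mirror the `resources` block of a container spec.

```python
def check_extended_resource(resources: dict, name: str = "nvidia.com/gpu") -> str:
    """Apply the extended-resource rule to one container's resources block."""
    request = resources.get("requests", {}).get(name)
    limit = resources.get("limits", {}).get(name)

    if request is None and limit is None:
        return "not requested"
    if limit is None:
        # A request on its own is not allowed for extended resources.
        return "rejected: request without limit"
    if request is not None and request != limit:
        # When both are set they must match; there is no burst headroom.
        return "rejected: request must equal limit"
    # Limit only (request defaults to it) or request equal to limit.
    return "accepted"

print(check_extended_resource({"requests": {"nvidia.com/gpu": 1}}))
print(check_extended_resource({"limits": {"nvidia.com/gpu": 1}}))
print(check_extended_resource({"requests": {"nvidia.com/gpu": 1},
                               "limits": {"nvidia.com/gpu": 2}}))
```

The first and third specs are rejected and the second is accepted, which is why the safe pattern is to set the GPU count under limits.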
Tempting but wrong
EXIF metadata embedded in an image header is a provenance standard that can prove a file's origin and edit history.
Why it fails
EXIF stores camera and device data but carries no cryptographic signatures, so it can be stripped or forged without detection. It does not constitute a provenance standard and cannot provide tamper-evident origin or edit history the way C2PA Content Credentials do.
Trustworthy AI
Tempting but wrong
Increasing the classifier-free guidance scale targets and removes specific undesired features.
Why it fails
Raising the guidance scale strengthens adherence to the positive prompt but does not encode or target specific undesired features. At very high values it can actually increase oversaturation and other artefacts. A negative prompt is what targets specific attributes.
Experimentation
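The arithmetic behind this can be sketched with the commonly used classifier-free guidance formula, in which the final noise prediction extrapolates from a baseline prediction towards the prompt-conditioned one. The two-element vectors are toy assumptions, `guided_noise` is a hypothetical helper, and real pipelines may differ in detail; many implement a negative prompt by using its prediction as the baseline in place of the unconditional one.

```python
import numpy as np

def guided_noise(eps_cond, eps_base, scale):
    """Classifier-free guidance: start at the baseline, move `scale` steps towards the prompt."""
    return eps_base + scale * (eps_cond - eps_base)

# Toy noise predictions: axis 0 is "what the prompt wants",
# axis 1 is "an undesired attribute".
eps_cond = np.array([1.0, 0.0])    # conditioned on the positive prompt
eps_uncond = np.array([0.0, 0.0])  # conditioned on the empty prompt
eps_neg = np.array([0.0, 1.0])     # conditioned on a negative prompt

# Raising the scale only amplifies the prompt direction.
high_scale = guided_noise(eps_cond, eps_uncond, 7.5)

# Using the negative prompt as the baseline also pushes away from the undesired axis.
with_negative = guided_noise(eps_cond, eps_neg, 7.5)

print(high_scale)
print(with_negative)
```

With the unconditional baseline, the undesired axis is untouched no matter how large the scale gets; only the negative-prompt baseline gives the update a component that points away from it.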
Key terms
Exam-day rules
- Read the last line of the question first. It tells you what is actually being asked, so you can read the scenario looking for the answer rather than memorising detail.
- Choose the most appropriate option, not merely a correct one. Several options are often true; the exam wants the best fit for the stated requirement.
- Watch for absolutes such as always, never, and guarantees. In generative AI scenarios they are usually the wrong answer because the models are probabilistic.
- Flag and move on. With 50 to 60 questions in 60 minutes, roughly a minute each, do not lose time on one hard item when easier marks are waiting.
- Keep the fusion types and the orchestration distinction straight. Early, late, and intermediate fusion, and modality versus agent orchestration, are exactly the pairs distractors blur.
Revision schedule
- Day 1: Map the blueprint and set a date
- Week 1: Build the conceptual core (Experimentation and Core ML)
- Weeks 1-2: Layer on multimodal handling
- Weeks 2-3: Cover the build and deploy side
- Week 4: Sweep the lower-weight domains