NVIDIA-Certified Associate: AI Infrastructure and Operations cheat sheet
NVIDIA
Free to share. Examworthy is not affiliated with or endorsed by NVIDIA; NCA-AIIO and related marks belong to their respective owners.
At a glance
Format: Multiple choice, online proctored
Domain weight map
Heaviest first - spend your time here
How this exam thinks
NCA-AIIO checks whether you can reason about the hardware, software, networking, and operations under AI workloads, not write models.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
If a model exceeds GPU memory, you can replace GPUs with DPUs to offload memory management to a dedicated data-processing unit.
Why it fails
DPUs offload networking, storage, and security tasks from the CPU. They provide no memory that can stand in for GPU VRAM during training, so they do not expand the GPU memory pool the training process needs.
AI Infrastructure
Tempting but wrong
Wider adoption of NVLink interconnects inside GPU servers is what lets teams with no infrastructure of their own access large-model training.
Why it fails
NVLink is a high-bandwidth GPU-to-GPU interconnect that improves multi-GPU throughput within a node. It is a hardware feature inside a server, not something that reduces entry barriers for teams with no on-premises GPUs at all.
Essential AI Knowledge
Tempting but wrong
Fair-share scheduling, which divides GPU resources proportionally among users, will prevent a multi-node job from failing due to partial node allocation.
Why it fails
Fair-share scheduling governs resource equity across users over time, but it does not guarantee that all nodes for a single job are reserved simultaneously, so a partial-allocation race can still occur. Gang scheduling is the mechanism that co-allocates all nodes at once.
AI Operations
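The all-or-nothing behaviour of gang scheduling described in the card above can be sketched as a toy model. This is an illustration only, not how Slurm or any real scheduler is implemented; the function names `greedy_allocate` and `gang_allocate` and the node names are hypothetical.

```python
def greedy_allocate(free_nodes, needed):
    """Grant whatever is free right now, even if it is fewer than needed."""
    granted = free_nodes[:needed]
    remaining = free_nodes[len(granted):]
    return granted, remaining


def gang_allocate(free_nodes, needed):
    """All-or-nothing: grant nodes only if the whole job fits at once."""
    if len(free_nodes) < needed:
        return [], free_nodes  # job waits; no nodes are held partially
    return free_nodes[:needed], free_nodes[needed:]


# Hypothetical cluster state: three free nodes, one job that needs four.
free = ["node1", "node2", "node3"]
partial, _ = greedy_allocate(free, 4)  # partial grant: job holds nodes it cannot use
gang, left = gang_allocate(free, 4)    # nothing granted, nothing held
```

In the greedy case the job holds three nodes while waiting for a fourth, which is the partial-allocation failure mode; in the gang case the nodes stay free for other work until all four are available together.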
Tempting but wrong
GPU idle time between mini-batches is fundamentally caused by too few CPU data-loader worker processes.
Why it fails
Insufficient data-loader workers can contribute to starvation, but that is a software-configuration issue rather than a limit of the infrastructure layer. In most large-scale cases the root cause is storage throughput, which no amount of worker tuning can overcome if the storage fabric itself is the limit.
AI Infrastructure
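A rough back-of-the-envelope check shows why worker tuning cannot fix a storage-bound pipeline, as the card above argues. The sketch below is a simplification, and every number in it (samples per second, sample size, storage bandwidth) is an assumed value chosen for illustration, not a measurement.

```python
def required_read_mb_per_s(samples_per_s_per_gpu, num_gpus, kb_per_sample):
    """Aggregate read rate the storage layer must sustain to keep all GPUs fed."""
    return samples_per_s_per_gpu * num_gpus * kb_per_sample / 1000


def storage_is_bottleneck(required_mb_per_s, storage_mb_per_s):
    """True when demand exceeds what the storage fabric can deliver."""
    return required_mb_per_s > storage_mb_per_s


# Assumed workload: 64 GPUs, each consuming 2000 samples/s of 100 KB each.
need = required_read_mb_per_s(2000, 64, 100)

# Assumed storage fabric that sustains 5000 MB/s in aggregate.
starved = storage_is_bottleneck(need, 5000)
```

If `starved` is true, adding data-loader workers only adds more readers competing for the same limited bandwidth.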
Tempting but wrong
Inference serving is the lifecycle stage that immediately follows data preparation.
Why it fails
Inference serving comes after a model is trained and registered. Serving untrained weights produces no useful predictions, so inference serving cannot directly follow data preparation; training comes first.
Essential AI Knowledge
Tempting but wrong
Slurm partition time limits, which cap a job's maximum wall-clock duration, prevent users from monopolising the cluster over a rolling period.
Why it fails
Partition time limits bound how long a single job may run, which can indirectly reduce monopolisation, but they do not track cumulative usage across users over a rolling period. Fair-share scheduling with decay-based usage accounting tracks that cumulative consumption.
AI Operations
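Decay-based usage accounting, as mentioned in the card above, can be illustrated with a half-life formula. This is a minimal sketch of the general idea, not the exact formula any particular scheduler uses; the function name `decayed_usage`, the users, and the numbers are hypothetical.

```python
def decayed_usage(records, now, half_life):
    """Sum GPU-hours, halving each record's weight for every half_life elapsed."""
    return sum(hours * 0.5 ** ((now - t) / half_life) for t, hours in records)


# Each record is (day the usage occurred, GPU-hours consumed).
alice = [(0, 100.0)]   # heavy use 14 days ago
bob = [(14, 100.0)]    # the same amount of use today
now, half_life = 14, 7  # assumed 7-day half-life

alice_usage = decayed_usage(alice, now, half_life)
bob_usage = decayed_usage(bob, now, half_life)
```

Both users consumed the same raw GPU-hours, but Alice's older usage has decayed through two half-lives, so a fair-share policy would rank her next job ahead of Bob's. A partition time limit has no equivalent memory of past consumption.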
Tempting but wrong
When model weights exceed one GPU, scale out across many nodes via InfiniBand using data parallelism to replicate the full model on every node.
Why it fails
Data parallelism replicates the entire model on each accelerator, so it cannot address a model that exceeds single-GPU memory capacity. Scale-out alone does not solve the memory constraint.
AI Infrastructure
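The memory argument in the card above can be checked with simple arithmetic. The sketch below counts weights only (no gradients, optimizer state, or activations), assumes 2 bytes per parameter and an 80 GiB GPU, and uses an idealised even split to stand in for model-parallel sharding; the model size and all names are illustrative assumptions.

```python
def weights_gib(params_billion, bytes_per_param=2):
    """Memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def per_gpu_weights_gib(params_billion, num_gpus, strategy):
    """Weights each GPU must hold under a given parallelism strategy."""
    total = weights_gib(params_billion)
    if strategy == "data_parallel":
        return total             # full replica on every GPU
    if strategy == "model_parallel":
        return total / num_gpus  # weights sharded evenly (idealised)
    raise ValueError(f"unknown strategy: {strategy}")


GPU_MEMORY_GIB = 80  # assumed per-GPU capacity

# Hypothetical 70-billion-parameter model.
dp = per_gpu_weights_gib(70, 64, "data_parallel")   # 64 GPUs, still a full copy each
mp = per_gpu_weights_gib(70, 8, "model_parallel")   # 8 GPUs, one shard each

fits_with_data_parallel = dp <= GPU_MEMORY_GIB
fits_with_model_parallel = mp <= GPU_MEMORY_GIB
```

Adding more data-parallel replicas never changes `dp`, which is why scale-out alone leaves the memory constraint untouched; only splitting the model itself lowers the per-GPU requirement.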
Tempting but wrong
A model registry is the right tool for logging and comparing every experimental training run because it stores artefacts and reproduction metadata.
Why it fails
A model registry manages the promotion lifecycle of validated models, not high-frequency per-run logging. It typically receives only a curated subset of runs after evaluation, so it is not designed for comparing every iteration; that is the experiment tracker's job.
Essential AI Knowledge
Key terms
Exam-day rules
- Read the final sentence of the question first. It states what is actually being asked, so you can read the scenario hunting for the answer instead of memorising every detail.
- Choose the most appropriate option, not merely a correct one. Several options are often technically true; the exam wants the best fit for the stated constraint.
- Watch the clock: fifty questions in sixty minutes is a little over a minute each, so do not stall on one hard item while easier marks are waiting.
- Flag and move on. Cover every question first, then return to the flagged ones with whatever time is left rather than burning it early.
- When a scenario stresses many GPUs cooperating on one job, think high-speed, low-latency interconnect and offload before generic networking answers.
Revision schedule
- Day 1: Map the published topics and book a date
- Week 1: Lock the AI fundamentals and the NVIDIA stack
- Weeks 1-2: Go deep on AI infrastructure
- Weeks 2-3: Cover AI operations
- Week 4: Practise on scenario questions and read every explanation