NVIDIA-Certified Associate: Accelerated Data Science cheat sheet
NVIDIA
Free to share. Examworthy is not affiliated with or endorsed by NVIDIA; NCA-ADS and related marks belong to their respective owners.
At a glance
Format: Multiple choice, online proctored
Domain weight map
Heaviest first - spend your time here
How this exam thinks
NCA-ADS tests whether you can run a data science workflow on the GPU with RAPIDS, and whether you know why and when acceleration helps.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
Activating the cudf.pandas accelerator automatically transfers pre-existing pandas DataFrames to the GPU without any conversion call.
Why it fails
The cudf.pandas accelerator intercepts pandas API calls on objects created after the module is activated. A pandas DataFrame that already exists in memory before activation is not retroactively moved to GPU without an explicit conversion step.
Data Manipulation and Preparation
Tempting but wrong
Stratified k-fold helps with imbalance by not shuffling rows, preserving temporal sequence integrity that standard k-fold corrupts.
Why it fails
Shuffling and temporal ordering are concerns for time-series problems, not for class imbalance. Stratified k-fold can still shuffle rows (within each class stratum, when shuffling is enabled), so it does nothing to preserve time order. Its benefit for imbalance is that each fold keeps the overall class ratio.
Machine Learning With RAPIDS
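The stratified k-fold point can be made concrete with a minimal pure-Python sketch. The helper name stratified_folds and the 90/10 label split are assumptions made for illustration; this is not the scikit-learn or cuML implementation. It shuffles inside each class only, then deals rows round-robin so every fold keeps the overall class ratio.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits, seed=0):
    """Assign row indices to folds so each fold keeps the overall class ratio.

    Hypothetical helper for illustration; not a scikit-learn or cuML API.
    """
    rng = random.Random(seed)

    # Group row indices by class label (one stratum per class).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle inside the class stratum only
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)  # deal round-robin
    return folds


# Made-up imbalanced labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, n_splits=5)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]

for k, counts in enumerate(fold_counts):
    print(f"fold {k}: {counts[0]} negatives, {counts[1]} positives")
```

Every fold ends up with the same 9:1 ratio as the full label list. Row order plays no part in the assignment, which is why stratification says nothing about temporal integrity.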
Tempting but wrong
Printing a dask_cudf DataFrame forces Dask to resolve all partitions so the repr can display, reliably triggering full computation.
Why it fails
Printing a dask_cudf DataFrame displays a high-level metadata repr without executing the full task graph. Only a subset of metadata may be resolved, so it does not reliably trigger complete distributed computation across workers.
Data Science Pipelines and Workflow Automation
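The lazy-evaluation point can be illustrated with a toy stand-in. The LazyFrame class below is hypothetical and is not Dask or dask_cudf; it only mimics the behaviour described above, where the repr shows metadata and an explicit compute call is what materialises the partitions.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated, partitioned DataFrame (not Dask)."""

    def __init__(self, partitions, columns):
        self._partitions = partitions  # list of zero-argument loader callables
        self.columns = columns
        self.computed = 0              # number of partitions materialised so far

    def __repr__(self):
        # Metadata only: no loader callable is invoked here.
        return f"LazyFrame(columns={self.columns}, npartitions={len(self._partitions)})"

    def compute(self):
        # Explicit trigger: run every partition loader and gather the rows.
        rows = []
        for load in self._partitions:
            rows.extend(load())
            self.computed += 1
        return rows


frame = LazyFrame([lambda: [(1, "a")], lambda: [(2, "b")]], columns=["id", "tag"])

print(frame)                          # shows metadata, loads nothing
computed_after_print = frame.computed

rows = frame.compute()                # now both partitions are materialised
print(rows, frame.computed)
```

In real dask_cudf code the equivalent explicit trigger is a call such as .compute() on the collection.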
Tempting but wrong
The arithmetic mean is the best summary of typical salary because it incorporates every value in the dataset.
Why it fails
Using all data points is tempting, but outliers pull the mean far above most employees' salaries, making it unrepresentative of the typical worker. The median is preferred for skewed, outlier-affected data.
Descriptive Analysis and Visualization
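A quick worked example of the mean-versus-median trap, using Python's statistics module. The salary figures are made up for illustration: five ordinary salaries and one extreme outlier.

```python
import statistics

# Made-up salaries: five typical values and one outlier.
salaries = [40_000, 42_000, 45_000, 48_000, 50_000, 1_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# How many employees earn less than the "average"?
below_mean = sum(s < mean_salary for s in salaries)

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"{below_mean} of {len(salaries)} salaries fall below the mean")
```

The single outlier drags the mean above five of the six salaries, while the median stays in the middle of the typical values.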
Tempting but wrong
A small-data cuDF group-by is slow because the GPU's lower per-core clock speed makes any single-threaded aggregation step slower.
Why it fails
GPUs do have lower per-core clock speeds, but cuDF group-by operations are data-parallel and use thousands of CUDA cores simultaneously, so a raw clock comparison is not the explanation. The real cause is that fixed host-device transfer overhead dominates when the dataset is small.
Foundations of Accelerated Data Science
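The overhead argument can be sketched as a simple cost model: GPU time is a fixed overhead plus rows divided by throughput, while CPU time is just rows divided by throughput. Every number below is a placeholder assumption chosen for illustration, not a measurement of any real hardware or of cuDF.

```python
def cpu_seconds(n_rows, cpu_rows_per_s):
    """CPU time under a throughput-only model."""
    return n_rows / cpu_rows_per_s


def gpu_seconds(n_rows, gpu_rows_per_s, fixed_overhead_s):
    """GPU time: fixed transfer/launch overhead plus throughput-bound work."""
    return fixed_overhead_s + n_rows / gpu_rows_per_s


# Placeholder figures (assumptions, not benchmarks).
CPU_RATE = 1e7    # rows per second on the CPU
GPU_RATE = 1e9    # rows per second on the GPU
OVERHEAD = 0.01   # seconds of fixed host-device overhead

# Row count at which both devices take the same time under this model.
break_even_rows = OVERHEAD / (1 / CPU_RATE - 1 / GPU_RATE)

for n in (1_000, 10_000_000):
    cpu = cpu_seconds(n, CPU_RATE)
    gpu = gpu_seconds(n, GPU_RATE, OVERHEAD)
    winner = "CPU" if cpu < gpu else "GPU"
    print(f"{n:>10,} rows: cpu={cpu:.6f}s gpu={gpu:.6f}s -> {winner}")

print(f"break-even at roughly {break_even_rows:,.0f} rows")
```

Below the break-even point the fixed overhead dominates and the CPU wins; above it the GPU's throughput advantage takes over.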
Tempting but wrong
Logging only the final validation loss per run keeps the tracker lightweight while still supporting reproducibility.
Why it fails
Recording only the final loss omits the information needed to identify which run produced a given model. Unless the run record is linked to stored artefacts, a colleague cannot determine the code version, dataset version, or hyperparameters that generated the result.
Introductory MLOps Practices
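A minimal sketch of a run record that carries the items named above. The RunRecord class, its field names, and all values are hypothetical placeholders; this is not the schema of any particular tracking tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunRecord:
    """Hypothetical run record: enough to trace a model back to its origin."""
    run_id: str
    code_version: str      # e.g. a source-control commit identifier
    dataset_version: str   # e.g. a dataset snapshot tag
    hyperparameters: dict
    metrics: dict
    artifact_path: str     # where the trained model was stored


# Placeholder values for illustration only.
record = RunRecord(
    run_id="run-0001",
    code_version="<commit id>",
    dataset_version="<dataset tag>",
    hyperparameters={"max_depth": 6, "learning_rate": 0.1},
    metrics={"val_loss": 0.42},
    artifact_path="models/run-0001/model.bin",
)

serialized = json.dumps(asdict(record), indent=2, sort_keys=True)
print(serialized)
```

A record holding only metrics would leave the code version, dataset version, hyperparameters, and artefact link unrecoverable.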
Tempting but wrong
GPUs accelerate BFS edge inspection because they have higher single-core clock speeds than CPUs.
Why it fails
GPUs typically run at lower clock speeds than modern CPUs. Their advantage comes from massively parallel execution across thousands of cores expanding a frontier simultaneously, not from faster individual clock cycles.
Advanced Data Structures
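The frontier idea can be sketched with a level-synchronous BFS. This is plain sequential Python for illustration, not a GPU graph library API; the function name bfs_levels and the small graph are assumptions. The inner loop over the frontier is the part a GPU would spread across many threads at once.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS returning the hop distance to each reachable vertex."""
    distance = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # On a GPU, each frontier vertex (or edge) could be inspected by its
        # own thread; here the same work runs one vertex at a time.
        for vertex in frontier:
            for neighbour in adjacency[vertex]:
                if neighbour not in distance:
                    distance[neighbour] = level
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return distance


# Small made-up directed graph as an adjacency list.
graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
distances = bfs_levels(graph, 0)
print(distances)
```

The speed-up comes from how many frontier vertices can be expanded at the same time, not from how fast any single one is processed.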
Tempting but wrong
Rebuilding from pinned versions and a lock file will fetch the latest compatible releases so you get bug fixes published after the original pin.
Why it fails
Pinning exact versions prevents any upgrade, so the latest releases are not fetched. Fetching newer compatible releases describes an unpinned or range-constrained environment, not a pinned one with a lock file.
Software and Environment Management
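The difference between an exact pin and a version range can be shown with a toy resolver. The functions and the version list below are hypothetical and far simpler than what pip or conda actually do; they only illustrate which release each kind of constraint selects.

```python
def parse(version):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_pinned(available, pinned):
    """Exact pin: return that version or fail; never pick a newer one."""
    if pinned not in available:
        raise LookupError(f"pinned version {pinned} is not available")
    return pinned


def resolve_range(available, minimum, below):
    """Range constraint: return the newest version in [minimum, below)."""
    candidates = [v for v in available if parse(minimum) <= parse(v) < parse(below)]
    if not candidates:
        raise LookupError("no version satisfies the range")
    return max(candidates, key=parse)


# Made-up release list; 1.2.1 and 1.3.0 were published after the original pin.
available = ["1.2.0", "1.2.1", "1.3.0", "2.0.0"]

pinned_choice = resolve_pinned(available, "1.2.0")
range_choice = resolve_range(available, "1.2.0", "2.0.0")

print("pinned:", pinned_choice)
print("range: ", range_choice)
```

The pin reproduces the original environment exactly, while only the range constraint picks up the later release.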
Key terms
Exam-day rules
- Read the last line of the question first. It tells you what is actually being asked, so you can read the scenario looking for the answer rather than memorising every detail.
- Choose the most appropriate option, not merely a correct one. Several options often describe a real library or metric; the exam wants the best fit for the stated constraint.
- When data or compute outgrows a single GPU, think Dask. A surprising number of pipeline and scaling questions resolve on this one distinction.
- Watch for absolutes such as always, never, and guaranteed faster. The GPU is not always faster, because small jobs can lose to transfer overhead, so absolute claims are usually the wrong answer.
- Flag and move on. With 60 minutes and 50 to 60 questions, do not sink time into one hard item while easier marks wait; cover every question first, then return.
Revision schedule
- Day 1: Map the blueprint and book a date
- Week 1: Shore up the foundations first
- Weeks 1-2: Go deep on data preparation and RAPIDS machine learning
- Weeks 2-3: Cover pipelines, scaling, and analysis
- Week 4: Cover MLOps, advanced structures, and environments