How to pass AWS Certified AI Practitioner (AIF-C01)
16 min read5 domains coveredFree practice, no sign-up
The AWS Certified AI Practitioner (AIF-C01) is a foundational exam. It tests whether you can reason about artificial intelligence, machine learning, and generative AI at a conceptual level, and judge where each is the right tool, rather than whether you can build or train models. There is no coding and no console clicking. Most questions are short scenarios that ask for the best choice among plausible options.
It suits people who work around AI without being ML engineers: developers, analysts, product and project managers, and technical sales. If you can already explain what a foundation model is and when retrieval beats fine-tuning, much of the exam will feel familiar. If those terms are new, the gap is closable in a few focused weeks because the blueprint is broad rather than deep.
The exam rewards judgement. Many options are technically true but wrong for the scenario, so the skill being tested is choosing the most appropriate answer, not recalling a definition. Practise on scenario questions with worked explanations so you learn why the other options fail, not just which letter is correct.
AIF-C01 rewards judgement about where AI, ML, and generative AI each fit, not the ability to build or train a model.
Difficulty
Foundational
Best for
Newcomers to AI and the cloud, plus business, sales, and non-technical staff who need a credible grounding in AI and the AWS AI services.
Prerequisites
None. A little familiarity with cloud concepts helps but is not assumed.
65
Questions
90 min
Time allowed
700 / 1000
Pass mark
$100
Exam cost (USD)
295
Practice questions
How this exam thinks
Three habits separate a pass from a fail on the AIF-C01, and none of them is about knowing more facts.
First, the exam asks for the most appropriate option, not a correct one. The questions are short scenarios, and several answers will be true statements about AI. Only one fits the situation as written. The real test is matching a capability to a need: what is this technology, and why would you reach for it here rather than the alternative. Read the requirement in the last line first, then judge each option against that requirement, not against general truth. A statement that is accurate in the abstract is still wrong if it does not answer what the scenario asked.
Second, the exam keeps returning to one architectural decision: retrieval versus fine-tuning. When a source of knowledge changes often, the answer is retrieval-augmented generation and a re-index, because retrieval grounds the model in current documents without retraining. Fine-tuning bakes knowledge and behaviour into the weights and suits stable, specialised needs, so a frequently changing source is the signal that fine-tuning is the wrong choice. This single distinction resolves a surprising number of questions, so make it automatic rather than reasoning it out each time.
Third, the exam expects you to remember that these models are probabilistic, which makes absolutes suspect. Options that promise a guaranteed, always-correct, or never-wrong result are usually the distractor, because a language model predicts likely text rather than retrieving verified fact. The same instinct covers hallucination, bias, and the need for human review and evaluation: when an answer claims certainty from a system that cannot offer it, distrust the answer. When two options look right, pick the measured one that grounds, evaluates, or reviews rather than the one that promises perfection.
What each domain tests and how to study it
The AIF-C01 blueprint is split across 5 domains. Weights are the official share of the exam; see the official exam guide for the authoritative breakdown.
What you must be able to do. Place AI, ML, and deep learning in relation to one another, match a business problem to the right capability, and walk the ML lifecycle as a loop that includes monitoring and retraining.
In one sentenceThe shared vocabulary the rest of the exam assumes: what AI, ML, and deep learning are, the three learning styles, and the lifecycle from data to deployment to monitoring.
Recall check: answer these from memory first
Place AI, machine learning, and deep learning in relation to one another in one sentence.
Give a one-line distinction between supervised, unsupervised, and reinforcement learning, with an example use case for each.
Name the stages of the ML lifecycle in order, and say why monitoring and retraining belong in it rather than after it.
What it tests. The vocabulary and mental model: what AI, machine learning, and deep learning are and how they relate, the difference between supervised, unsupervised, and reinforcement learning, and the stages of the ML lifecycle from data preparation through training, evaluation, deployment, and monitoring. It also checks that you can match a business problem to an appropriate AI capability, including recognising when simple rules or analytics beat machine learning.
How to study it. Get the terms exact first, because later domains build on them. Be able to say in one sentence each what training, inference, a model, and a dataset are. Then learn the lifecycle as a loop, not a line: models degrade as data shifts, so monitoring and retraining are part of the job. Drill use-case questions until you can spot the distractor that proposes machine learning for a problem a lookup table would solve.
Easy to confuse
AI versus machine learning versus deep learning. They nest: AI is the broad field, machine learning is the subset that learns patterns from data, and deep learning is the subset of machine learning that uses many-layered neural networks. The exam tests whether you put the broader term where a narrower one belongs.
Supervised versus unsupervised learning. Supervised learning trains on labelled examples to predict a known target; unsupervised learning finds structure in unlabelled data, such as clusters. If the scenario already has labelled outcomes it is supervised; if it is grouping or finding patterns with no labels it is unsupervised.
A problem for machine learning versus a problem for plain rules. Machine learning earns its keep when the pattern is complex and learned from data; a fixed lookup or a simple rule wins when the logic is known and deterministic. The exam plants a machine-learning distractor on a problem a lookup table would solve.
Worked example from the AIF-C01 bank
lock_openFree sampleFundamentals of AI and MLeasy
A retail company wants a system that improves its product recommendation accuracy over time by learning from customer purchase history, without being explicitly programmed with new rules each season. Which AI discipline best describes this approach?
ARule-based expert systems, which encode domain knowledge as hand-crafted if-then rules.
BMachine learning, which trains models on historical data so that predictive behaviour improves with experience.check_circle Correct
CBusiness intelligence reporting, which aggregates and visualises historical sales data for human analysts.
DRobotic process automation, which executes predefined workflow steps to mimic human repetitive tasks.
Understand that machine learning is the AI discipline in which models improve from data without explicit rule updates. Machine learning algorithms discover patterns in training data and adjust internal parameters so that predictions become more accurate as more data is seen. This contrasts with rule-based systems, which require a human to codify every new condition. The self-improving property from historical purchase data is the hallmark of ML.
Why A is wrong: Tempting because rule-based systems do drive recommendations, but they require explicit programming of every new rule. They do not learn autonomously from data, which is the defining requirement in the stem.
Why B is correct: Machine learning is the discipline in which algorithms learn patterns from data and improve performance on a task without being explicitly re-programmed, matching the scenario precisely.
Why C is wrong: Tempting because it uses historical purchase data, but BI reporting surfaces information for human decision-making rather than training a model that autonomously improves its own predictions.
Why D is wrong: Tempting because RPA is often grouped with AI tooling, but it follows a fixed script and does not learn from data. It cannot improve recommendation accuracy through experience.
What you must be able to do. Explain generative AI at a working level, predict how temperature and top-p change output, and judge when a probabilistic model is the wrong tool for a task that needs a guaranteed answer.
In one sentenceHow generative AI actually works: tokens, embeddings, prompts, and foundation models, the sampling knobs that trade determinism for variety, and the limits that decide when not to use it.
Recall check: answer these from memory first
Explain in one line each what a token, an embedding, and a foundation model are.
Say what temperature 0 does to the output and why an audit summary would want it.
Name three limitations of generative AI that would make it the wrong tool for a task needing a guaranteed-correct answer.
What it tests. How generative AI works at a useful level: tokens, embeddings, prompts, and foundation models, and how a large language model predicts the next token. It tests the practical effect of sampling parameters such as temperature and top-p, and a clear-eyed view of what generative AI is good at and where it fails, including hallucination, cost, latency, and inconsistency.
How to study it. Anchor the sampling concepts to outcomes you can reason about: low temperature gives reproducible, on-message output, which is what an audit summary needs; higher temperature widens variety. Learn the limitations as decision criteria, so that when a scenario needs a guaranteed-correct factual answer you can explain why a raw model is the wrong choice. This domain carries real weight, so do not skim it.
Easy to confuse
Temperature versus top-p. Both control randomness, but temperature scales how flat the probability distribution is, while top-p limits sampling to the smallest set of tokens whose probabilities sum to p. Temperature 0 forces greedy, reproducible output; the exam pairs the requirement (reproducible versus varied) with the setting.
Generative versus discriminative models. A generative model produces new content such as text or images; a discriminative model classifies or predicts a label for existing input. If the task is creating or drafting it is generative; if it is sorting or scoring it is discriminative.
Hallucination versus a wrong retrieval. A hallucination is the model inventing plausible content with no grounding; a retrieval error is a grounded system fetching the wrong document. The fix differs: grounding and evaluation reduce hallucination, better retrieval fixes the second, so the exam wants the cause named correctly.
Worked example from the AIF-C01 bank
lock_openFree sampleGenerative AI fundamentalsmedium
A support team sends the same prompt to a hosted LLM many times a day and gets noticeably different wording each time. They want answers to stay closely on-message and vary far less, without demanding byte-for-byte identical output. Which change best achieves this?
ALower the temperature toward 0.2 so the model concentrates on the highest-probability tokens.check_circle Correct
BRaise the temperature toward 1.0 so the model considers a wider spread of next tokens.
CIncrease the maximum output-token limit so each answer has room to finish completely.
DRaise top-p toward 1.0 so the model samples from the full probability distribution.
Lowering temperature concentrates output on the most-probable tokens, reducing run-to-run variation. Temperature scales how sharply the model favours its most-probable tokens. Lowering it toward 0 concentrates probability mass on the top tokens, so the same prompt produces closely similar, on-message wording each time without forcing fully identical output.
Why A is correct: Correct. A low temperature sharpens the distribution toward the likeliest tokens, so wording stays consistent run to run while still allowing slight variation.
Why B is wrong: Higher temperature flattens the token distribution and increases variation, which moves away from the on-message consistency the team asked for.
Why C is wrong: The token limit governs how long an answer can be, not how much its wording varies, so consistency is unchanged.
Why D is wrong: A top-p of 1.0 keeps the entire distribution in play, which preserves or increases variation rather than reducing it.
What you must be able to do. Design an application on a foundation model under real constraints, and make the retrieval-versus-fine-tuning decision automatically from the signals in the scenario.
In one sentenceThe largest domain: building on foundation models, the retrieval-versus-fine-tuning choice, the training and tuning process, and how to evaluate a model honestly.
Recall check: answer these from memory first
Given a knowledge source that changes weekly, say whether to use retrieval or fine-tuning and why in one line.
Name four design constraints you weigh when building on a foundation model.
Distinguish pre-training, fine-tuning, and instruction tuning in one line each.
What it tests. Designing applications on top of foundation models: model selection, context windows, cost and latency, and guardrails, plus the big architectural choice of retrieval-augmented generation versus fine-tuning. It also covers the training and fine-tuning process at a conceptual level and the methods used to evaluate model performance.
How to study it. This is the largest domain, so spend the most time here. Make the retrieval-versus-fine-tuning decision automatic: a source that changes often wants retrieval and a re-index, not a fresh fine-tuning run; fine-tuning bakes knowledge and behaviour into the weights. Learn evaluation as a theme that recurs across the exam: a single metric rarely tells the truth, and the evaluation has to reflect the real task.
Easy to confuse
Retrieval-augmented generation versus fine-tuning. Retrieval grounds the model in current documents at query time without changing the weights; fine-tuning bakes knowledge and behaviour into the weights through further training. A frequently changing source points to retrieval and a re-index, which is the exam's most common tell on this domain.
Context window versus fine-tuning. The context window is how much text the model can read in a single request; fine-tuning changes the model itself. Putting more reference text in the prompt uses the context window and needs no training, so the exam tests whether you reach for retraining when supplying context would do.
Pre-training versus fine-tuning. Pre-training builds a general model from a large unlabelled corpus at great cost; fine-tuning adapts that model to a narrower task with a much smaller dataset. The exam contrasts the scale and cost of the two and asks which step a given scenario actually needs.
Worked example from the AIF-C01 bank
lock_openFree sampleApplications of foundation modelsmedium
A developer is building a customer-support chatbot that must answer questions about a company's product catalogue, which is updated weekly. The catalogue contains roughly 200,000 tokens of text. Which design consideration most directly determines whether a single foundation model call can process the entire catalogue without retrieval augmentation?
AThe model's training data cutoff date, because any catalogue content newer than the cutoff will be ignored by the model during inference.
BThe model's output token limit, because generating long catalogue summaries requires a high output ceiling to avoid truncation.
CThe model's context window size, because the full catalogue must fit within the input limit for a single inference call.check_circle Correct
DThe model's temperature setting, because a lower temperature ensures the model attends to all catalogue entries rather than sampling selectively.
Understand that a model's context window size is the primary architectural constraint when deciding whether retrieval augmentation is needed. Foundation models have a fixed context window - the maximum number of tokens they can process in a single inference call, covering both input and output. When source material exceeds this window, the application must use retrieval-augmented generation (RAG) or chunking strategies to surface relevant content before the model call. Training cutoff, temperature, and output limits are real model properties but do not govern input capacity.
Why A is wrong: Training cutoff affects factual knowledge baked into weights, not the ability to process input tokens supplied at inference time. Providing the catalogue as context bypasses the cutoff problem entirely, so this is not the limiting factor here.
Why B is wrong: Output token limits affect response length, not input capacity. The question asks about processing the catalogue as input, so output limits are irrelevant to this constraint.
Why C is correct: Context window defines the maximum tokens a model can accept per call. If the catalogue exceeds that limit, the model cannot process it in one shot, making retrieval augmentation necessary.
Why D is wrong: Temperature controls output randomness and has no bearing on how many input tokens the model can accept. It is a tempting distractor because 'attending to all entries' sounds plausible, but attention and context length are separate concepts.
What you must be able to do. Match a prompting technique to the task it helps, and evaluate prompt quality against a rubric rather than trusting one happy example.
In one sentenceChoosing and improving prompts: zero-shot, few-shot, and chain-of-thought, and treating prompt quality as something you measure against a rubric, not eyeball once.
Recall check: answer these from memory first
Say when you would reach for few-shot prompting and when for chain-of-thought.
Explain why a prompt that works on one example still needs an evaluation set before you trust it.
Name the three levers in a prompt, beyond the question itself, that change the result.
What it tests. Choosing and improving prompts: zero-shot, few-shot, and chain-of-thought prompting, how instructions, examples, and output formats change results, and how to evaluate prompt quality against a rubric and iterate. The theme is that a prompt which works on one example can fail on the next, so evaluation sets matter.
How to study it. Match each technique to the kind of task it helps: few-shot for format and tone, chain-of-thought for multi-step reasoning. The smaller weight makes this a good domain to secure with focused practice rather than long study. Treat prompt evaluation as the same discipline as model evaluation: measure against a rubric, compare variants, do not trust a single happy example.
Easy to confuse
Zero-shot versus few-shot prompting. Zero-shot gives the instruction with no worked examples; few-shot includes a handful of input-output examples to fix the format and tone. If the scenario needs a specific output shape or style, the examples in few-shot are the lever the exam is pointing at.
Few-shot versus chain-of-thought prompting. Few-shot shows examples of the answer; chain-of-thought asks the model to reason through intermediate steps before answering. Use few-shot for format and tone, chain-of-thought for multi-step reasoning, which is the distinction the exam tests.
Worked example from the AIF-C01 bank
lock_openFree samplePrompt engineering and evaluationmedium
A developer is building a customer support chatbot that must classify incoming messages into one of five predefined categories. The model occasionally misclassifies edge cases. Which prompt engineering technique is most likely to improve classification accuracy without fine-tuning?
AIncrease the model's temperature setting to make responses more varied and exploratory.
BAdd a few labelled examples of each category directly in the prompt before the user message.check_circle Correct
CAppend a system message instructing the model to respond only in JSON format.
DBreak the prompt into multiple sequential calls so each call handles one category independently.
Understand how few-shot prompting anchors a model to a correct label space and improves classification accuracy on edge cases. Few-shot prompting supplies the model with labelled input-output pairs that demonstrate the desired mapping. This narrows the model's interpretation of the task and reduces the probability of it choosing an incorrect label, especially for inputs that sit near the boundary between categories.
Why A is wrong: Higher temperature increases randomness, which is likely to worsen classification consistency rather than improve it; it does not address the root cause of misclassification.
Why B is correct: Few-shot prompting provides the model with concrete examples of the task, anchoring its output distribution to the correct label space and reducing ambiguity on edge cases.
Why C is wrong: Enforcing JSON output controls format, not classification accuracy. Without category examples, the model still has the same ambiguity about which label to assign.
Why D is wrong: Running one call per category would require five API calls and still relies on the model making a binary yes/no decision per category, which does not resolve the original classification ambiguity.
What you must be able to do. Recognise what documents and discloses a model responsibly, and apply the AI-specific and general security controls a governed AI system needs.
In one sentenceBuilding AI responsibly and securely: bias and fairness, model cards, explainability, and the security controls including prompt-injection defence.
Recall check: answer these from memory first
List the four things a model card discloses about a model.
Name the AI-specific security threat the exam expects you to defend against, and how.
Explain why explainability matters for trust and compliance, not just for debugging.
What it tests. Building AI responsibly and securely: bias, fairness, and safety, the artefacts that document a model such as model cards, why transparency and explainability matter for trust and compliance, and the security and governance obligations on AI systems, including least-privilege access, data protection, and prompt-injection defence.
How to study it. Learn what a model card discloses and why, because it ties together intended use, limitations, training data, and evaluation. Keep the security items concrete: prompt injection is the AI-specific threat to be ready for, and least privilege and data protection are the general controls that still apply. This is the smallest domain but the questions are usually straightforward marks once the concepts are clear.
Easy to confuse
Bias versus variance. In responsible-AI terms the exam means bias as unfair or skewed outcomes against a group, not the bias-variance trade-off of model error. Read the context: a fairness scenario is about discrimination and representativeness, not about underfitting.
Interpretability versus post-hoc explainability. An interpretable model is transparent by design so you can read its logic directly; post-hoc explainability applies a separate method to explain an opaque model after the fact. The exam asks which approach a given trust or compliance need calls for.
Prompt injection versus a general security control. Prompt injection is the AI-specific attack where crafted input hijacks the model's instructions; least privilege and data protection are the general controls that apply to any system. The exam wants the AI-specific threat named as such, not folded into ordinary access control.
Worked example from the AIF-C01 bank
lock_openFree sampleResponsible and secure AIeasy
A data science team notices that a credit-scoring model approves loans at a significantly lower rate for one demographic group compared to all others, even though that group has similar repayment histories. Which practice most directly addresses this disparity during model development?
AEvaluating disaggregated fairness metrics across demographic subgroups and retraining with corrected data or constraintscheck_circle Correct
BIncreasing the model's overall accuracy on the full training dataset
CDeploying the model behind an API so that end users cannot inspect its decision logic
DRemoving all demographic attributes from the training features before the initial model fit
Understand how disaggregated fairness evaluation and targeted mitigation correct demographic disparity in AI model outputs. Responsible AI development requires measuring model behaviour separately for each subgroup rather than relying on aggregate metrics. Disaggregated fairness metrics (such as equalised odds or demographic parity difference) reveal where a model fails specific groups, enabling corrective actions such as data augmentation, re-weighting, or in-training fairness constraints that reduce the disparity directly.
Why A is correct: Disaggregated evaluation exposes differential performance per subgroup, allowing targeted mitigation such as re-sampling, re-weighting, or fairness constraints during retraining.
Why B is wrong: Tempting because higher accuracy sounds like a better model, but aggregate accuracy can improve while bias against a subgroup worsens if that group is a small fraction of the dataset.
Why C is wrong: Hiding decision logic via an API does nothing to correct the underlying bias; it only reduces transparency, which compounds responsible AI concerns.
Why D is wrong: Removing protected attributes is a common first step but does not eliminate bias because correlated proxy features (postcode, occupation) can still encode demographic information.
A study plan that works
Map the blueprint and set a date
Day 1
Read the official exam guide and the five domains with their weights. Book a provisional exam date now: a fixed date turns open-ended study into a plan and is the single biggest predictor of actually sitting the exam.
Lock the fundamentals (Domain 1)
Week 1
Get the core vocabulary and the ML lifecycle solid before anything else, because the generative AI material assumes it. Use the recall prompts in this guide: cover the summary, answer from memory, then reveal. Aim to explain each term out loud without notes.
Go deep on generative AI and foundation models (Domains 2 and 3)
Weeks 1-2
These two carry the most weight. Spend the bulk of your time here: sampling parameters, the retrieval-versus-fine-tuning decision, context windows, guardrails, and evaluation. Use scenario questions, not flashcards alone.
Secure prompting and responsible AI (Domains 4 and 5)
Weeks 2-3
Cover prompt techniques and the responsible and secure AI concepts. These are lower weight and largely conceptual, so a focused pass plus practice is enough.
Practise on scenarios with worked explanations
Week 4
Move to full practice sets and read the explanation for every question, including the ones you got right. The exam tests judgement between plausible options, so understanding why a distractor is wrong is where the marks are.
Find and close your weak domains
Week 4
Use your per-domain accuracy to drill the two domains that are dragging you down rather than re-reading what you already know. Repeat until every domain clears the pass line with margin.
Sit a timed mock and review it
Week 5
Take at least one full timed mock to rehearse pacing and flag-and-return. Treat the score as a per-domain readiness signal, then review every missed question before booking or sitting.
Know when you're ready
Readiness for the AIF-C01 is a score on questions you have not seen before, not a feeling that the material is familiar. Those are different things, and the gap between them is where people fail. Re-reading notes builds fluency, and fluency feels like knowledge, so confidence rises while real recall does not. The fix is to test yourself: if you can answer fresh scenario questions and explain why the wrong options are wrong, you know it; if you can only nod along to an explanation, you do not yet.
Be especially wary of early confidence on a foundational exam. The material is broad rather than deep, so a first pass feels like mastery, but the questions reward judgement between plausible options, which only practice exposes. Trust your measured per-domain accuracy over your gut, and set the bar at clearing every domain comfortably on unseen questions across more than one session, not scraping the pass mark once.
This guide gives you the map. The practice bank is where you find out whether you can navigate it, with a worked explanation and a reason every distractor is wrong on every question. Readiness scoring tells you when you are there. Not before.
Ready to put this into practice?
Free AIF-C01 questions with worked explanations. No sign-up.
Read the last line of the question first. It tells you what is actually being asked, so you can read the scenario looking for the answer rather than memorising detail.
Choose the most appropriate option, not merely a correct one. Several options are often true; the exam wants the best fit for the stated requirement.
Watch for absolutes such as always, never, and guarantees. In AI scenarios they are usually the wrong answer because models are probabilistic.
Flag and move on. Do not lose time on one hard item when easier marks are waiting; the timer rewards covering every question first.
When a source changes often, think retrieval and re-index before fine-tuning. This single distinction resolves a surprising number of Domain 3 questions.
Eliminate two options fast. Most questions have two clearly weaker choices; removing them turns a guess into a coin flip at worst.
Frequently asked questions
Is AIF-C01 hard?
It is a foundational exam, so it is broad rather than deep and involves no coding. The difficulty is in choosing the best option among plausible ones, which is why scenario practice with worked explanations matters more than memorising definitions.
How long should I study for AIF-C01?
Most candidates with some technical background are ready in two to four weeks of focused study. Less background means more time on the fundamentals and generative AI domains, which is where the weight sits.
Do I need AWS experience or coding to pass?
No. The exam is conceptual. You should understand AWS AI services at a high level and where they fit, but you are not asked to write code or operate the console.
What is the pass mark for AIF-C01?
The exam is scored on a scaled range and the published pass mark is in the facts panel above. Scoring is scaled, so your raw percentage and the scaled score are not the same thing; aim to clear every domain comfortably in practice rather than scraping a target.
Which domains should I focus on?
Generative AI fundamentals and applications of foundation models together make up the largest share of the exam, so they deserve the most time. The responsible AI and prompt engineering domains are smaller and largely conceptual.
Should I learn retrieval-augmented generation or fine-tuning?
Understand both and, more importantly, when to choose each. Retrieval grounds answers in current documents without retraining and suits frequently changing sources; fine-tuning bakes knowledge and behaviour into the model and suits stable, specialised needs.
How many practice questions should I do before booking?
Enough that every domain clears the pass line with margin on questions you have not seen before, and that a full timed mock feels comfortable on pacing. Quality of review matters more than raw volume: read the explanation on every question.
Is the AWS AI Practitioner worth it?
It is a practical credential for developers, analysts, and non-technical staff who work around AI and need a recognised grounding in AI, machine learning, and generative AI on AWS. The breadth of the content makes it a solid first rung before the more specialised AWS AI and ML associate credentials.
Examworthy is not affiliated with or endorsed by Amazon Web Services. This guide is original study material based on the public exam blueprint. We never reproduce live exam items. AIF-C01 and related marks belong to their respective owners.