Examworthy · examworthy.com

Google Cloud Professional Machine Learning Engineer cheat sheet

Google Cloud

Exam version 2026 · Reviewed 2026-06-08

Free to share. Examworthy is not affiliated with or endorsed by Google Cloud; PMLE and related marks belong to their respective owners.

At a glance

Questions: 50 to 60

Time allowed: 120 min

Cost (USD): $200

Format: Multiple choice and multiple select, online- or onsite-proctored

Domain weight map

Heaviest first - spend your time here

Scaling Prototypes Into ML Models: 21% · 47 Q

Serving and Scaling Models: 20% · 49 Q

Automating and Orchestrating ML Pipelines: 18% · 42 Q

Collaborating Within and Across Teams to Manage Data and Models: 16% · 37 Q

Monitoring AI Solutions: 13% · 29 Q

Architecting Low-Code AI Solutions: 12% · 26 Q

How this exam thinks

PMLE is a pick-the-right-approach exam across the whole ML lifecycle: nearly every question is a scenario with cost, latency, skill, and scale constraints, and the right answer is the Google Cloud service or setting that fits them, usually the most managed option that meets the requirement.

Spot the trap

Tempting wrong answers, and why they fail

Tempting but wrong

For a SQL-only team producing nightly univariate forecasts, why not train one deep neural network across all series and serve batch predictions via a managed endpoint?

Why it fails

A DNN adds Python and serving overhead a SQL-only team cannot maintain, and a single shared network ignores the per-series univariate structure that ARIMA captures directly. BigQuery ML's ARIMA_PLUS model keeps the work in SQL and the results in the warehouse instead.

Domain: Scaling Prototypes Into ML Models
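A minimal sketch of what "keeping the work in SQL" looks like, assuming a hypothetical table `sales.daily` with columns `ts`, `units`, and `store_id`. The Python helper `arima_plus_sql` is hypothetical and only assembles statement text; `ARIMA_PLUS`, `time_series_timestamp_col`, `time_series_data_col`, `time_series_id_col`, and `ML.FORECAST` are BigQuery ML names. One `CREATE MODEL` statement with `time_series_id_col` fits a separate ARIMA model per series, which is the per-series structure the DNN option ignores.

```python
def arima_plus_sql(model: str, table: str, ts_col: str, value_col: str,
                   id_col: str, horizon: int) -> tuple[str, str]:
    """Return (train_sql, forecast_sql) for a BigQuery ML ARIMA_PLUS model.

    Every model, table, and column name passed in is a hypothetical placeholder.
    """
    # One statement trains an independent ARIMA model for each value of id_col.
    train_sql = f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{ts_col}',
  time_series_data_col = '{value_col}',
  time_series_id_col = '{id_col}'
) AS
SELECT {ts_col}, {value_col}, {id_col} FROM `{table}`
""".strip()
    # ML.FORECAST returns the next `horizon` points for every series at once.
    forecast_sql = (
        f"SELECT * FROM ML.FORECAST(MODEL `{model}`, "
        f"STRUCT({horizon} AS horizon, 0.9 AS confidence_level))"
    )
    return train_sql, forecast_sql


train_sql, forecast_sql = arima_plus_sql(
    model="sales.store_forecast", table="sales.daily",
    ts_col="ts", value_col="units", id_col="store_id", horizon=7,
)
print(train_sql)
print(forecast_sql)
```

The Python wrapper is only a carrier for the two statements; a SQL-only team would run the SQL directly in BigQuery, and the forecasts land in the warehouse with no serving stack to maintain.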

Tempting but wrong

To remove cold-start timeouts on a spiky endpoint, just set minReplicaCount equal to peak so replicas are always provisioned.

Why it fails

Pinning the floor to peak does eliminate cold starts, but it keeps every peak replica running through idle periods, so you pay full peak cost overnight. A small warm floor plus a high ceiling and an earlier scale-out trigger absorbs the spike without holding peak capacity when traffic is near zero.

Domain: Serving and Scaling Models
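The cost side of this trap is simple arithmetic. A minimal sketch, assuming a hypothetical 24-hour traffic profile, a hypothetical per-replica capacity of 100 requests per second, and a hypothetical warm floor of 2 replicas. It treats autoscaling as instantly matching demand, which real autoscalers do not, so it shows only the idle-cost gap, not scaling lag.

```python
import math


def replica_hours(hourly_qps: list[float], per_replica_qps: float,
                  min_replicas: int, max_replicas: int) -> int:
    """Sum the replicas provisioned each hour, clamped to [min_replicas, max_replicas]."""
    total = 0
    for qps in hourly_qps:
        needed = math.ceil(qps / per_replica_qps)
        total += min(max(needed, min_replicas), max_replicas)
    return total


# Hypothetical spiky day: near-zero traffic for 20 hours, a 2-hour spike, 2 quiet hours.
traffic = [5.0] * 20 + [1000.0] * 2 + [50.0] * 2
peak = math.ceil(max(traffic) / 100)  # 10 replicas needed at peak

pinned = replica_hours(traffic, 100, min_replicas=peak, max_replicas=peak)
warm_floor = replica_hours(traffic, 100, min_replicas=2, max_replicas=peak)
print(pinned, warm_floor)  # 240 64
```

The spike hours cost the same in both configurations; the whole difference comes from the quiet hours, where the pinned floor keeps all ten replicas running.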

Tempting but wrong

Retraining every night guarantees the model is always fresh, so nightly scheduling is the safest policy.

Why it fails

Nightly scheduling sounds safer because data is fresh, but it retrains even when performance is stable, wasting compute on needless runs. When degradation is unpredictable, a performance or drift threshold fires only on genuine decay, which conserves compute without sacrificing freshness.

Domain: Automating and Orchestrating ML Pipelines
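A minimal sketch of a threshold trigger, assuming hypothetical daily evaluation results and two hypothetical limits (a minimum accuracy and a maximum drift score). In a real pipeline the signal would come from a monitoring alert and the action would launch a pipeline run; here both are stand-ins so the decision logic is visible.

```python
from dataclasses import dataclass


@dataclass
class DailyCheck:
    accuracy: float     # metric on freshly labeled data (hypothetical)
    drift_score: float  # input distribution-distance statistic (hypothetical)


def should_retrain(check: DailyCheck, min_accuracy: float = 0.90,
                   max_drift: float = 0.30) -> bool:
    """Retrain only when performance decays or inputs drift past a limit."""
    return check.accuracy < min_accuracy or check.drift_score > max_drift


# Hypothetical week: stable for five days, then drift (day 6) and decay (day 7).
week = [DailyCheck(0.94, 0.05)] * 5 + [DailyCheck(0.91, 0.35), DailyCheck(0.87, 0.40)]

nightly_runs = len(week)                                    # a schedule retrains every night
threshold_runs = sum(should_retrain(day) for day in week)   # only genuine decay fires
print(nightly_runs, threshold_runs)  # 7 2
```

Day 6 fires on drift even though accuracy still clears the limit, which is why pairing a performance threshold with a drift threshold catches decay earlier than either alone.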

Tempting but wrong

For a one-off terabyte-scale Parquet backfill, I should provision a Dataproc Spark cluster and write the joins in Spark SQL.

Why it fails

Spark on Dataproc can express the same logic, but it forces you to size, run, and tear down a cluster. That contradicts a stated wish to avoid maintaining infrastructure for a one-off job. A serverless SQL engine like BigQuery handles the same scale with no cluster lifecycle to manage.

Domain: Collaborating Within and Across Teams to Manage Data and Models

Tempting but wrong

Setting model temperature to zero makes an agent deterministic, so it will stop following injected instructions in user input.

Why it fails

Temperature only controls sampling randomness, not whether the model obeys injected instructions. A fully deterministic model will still comply with a successful jailbreak. Defending against prompt injection needs a screening layer, not a sampling change.

Domain: Monitoring AI Solutions
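Why temperature cannot block an injection, in a few lines: temperature rescales the logits before softmax, and at zero decoding collapses to argmax. A minimal sketch with made-up logits for two hypothetical next-token choices; if an injected instruction has already made "comply" the model's top choice, temperature zero simply picks it every time.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; temperature == 0 means greedy (argmax) decoding."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits after a successful injection: index 0 = "comply", 1 = "refuse".
logits = [2.0, 1.0]
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lowering the temperature only makes the top choice more certain; it never changes which choice is on top. Changing that outcome takes a layer that screens the input or output.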

Tempting but wrong

I can lower the cost of repeated Gemini calls by raising the temperature so the model commits sooner and retries less.

Why it fails

Temperature only changes how random the sampling is. It does not reduce the tokens billed or shorten the prompt, so it leaves both input-token cost and latency untouched. Use context caching to bill a repeated prefix at a reduced rate instead.

Domain: Architecting Low-Code AI Solutions
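A minimal cost sketch, assuming hypothetical per-million-token prices and a hypothetical 75% discount on cached input tokens; real Gemini rates and caching discounts vary by model and change over time, and explicit caches may also carry a storage charge that this sketch ignores. The point is structural: temperature appears nowhere in the bill, while caching a repeated prefix changes the input term directly.

```python
def request_cost(prefix_tokens: int, suffix_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cached: bool = False, cache_discount: float = 0.75) -> float:
    """Cost of one call in USD; prices are per 1M tokens (all values hypothetical).

    There is no temperature argument: sampling settings do not change
    how many tokens are billed.
    """
    per_token_in = input_price / 1_000_000
    per_token_out = output_price / 1_000_000
    prefix_rate = per_token_in * (1 - cache_discount) if cached else per_token_in
    return (prefix_tokens * prefix_rate
            + suffix_tokens * per_token_in
            + output_tokens * per_token_out)


# Hypothetical workload: 50k-token shared prefix, 500-token question, 300-token answer.
calls = 1_000
plain = calls * request_cost(50_000, 500, 300, input_price=1.0, output_price=4.0)
cached = calls * request_cost(50_000, 500, 300, input_price=1.0, output_price=4.0,
                              cached=True)
print(f"${plain:.2f} vs ${cached:.2f}")  # $51.70 vs $14.20
```

The saving scales with the share of each prompt that is a repeated prefix; a workload with short, unique prompts gains little from caching.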

Tempting but wrong

Can sending each series' history to a large language model with a forecasting prompt produce reliable numeric forecasts at scale?

Why it fails

No. An LLM is costly per call at thousands of series, gives no statistical guarantees for numeric forecasting, and is the wrong tool for structured univariate time series. ARIMA in BigQuery ML is purpose-built for this and far cheaper.

Domain: Scaling Prototypes Into ML Models

Tempting but wrong

I can avoid scaling lag by disabling autoscaling and running one large machine type to handle all traffic.

Why it fails

One oversized replica avoids scaling lag but cannot scale out for a sharp concurrent spike, so it becomes a throughput bottleneck during the burst. A single instance has a fixed concurrency ceiling; absorbing a spike needs horizontal scale-out with a warm floor, not a bigger box.

Domain: Serving and Scaling Models
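A back-of-envelope sketch of the concurrency ceiling, assuming a hypothetical replica that serves 32 requests at once and a hypothetical spike of 400 concurrent requests. It ignores queueing dynamics and latency; it only shows why a fixed ceiling becomes the bottleneck while the replica count can grow with the burst.

```python
import math


def spike_plan(concurrent_requests: int, per_replica_concurrency: int) -> dict[str, int]:
    """Compare one fixed replica with horizontal scale-out for a burst."""
    # Requests beyond the single replica's ceiling must wait (or time out).
    over_single = max(0, concurrent_requests - per_replica_concurrency)
    # Scale-out adds replicas until the burst fits.
    replicas_needed = math.ceil(concurrent_requests / per_replica_concurrency)
    return {"waiting_on_single_replica": over_single,
            "replicas_for_scale_out": replicas_needed}


plan = spike_plan(concurrent_requests=400, per_replica_concurrency=32)
print(plan)
```

A larger machine type raises `per_replica_concurrency`, but it stays a fixed number, so any spike above it still waits; the scale-out count grows with the spike, up to the configured maximum replica count.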

Key terms

ARIMA, DNN, and LLM · Agent Platform AutoML · Deployment strategy · Interpretability · Agent Platform custom training · Kubeflow on GKE · Hyperparameter tuning · Fine-tuning foundation models · CPU, GPU, and TPU · Distributed training · Data and model parallelism · Online and batch inference · Custom containers · Model Registry · Canary deployments · Agent Platform Feature Store

Exam-day rules

Read the scenario for its constraint first. The cost, latency, scale, team-skill, or operational-overhead limit named in the question is what picks the answer, so find it before you judge the options.
When two approaches both work, default to the most managed one. Google's preferred answer is usually the managed service: BigQuery ML for a SQL team, AutoML over a hand-built network, a tuned foundation model over training from scratch. Reach for lower-level options only when the scenario names a reason.
Treat a SQL-fluent team whose data is already in BigQuery as a strong signal. It usually points to BigQuery ML, including fine-tuning a Gemini remote model and running inference with ML.GENERATE_TEXT, rather than exporting the data to a custom training job.
Decide online versus batch from the latency requirement. Periodic bulk scoring with no per-row latency requirement calls for a batch prediction job; low-latency single requests need an online endpoint. Never swap the two.
On the parallelism question, ask whether the model fits on one device. If it does not fit even at batch size one, it is model parallelism; if it fits but training is slow, it is data parallelism.
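The last two rules are decision procedures, sketched below as hypothetical helpers. The inputs (whether a per-request latency limit exists, the memory the model needs at batch size one, the device memory, and whether training is too slow) are assumptions you would read off the scenario; nothing here calls a Google Cloud API.

```python
def serving_mode(needs_per_request_latency: bool) -> str:
    """Online-versus-batch rule: a latency need means online; bulk scoring means batch."""
    return "online endpoint" if needs_per_request_latency else "batch prediction job"


def parallelism_strategy(memory_gb_at_batch_one: float, device_memory_gb: float,
                         training_too_slow: bool) -> str:
    """Parallelism rule: memory decides model parallelism; speed decides data parallelism."""
    if memory_gb_at_batch_one > device_memory_gb:
        return "model parallelism"  # the model itself must be split across devices
    if training_too_slow:
        return "data parallelism"   # replicate the model and split the batches
    return "single device"


print(serving_mode(False))                     # nightly bulk scoring
print(parallelism_strategy(120, 80, False))    # does not fit even at batch size one
print(parallelism_strategy(40, 80, True))      # fits, but training is slow
```

The order of the checks matters: memory is tested first, because a model that does not fit on one device cannot be helped by adding replicas of it.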

Revision schedule

Day 1: Map the blueprint and book a date
Week 1: Build the lifecycle decision map
Weeks 1 to 3: Go deep on scaling and serving (Domains 3 and 4)
Weeks 3 to 4: Lock pipelines and collaboration (Domains 5 and 2)
Week 4: Cover low-code and monitoring (Domains 1 and 6)

Practise PMLE free

Every question has a worked explanation and a per-distractor rationale. No sign-up.

920 audited flashcards in this deck.
