Google Cloud Professional Machine Learning Engineer cheat sheet
Free to share. Examworthy is not affiliated with or endorsed by Google Cloud; PMLE and related marks belong to their respective owners.
At a glance
Format: Multiple choice and multiple select, online- or onsite-proctored
Domain weight map
Heaviest first - spend your time here
How this exam thinks
PMLE is a pick-the-right-approach exam across the whole ML lifecycle: nearly every question is a scenario with cost, latency, skill, and scale constraints, and the right answer is the Google Cloud service or setting that fits them, usually the most managed option that meets the requirement.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
For nightly univariate forecasts, why not train one deep neural network across all series and serve batch predictions via a managed endpoint?
Why it fails
A DNN adds Python and serving overhead a SQL-only team cannot maintain, and a single shared network ignores the per-series univariate structure that ARIMA captures directly. BigQuery ML ARIMA keeps the work in SQL and the results in the warehouse instead.
Scaling Prototypes Into ML Models
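To make the SQL-only route in the trap above concrete, here is a minimal sketch that builds the two BigQuery ML statements as strings. The model, table, and column names are hypothetical placeholders, and the helper functions are illustrative only. It assumes the `ARIMA_PLUS` model type, which is how BigQuery ML exposes ARIMA forecasting, with one model fitted per value of the series ID column.

```python
# Hypothetical names throughout: demo.sales_arima, demo.daily_sales, and the
# column names are placeholders, not taken from any real project.

def arima_training_sql(model: str, table: str, ts_col: str,
                       value_col: str, id_col: str) -> str:
    """Build a CREATE MODEL statement that fits one ARIMA_PLUS model per series."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS(\n"
        "  model_type = 'ARIMA_PLUS',\n"
        f"  time_series_timestamp_col = '{ts_col}',\n"
        f"  time_series_data_col = '{value_col}',\n"
        f"  time_series_id_col = '{id_col}'\n"
        ") AS\n"
        f"SELECT {ts_col}, {value_col}, {id_col}\n"
        f"FROM `{table}`"
    )


def arima_forecast_sql(model: str, horizon: int, confidence: float = 0.9) -> str:
    """Build an ML.FORECAST query; results stay in the warehouse as a table."""
    return (
        "SELECT *\n"
        f"FROM ML.FORECAST(MODEL `{model}`,\n"
        f"  STRUCT({horizon} AS horizon, {confidence} AS confidence_level))"
    )


train_sql = arima_training_sql(
    "demo.sales_arima", "demo.daily_sales", "sale_date", "units", "store_id"
)
forecast_sql = arima_forecast_sql("demo.sales_arima", horizon=30)
print(train_sql)
print(forecast_sql)
```

Both statements can be run as scheduled queries, so the nightly job needs no Python runtime or serving endpoint.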
Tempting but wrong
To remove cold-start timeouts on a spiky endpoint, just set minReplicaCount equal to peak so replicas are always provisioned.
Why it fails
Pinning the floor to peak does eliminate cold starts, but it keeps every peak replica running through idle periods, so you pay full peak cost overnight. A small warm floor plus a high ceiling and an earlier scale-out trigger absorbs the spike without holding peak capacity when traffic is near zero.
Serving and Scaling Models
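The cost argument in the trap above can be checked with simple arithmetic. The sketch below assumes an idealized autoscaler that runs exactly the replicas each hour needs, clamped between the floor and the ceiling; the demand profile and replica counts are made up for illustration, and real scale-out lag is ignored.

```python
def replica_hours(hourly_demand, min_replicas, max_replicas):
    """Sum replica-hours when each hour runs demand clamped to [min, max]."""
    return sum(min(max(d, min_replicas), max_replicas) for d in hourly_demand)


# Hypothetical day: 20 quiet hours needing 1 replica, 4 peak hours needing 10.
demand = [1] * 20 + [10] * 4

pinned = replica_hours(demand, min_replicas=10, max_replicas=10)   # floor == peak
elastic = replica_hours(demand, min_replicas=2, max_replicas=10)   # small warm floor

print(pinned, elastic)  # 240 80
```

Under these assumptions the pinned configuration bills three times the replica-hours of the warm-floor configuration while serving the same peak.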
Tempting but wrong
Retraining every night guarantees the model is always fresh, so nightly scheduling is the safest policy.
Why it fails
Nightly scheduling sounds safer because data is fresh, but it retrains even when performance is stable, wasting compute on needless runs. When degradation is unpredictable, a performance or drift threshold fires only on genuine decay, which conserves compute without sacrificing freshness.
Automating and Orchestrating ML Pipelines
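A threshold trigger of the kind described above can be sketched in a few lines. The metric, the thresholds, and the `should_retrain` helper are all hypothetical; a real pipeline would read these values from its monitoring system.

```python
def should_retrain(current_metric, baseline_metric, max_drop=0.02,
                   drift_score=0.0, drift_threshold=0.3):
    """Return True only when performance has dropped or input drift is high."""
    degraded = (baseline_metric - current_metric) > max_drop
    drifted = drift_score > drift_threshold
    return degraded or drifted


# Hypothetical week of nightly AUC readings against a 0.91 baseline.
nightly_auc = [0.91, 0.91, 0.90, 0.91, 0.86, 0.91, 0.90]

threshold_runs = sum(should_retrain(auc, 0.91) for auc in nightly_auc)
nightly_runs = len(nightly_auc)

print(threshold_runs, nightly_runs)  # 1 7
```

In this made-up week the threshold policy retrains once, on the night the metric actually decays, where a nightly schedule would have run seven times.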
Tempting but wrong
For a one-off terabyte-scale Parquet backfill, I should provision a Dataproc Spark cluster and write the joins in Spark SQL.
Why it fails
Spark on Dataproc can express the same logic, but it forces you to size, run, and tear down a cluster. That contradicts a stated wish to avoid maintaining infrastructure for a one-off job. A serverless SQL engine like BigQuery handles the same scale with no cluster lifecycle to manage.
Collaborating Within and Across Teams to Manage Data and Models
Tempting but wrong
Setting model temperature to zero makes an agent deterministic, so it will stop following injected instructions in user input.
Why it fails
Temperature only controls sampling randomness, not whether the model obeys injected instructions. A fully deterministic model will still comply with a successful jailbreak. Defending against prompt injection needs a screening layer, not a sampling change.
Monitoring AI Solutions
Tempting but wrong
I can lower the cost of repeated Gemini calls by raising the temperature so the model commits sooner and retries less.
Why it fails
Temperature only changes how random the sampling is. It does not reduce the tokens billed or shorten the prompt, so it leaves both input-token cost and latency untouched. Use context caching to bill a repeated prefix at a reduced rate instead.
Architecting Low-Code AI Solutions
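A rough cost model shows why caching, not temperature, moves the bill. The token counts, the price, and the 25% cached rate below are hypothetical numbers chosen for easy arithmetic, and any cache storage charge is left out; check current pricing before relying on figures like these.

```python
def input_cost(prefix_tokens, suffix_tokens, calls, price_per_million,
               cached_fraction=1.0):
    """Input-token cost in dollars.

    cached_fraction is the share of the normal rate billed for the repeated
    prefix (1.0 means no caching). Temperature is not a parameter because it
    does not change the number of tokens billed.
    """
    per_call_tokens = prefix_tokens * cached_fraction + suffix_tokens
    return calls * per_call_tokens * price_per_million / 1_000_000


# Hypothetical workload: 50,000-token shared prefix, 500-token question, 1,000 calls.
uncached = input_cost(50_000, 500, 1_000, price_per_million=1.0)
cached = input_cost(50_000, 500, 1_000, price_per_million=1.0, cached_fraction=0.25)

print(uncached, cached)  # 50.5 13.0
```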
Tempting but wrong
Can sending each series' history to a large language model with a forecasting prompt produce reliable numeric forecasts at scale?
Why it fails
No. An LLM is costly per call at thousands of series, gives no statistical guarantees for numeric forecasting, and is the wrong tool for structured univariate time series. ARIMA in BigQuery ML is purpose-built for this and far cheaper.
Scaling Prototypes Into ML Models
Tempting but wrong
I can avoid scaling lag by disabling autoscaling and running one large machine type to handle all traffic.
Why it fails
One oversized replica avoids scaling lag but cannot scale out for a sharp concurrent spike, so it becomes a throughput bottleneck during the burst. A single instance has a fixed concurrency ceiling; absorbing a spike needs horizontal scale-out with a warm floor, not a bigger box.
Serving and Scaling Models
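The fixed concurrency ceiling in the trap above can be illustrated with a toy capacity check. The per-replica concurrency figures and the spike size are invented for the example, and the model ignores queuing and scale-out delay.

```python
def unserved_requests(concurrent_requests, replicas, per_replica_concurrency):
    """Requests that exceed total capacity at the instant of the spike."""
    capacity = replicas * per_replica_concurrency
    return max(0, concurrent_requests - capacity)


spike = 400  # hypothetical concurrent requests during the burst

# One oversized machine: higher per-replica concurrency, but only one of it.
big_box = unserved_requests(spike, replicas=1, per_replica_concurrency=64)

# Horizontal scale-out: smaller replicas, ten of them at the ceiling.
scaled_out = unserved_requests(spike, replicas=10, per_replica_concurrency=40)

print(big_box, scaled_out)  # 336 0
```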
Exam-day rules
- Read the scenario for its constraint first. The cost, latency, scale, team-skill, or operational-overhead limit named in the question is what picks the answer, so find it before you judge the options.
- When two approaches both work, default to the most managed one. Google prefers managed services, so pick BigQuery ML for a SQL team, AutoML over a hand-built network, and a tuned foundation model over training from scratch; reach for a lower-level option only when the scenario names a reason.
- Treat a SQL-fluent team whose data is already in BigQuery as a strong signal. It usually points to BigQuery ML, including fine-tuning a Gemini remote model and serving with ML.GENERATE_TEXT, over an exported custom training job.
- Decide online versus batch from the latency requirement. A periodic bulk scoring job with no per-row latency need is a distributed batch prediction job; low-latency single requests need an online endpoint, never the reverse.
- On the parallelism question, ask whether the model fits on one device. If it does not fit even at batch size one, it is model parallelism; if it fits but training is slow, it is data parallelism.
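The last two rules above are mechanical enough to write down as code. This is only a memory aid: the function names and return labels are made up, and the "single device" branch is an added default for the case the rule does not cover.

```python
def serving_mode(needs_low_latency_per_request: bool) -> str:
    """Pick online versus batch from the latency requirement alone."""
    if needs_low_latency_per_request:
        return "online endpoint"
    return "batch prediction job"


def parallelism_strategy(fits_at_batch_size_one: bool, training_is_slow: bool) -> str:
    """Pick the parallelism type from whether the model fits on one device."""
    if not fits_at_batch_size_one:
        return "model parallelism"
    if training_is_slow:
        return "data parallelism"
    return "single device"  # assumed default, not stated in the rule


print(serving_mode(False))                 # batch prediction job
print(parallelism_strategy(False, True))   # model parallelism
print(parallelism_strategy(True, True))    # data parallelism
```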
Revision schedule
- Day 1: Map the blueprint and book a date
- Week 1: Build the lifecycle decision map
- Weeks 1 to 3: Go deep on scaling and serving (Domains 3 and 4)
- Weeks 3 to 4: Lock pipelines and collaboration (Domains 5 and 2)
- Week 4: Cover low-code and monitoring (Domains 1 and 6)