PDE domain - 15% of the exam

Preparing and Using Data for Analysis

Preparing and Using Data for Analysis is 15% of the Google Cloud Professional Data Engineer (PDE) exam. These are the objectives it covers, each with practice questions and worked explanations.

Objectives in this domain

Sample question from this domain

Free sample · Preparing and Using Data for Analysis · Medium

An analytics team has enabled a BigQuery BI Engine reservation in the same region as a 40 GB fact table that a Looker dashboard queries with several aggregations per panel. The team wants to understand exactly which queries the reservation will accelerate so they can size it correctly. Which statement best describes how BI Engine acceleration is applied to incoming queries?

  • A. BI Engine accelerates only queries issued through Looker Studio and ignores queries submitted by other clients such as the BigQuery console or the bq command-line tool.
  • B. BI Engine accelerates all queries that touch any table referenced by the dashboard regardless of region, because reservations are global resources shared across BigQuery locations.
  • C. BI Engine accelerates eligible SQL queries against tables in the reserved project and region by serving them from an in-memory cache, falling back to standard BigQuery slots for unsupported features. (Correct)
  • D. BI Engine accelerates queries by precomputing aggregations into a materialised view that is automatically registered in the reservation and refreshed on every base table change.

Recognise that BI Engine is a regional in-memory acceleration layer that transparently serves eligible BigQuery SQL and falls back to slots otherwise. BI Engine reservations are scoped to a project and region. When a query runs, BigQuery checks whether the referenced data fits the reservation and whether the query uses BI Engine supported SQL features. Eligible work is served from the in-memory cache, while unsupported operators or excess data fall back to standard slot execution. This client-agnostic, partial-acceleration behaviour is central to sizing decisions.
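The decision described above can be sketched as a toy model in Python. This is only an illustration of the reasoning, not BigQuery's actual planner: the `Query` and `Reservation` classes, the `SUPPORTED_FEATURES` set, the feature names, and the all-or-nothing size check are all assumptions made for the example. Note that the submitting client is not an input at all, which mirrors the client-agnostic behaviour.

```python
from dataclasses import dataclass

# Hypothetical feature set for illustration only; the real list of
# BI Engine supported SQL features is defined by Google Cloud.
SUPPORTED_FEATURES = {"select", "filter", "aggregate", "join"}


@dataclass
class Query:
    region: str         # location where the query runs
    scanned_bytes: int  # data the query needs to read
    features: set       # SQL features the query uses (toy labels)


@dataclass
class Reservation:
    region: str      # reservations are regional
    size_bytes: int  # reserved in-memory capacity


def acceleration_mode(query: Query, reservation: Reservation) -> str:
    """Return 'FULL', 'PARTIAL', or 'DISABLED' for a toy query."""
    # A reservation only helps queries running in its own region.
    if query.region != reservation.region:
        return "DISABLED"
    # Simplification: data that exceeds the reservation falls back to slots.
    if query.scanned_bytes > reservation.size_bytes:
        return "DISABLED"
    unsupported = query.features - SUPPORTED_FEATURES
    if not unsupported:
        return "FULL"
    # Some supported work can still be served from memory.
    if query.features & SUPPORTED_FEATURES:
        return "PARTIAL"
    return "DISABLED"


GB = 10**9
reservation = Reservation(region="EU", size_bytes=50 * GB)  # assumed size
dashboard_query = Query("EU", 40 * GB, {"select", "aggregate"})
print(acceleration_mode(dashboard_query, reservation))
```

For sizing, the point of the sketch is that region, data volume, and SQL feature support each gate acceleration independently, so a larger reservation does not help a query that is blocked by region or by an unsupported feature.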

Why A is wrong: It is tempting because BI Engine was originally promoted as a Looker Studio accelerator. In practice acceleration is client-agnostic and applies to any SQL query that fits within the reservation's supported feature set, including queries from the console, bq, drivers, and Looker.

Why B is wrong: Region matching trips up many candidates. BI Engine reservations are regional, and a reservation only accelerates queries that run in the same location as the reserved data. Cross-region queries cannot be served from the cache.

Why C is correct: BI Engine maintains an in-memory representation of frequently accessed data and rewrites supported query patterns to read from that cache. Queries or query fragments that use unsupported SQL features run on standard slots, so partial acceleration is possible.

Why D is wrong: This blurs BI Engine with materialised views. BI Engine is an in-memory caching layer, not a precomputation engine, and it does not create or own materialised views on the user's behalf.
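The full, partial, and fallback outcomes in these explanations can be checked after the fact by looking at job statistics. The sketch below tallies jobs by acceleration mode, assuming each job is a dictionary shaped like the BigQuery jobs REST resource, where `statistics.query.biEngineStatistics.biEngineMode` reports values such as `FULL`, `PARTIAL`, or `DISABLED`. The function name `tally_bi_engine_modes`, the `UNKNOWN` bucket, and the sample records are hypothetical; verify the field layout against the current API reference before relying on it.

```python
from collections import Counter


def tally_bi_engine_modes(jobs):
    """Count jobs per BI Engine mode from job-resource-shaped dicts."""
    counts = Counter()
    for job in jobs:
        # Walk the nested structure defensively; missing keys mean the
        # job carried no BI Engine statistics.
        stats = (
            job.get("statistics", {})
            .get("query", {})
            .get("biEngineStatistics", {})
        )
        counts[stats.get("biEngineMode", "UNKNOWN")] += 1
    return dict(counts)


def _job(mode):
    # Helper that builds a made-up job record for the example.
    return {"statistics": {"query": {"biEngineStatistics": {"biEngineMode": mode}}}}


# Fabricated sample data, for illustration only.
sample_jobs = [_job("FULL"), _job("PARTIAL"), _job("DISABLED"), _job("FULL"), {}]
print(tally_bi_engine_modes(sample_jobs))
```

A high share of `PARTIAL` or `DISABLED` jobs suggests looking at unsupported SQL features or reservation capacity before simply increasing the reservation size.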

See also the PDE cert hub, the study guide, and the cheat sheet.

Examworthy is not affiliated with or endorsed by Google Cloud. Original, blueprint-aligned practice material only.