PDE domain - 20% of the exam

Storing Data

Storing Data is 20% of the Google Cloud Professional Data Engineer (PDE) exam. These are the objectives it covers, each with practice questions and worked explanations.

Objectives in this domain

Sample question from this domain

Free sample - Storing Data - hard

An architect is comparing BigQuery and Bigtable for a workload that records device telemetry from two million industrial sensors. Each sensor emits a reading every second, and downstream applications need to retrieve the most recent 24 hours of readings for any single sensor within tens of milliseconds, while a separate weekly analytical job scans aggregates across the full fleet. The architect wants to understand the fundamental role boundary between the two services. Which statement most accurately describes how BigQuery and Bigtable differ for this workload?

  • A. Bigtable is a wide-column NoSQL store with sorted row keys that gives single-digit millisecond reads for a known key, while BigQuery is a columnar analytical warehouse designed for high-throughput scans across very large tables; the per-sensor lookup belongs in Bigtable and the weekly aggregate belongs in BigQuery. (Correct)
  • B. BigQuery is a wide-column NoSQL store optimised for single-row lookups by key, while Bigtable is a columnar analytical warehouse tuned for ad hoc SQL scans, so the per-sensor lookups should target BigQuery and the weekly aggregate should target Bigtable.
  • C. BigQuery and Bigtable both target operational workloads, but BigQuery is preferred whenever rows exceed one kilobyte and Bigtable is preferred whenever rows are smaller, regardless of access pattern.
  • D. Bigtable and BigQuery are interchangeable for telemetry because both are columnar; the team should pick the cheaper one for the region and accept identical latency characteristics from each service.

Distinguish Bigtable as a low-latency wide-column NoSQL store from BigQuery as a columnar analytical warehouse when serving telemetry. Bigtable stores rows sorted by a single row key and is engineered for low-latency point and small range reads at very high write rates, which is exactly the per-sensor recent-history pattern. BigQuery stores data in columnar format across distributed storage and uses a slot-based execution engine that excels at scanning and aggregating across large tables, which is the weekly fleet-wide pattern. Choosing each service for the access pattern it was built for is the canonical PDE role boundary.
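
To see why sorted row keys suit the per-sensor lookup, here is a minimal sketch in Python. It is a toy in-memory model, not the Bigtable client library: the `SortedTable` class, the `row_key` helper, and the `sensor_id#timestamp` key layout are all assumptions made for illustration, since the question does not specify a key design. The point it tries to show is that when keys are kept in sorted order, one sensor's recent readings sit in a single contiguous key range.

```python
import bisect


def row_key(sensor_id: str, epoch_seconds: int) -> str:
    # Hypothetical key layout: sensor ID, then a zero-padded timestamp so that
    # lexicographic order matches chronological order within one sensor.
    return f"{sensor_id}#{epoch_seconds:010d}"


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []    # row keys in sorted order
        self._values = {}  # row key -> reading

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, end_key):
        # Binary-search both ends, then read the contiguous slice [start, end).
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def recent_readings(table, sensor_id, now, window_seconds=24 * 60 * 60):
    # One range scan covers the last 24 hours for a single sensor.
    start = row_key(sensor_id, now - window_seconds)
    end = row_key(sensor_id, now + 1)  # end key is exclusive
    return table.scan(start, end)
```

Calling `recent_readings(table, "sensor-0001", now)` touches only the keys between the two bounds, regardless of how many other sensors are stored. A fleet-wide aggregate in this model would have to walk every key, which is the kind of work the explanation assigns to BigQuery instead.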

Why A is correct: Bigtable is sorted by row key and serves point and small range reads in low single-digit milliseconds, which suits the per-sensor 24-hour lookup, while BigQuery's columnar storage and slot-based execution are designed to scan and aggregate across very large tables on schedule, which suits the weekly cross-fleet job.
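
The size gap between the two access patterns can be worked out from the numbers in the question. The short Python calculation below is only a back-of-the-envelope sketch; it assumes every reading is stored as one row and that the weekly job reads a full seven days of raw readings, neither of which the question states explicitly.

```python
SENSORS = 2_000_000                 # from the question
READINGS_PER_SENSOR_PER_SECOND = 1  # from the question
SECONDS_PER_DAY = 24 * 60 * 60

# Sustained ingest rate across the fleet.
writes_per_second = SENSORS * READINGS_PER_SENSOR_PER_SECOND

# Rows returned by one per-sensor, 24-hour lookup.
rows_per_sensor_lookup = READINGS_PER_SENSOR_PER_SECOND * SECONDS_PER_DAY

# Rows covered by one weekly fleet-wide scan (assumes 7 days of raw readings).
rows_per_weekly_scan = writes_per_second * SECONDS_PER_DAY * 7

print(f"writes per second:      {writes_per_second:,}")
print(f"rows per sensor lookup: {rows_per_sensor_lookup:,}")
print(f"rows per weekly scan:   {rows_per_weekly_scan:,}")
```

Under these assumptions a single lookup covers 86,400 rows, while the weekly job covers about 1.2 trillion, which is why the two patterns are served by different engines.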

Why B is wrong: This reverses the actual roles. BigQuery is the columnar analytical warehouse and Bigtable is the wide-column key-ordered NoSQL store, so the description swaps the two services. A candidate who only half-remembers the column orientation of BigQuery can fall into this trap.

Why C is wrong: BigQuery is an analytical warehouse, not an operational store, and the selection between Bigtable and BigQuery is driven by access pattern rather than row size. The size-based rule sounds concrete but is fabricated and will mislead a candidate who has not internalised the role boundary.

Why D is wrong: The two services are not both columnar. Bigtable's wide-column model keeps each row's cells together and sorts rows by row key, whereas BigQuery uses a column-oriented storage format, so their access patterns and latency profiles are very different. Bigtable serves low-latency keyed reads while BigQuery serves throughput-oriented scans, so they are not interchangeable for a real-time per-sensor lookup.

Examworthy is not affiliated with or endorsed by Google Cloud. Original, blueprint-aligned practice material only.