DP-700 domain - 33% of the exam

Ingest and Transform Data

Ingest and Transform Data is 33% of the Microsoft Fabric Data Engineer (DP-700) (DP-700) exam. These are the objectives it covers, each with practice questions and worked explanations.

Objectives in this domain

Sample question from this domain

Free sampleIngest and Transform Datahard

A nightly pipeline ingests an append-only sales event table from an on-premises SQL Server source into a bronze Delta table in a Microsoft Fabric Lakehouse. Each event row carries an immutable EventId and an ever-increasing CreatedUtc timestamp, and rows are never updated or deleted at source. The team wants each run to copy only rows added since the previous run while keeping operational overhead low. Which design best meets this requirement?

  • AStore the highest CreatedUtc loaded so far as a high-water mark, then on each run copy only source rows whose CreatedUtc exceeds that stored value and update the mark. Correct
  • BTruncate the bronze Delta table at the start of every run and reload the entire source table so the destination always matches the source exactly.
  • CEnable change data capture on the source table and stream the captured insert, update, and delete records into the bronze Delta table on each run.
  • DCopy the full source each run into a staging table, then MERGE staging into bronze on EventId so unchanged rows are skipped during the upsert step.
For an append-only source with a monotonically increasing column, use a high-water-mark watermark to load only rows added since the last run. Because rows are only ever inserted and CreatedUtc always increases, the maximum value processed in the previous run uniquely separates old rows from new ones; filtering the source on CreatedUtc greater than the stored mark loads exactly the new rows without reading the whole table or configuring source change tracking.

Why A is correct: An append-only source with a monotonically increasing timestamp is the textbook case for a high-water-mark watermark; querying rows above the stored mark loads only new rows with minimal overhead and no source-side change tracking.

Why B is wrong: A full truncate-and-reload guarantees a match but reads the whole source nightly, which scales poorly and ignores the append-only nature; it adds cost and time the stated low-overhead requirement is trying to avoid.

Why C is wrong: Change data capture is built for sources that change rows; this source is insert-only, so capturing updates and deletes adds source-side configuration and overhead for change types that never occur here.

Why D is wrong: A MERGE on EventId would avoid duplicates, but it still reads the entire source every run; the expensive full read is exactly what a watermark removes, so this keeps the cost the requirement wants to cut.

Other domains in this exam

See also the DP-700 cert hub, the study guide, and the cheat sheet.

Examworthy is not affiliated with or endorsed by Microsoft. Original, blueprint-aligned practice material only.