Objectives in this domain

Design and implement full and incremental data loads, including watermarking and change detection.
Section 2.1 (hard)
Prepare data for loading into a dimensional model, including slowly changing dimensions and surrogate keys.
Section 2.2 (hard)
Design and implement a loading pattern for streaming data into Microsoft Fabric.
Section 2.3 (medium)
Choose an appropriate data store for a workload, including Lakehouse, Warehouse, and Eventhouse.
Section 2.4 (hard)
Choose between Dataflows Gen2, notebooks, KQL, and T-SQL for transforming batch data.
Section 2.5 (hard)
Create and manage OneLake shortcuts, and implement mirroring of external databases.
Section 2.6 (medium)
Ingest data using pipelines and transform it using PySpark, SQL, and KQL, including denormalising, grouping, and aggregating.
Section 2.7 (hard)
Handle duplicate, missing, and late-arriving data during ingestion and transformation.
Section 2.8 (hard)
Choose a streaming engine, and choose between native tables, OneLake shortcuts, and query acceleration in Real-Time Intelligence.
Section 2.9 (hard)
Process streaming data using Eventstreams, Spark structured streaming, and KQL, including windowing functions.
Section 2.10 (hard)

Sample question from this domain

Free sample: Ingest and Transform Data (hard)

A nightly pipeline ingests an append-only sales event table from an on-premises SQL Server source into a bronze Delta table in a Microsoft Fabric Lakehouse. Each event row carries an immutable EventId and an ever-increasing CreatedUtc timestamp, and rows are never updated or deleted at source. The team wants each run to copy only rows added since the previous run while keeping operational overhead low. Which design best meets this requirement?

A. Store the highest CreatedUtc loaded so far as a high-water mark, then on each run copy only source rows whose CreatedUtc exceeds that stored value and update the mark. (Correct)
B. Truncate the bronze Delta table at the start of every run and reload the entire source table so the destination always matches the source exactly.
C. Enable change data capture on the source table and stream the captured insert, update, and delete records into the bronze Delta table on each run.
D. Copy the full source each run into a staging table, then MERGE staging into bronze on EventId so unchanged rows are skipped during the upsert step.

For an append-only source with a monotonically increasing column, keep a high-water mark and load only rows added since the last run. Because rows are only ever inserted and CreatedUtc always increases, the maximum value processed in the previous run cleanly separates old rows from new ones; filtering the source on CreatedUtc greater than the stored mark loads exactly the new rows without reading the whole table or configuring source change tracking.

Why A is correct: An append-only source with a monotonically increasing timestamp is the textbook case for a high-water mark; querying rows above the stored mark loads only new rows with minimal overhead and no source-side change tracking.

Why B is wrong: A full truncate-and-reload guarantees a match but reads the whole source nightly, which scales poorly and ignores the append-only nature; it adds cost and time the stated low-overhead requirement is trying to avoid.

Why C is wrong: Change data capture is built for sources that change rows; this source is insert-only, so capturing updates and deletes adds source-side configuration and overhead for change types that never occur here.

Why D is wrong: A MERGE on EventId would avoid duplicates, but it still reads the entire source every run; the expensive full read is exactly what a watermark removes, so this keeps the cost the requirement wants to cut.
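
The high-water-mark design in option A can be sketched in a few lines. This is a minimal illustration only, not a Fabric pipeline: an in-memory SQLite table stands in for the SQL Server source, a Python list stands in for the bronze Delta table, and a dict stands in for wherever the mark is persisted. The table name `sales_events`, the `Amount` column, and the function `incremental_load` are hypothetical names chosen for the example.

```python
import sqlite3


def incremental_load(source, bronze, state):
    """Copy rows whose CreatedUtc is above the stored high-water mark."""
    # An empty string sorts before any ISO-8601 timestamp, so the first run loads everything.
    mark = state.get("high_water_mark", "")
    new_rows = source.execute(
        "SELECT EventId, CreatedUtc, Amount FROM sales_events "
        "WHERE CreatedUtc > ? ORDER BY CreatedUtc",
        (mark,),
    ).fetchall()
    if new_rows:
        bronze.extend(new_rows)
        # Advance the mark only after the rows have landed.
        state["high_water_mark"] = new_rows[-1][1]
    return len(new_rows)


# Stand-in source (assumption: SQLite here, SQL Server in the scenario).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sales_events (EventId INTEGER PRIMARY KEY, CreatedUtc TEXT, Amount REAL)"
)
source.executemany(
    "INSERT INTO sales_events VALUES (?, ?, ?)",
    [(1, "2024-01-01T00:00:00Z", 10.0), (2, "2024-01-01T06:00:00Z", 20.0)],
)

bronze = []  # stand-in for the bronze Delta table
state = {}   # stand-in for the persisted watermark store

first_run = incremental_load(source, bronze, state)   # both existing rows

source.execute("INSERT INTO sales_events VALUES (3, '2024-01-02T03:00:00Z', 30.0)")
second_run = incremental_load(source, bronze, state)  # only the newly added row
third_run = incremental_load(source, bronze, state)   # nothing new to copy
```

Each run touches only the rows above the stored mark, which is the full read that options B and D still pay for. The sketch compares timestamps as text, which is safe only because every value uses the same ISO-8601 format; a real source would compare a datetime column directly.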

Other domains in this exam

Implement and Manage an Analytics Solution: 34% of the exam
Monitor and Optimize an Analytics Solution: 33% of the exam

See also the DP-700 cert hub, the study guide, and the cheat sheet.