A nightly pipeline ingests an append-only sales event table from an on-premises SQL Server source into a bronze Delta table in a Microsoft Fabric Lakehouse. Each event row carries an immutable EventId and an ever-increasing CreatedUtc timestamp, and rows are never updated or deleted at source. The team wants each run to copy only rows added since the previous run while keeping operational overhead low. Which design best meets this requirement?
- AStore the highest CreatedUtc loaded so far as a high-water mark, then on each run copy only source rows whose CreatedUtc exceeds that stored value and update the mark. Correct
- BTruncate the bronze Delta table at the start of every run and reload the entire source table so the destination always matches the source exactly.
- CEnable change data capture on the source table and stream the captured insert, update, and delete records into the bronze Delta table on each run.
- DCopy the full source each run into a staging table, then MERGE staging into bronze on EventId so unchanged rows are skipped during the upsert step.
Why A is correct: An append-only source with a monotonically increasing timestamp is the textbook case for a high-water-mark watermark; querying rows above the stored mark loads only new rows with minimal overhead and no source-side change tracking.
Why B is wrong: A full truncate-and-reload guarantees a match but reads the whole source nightly, which scales poorly and ignores the append-only nature; it adds cost and time the stated low-overhead requirement is trying to avoid.
Why C is wrong: Change data capture is built for sources that change rows; this source is insert-only, so capturing updates and deletes adds source-side configuration and overhead for change types that never occur here.
Why D is wrong: A MERGE on EventId would avoid duplicates, but it still reads the entire source every run; the expensive full read is exactly what a watermark removes, so this keeps the cost the requirement wants to cut.