Microsoft Fabric Data Engineer (DP-700) cheat sheet
Microsoft
Free to share. Examworthy is not affiliated with or endorsed by Microsoft; DP-700 and related marks belong to their respective owners.
At a glance
Format: Multiple choice and multiple response, taken at a Pearson VUE testing center or online with a remote proctor
Domain weight map
Heaviest first - spend your time here
How this exam thinks
DP-700 rewards the answer that fits the Microsoft Fabric workload, the right store and engine for the read and write pattern, the most maintainable low-code-versus-code choice, and the incremental, idempotent load, not the most powerful or hand-rolled option.
Spot the trap
Tempting wrong answers, and why they fail
Tempting but wrong
Raising the default node size on the Fabric starter pool will fix slow Spark session start-up.
Why it fails
Bigger nodes only give each session more memory and processing power; they do nothing about cold-start provisioning. Larger nodes can actually take longer to acquire, so startup latency is unchanged or worse.
Implement and Manage an Analytics Solution
Tempting but wrong
For an append-only source, truncating the bronze Delta table and reloading the full source each night is a low-overhead way to keep it in sync.
Why it fails
A full truncate-and-reload guarantees a match but reads the entire source every night, which scales poorly and ignores the append-only nature. It adds cost and run time, the very overhead a high-water-mark watermark is meant to avoid.
Ingest and Transform Data
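The watermark pattern named above can be sketched in a few lines. This is a minimal illustration only, using Python's built-in sqlite3 in place of a Fabric Lakehouse or Warehouse so that it runs anywhere. The table names (source, bronze, watermark), the column names, and the helper functions are all hypothetical, and the sketch assumes the source has a strictly increasing id column.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with hypothetical source, bronze and watermark tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE bronze (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE watermark (last_id INTEGER NOT NULL);
        INSERT INTO watermark VALUES (0);
        """
    )
    return conn


def incremental_load(conn):
    """Copy only source rows above the stored watermark; return how many were copied."""
    with conn:  # commit on success, roll back on error, so rows and watermark move together
        (last_id,) = conn.execute("SELECT last_id FROM watermark").fetchone()
        # Only rows past the high-water mark are read; older rows are never rescanned.
        rows = conn.execute(
            "SELECT id, payload FROM source WHERE id > ? ORDER BY id", (last_id,)
        ).fetchall()
        if rows:
            conn.executemany("INSERT INTO bronze (id, payload) VALUES (?, ?)", rows)
            # Advance the watermark to the highest id just loaded.
            conn.execute("UPDATE watermark SET last_id = ?", (rows[-1][0],))
    return len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    print(incremental_load(conn))  # first run copies the three existing rows
    print(incremental_load(conn))  # rerun copies nothing, so the load is idempotent
    conn.executemany("INSERT INTO source VALUES (?, ?)", [(4, "d"), (5, "e")])
    print(incremental_load(conn))  # only the two new rows are read and copied
```

Updating the watermark in the same transaction as the insert is what makes a rerun safe: if the run fails partway, neither the rows nor the watermark move.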
Tempting but wrong
To check whether dozens of overnight jobs succeeded, you should open each pipeline and notebook's run history one by one.
Why it fails
Per-item run history only shows that single item's runs and forces manual, item-by-item checking. The Monitoring hub already consolidates runs across all item types into one filterable list with status and start time, so there is no need to open each item separately.
Monitor and Optimize an Analytics Solution
Tempting but wrong
Enabling high concurrency makes the first Spark session in a Fabric workspace start in seconds.
Why it fails
High concurrency lets several notebooks share an existing session, which helps only once a session is running. The very first session still pays the full provisioning cost, so high concurrency alone does not deliver a few-second cold start.
Implement and Manage an Analytics Solution
Tempting but wrong
Enabling change data capture on an insert-only source is the right way to load only new rows incrementally.
Why it fails
Change data capture is built for sources whose rows are also updated and deleted. An insert-only source never produces those change types, so capturing updates and deletes just adds source-side configuration and overhead with no benefit. A high-water-mark watermark on the increasing column is the lighter, correct fit.
Ingest and Transform Data
Tempting but wrong
The Microsoft Fabric Capacity Metrics app gives a consolidated success or failure list of overnight scheduled runs.
Why it fails
The Capacity Metrics app reports compute consumption and throttling against a capacity, not a per-run success or failure list. To see which scheduled items failed overnight, use the Monitoring hub, which lists runs with their status.
Monitor and Optimize an Analytics Solution
Tempting but wrong
A custom Spark pool with autoscale off and a fixed minimum node count gives fast starts without reserving compute.
Why it fails
A cold custom pool still provisions on demand, and a permanent node floor reserves compute continuously. That contradicts the goal of avoiding permanently reserved capacity, unlike the pre-warmed starter pool.
Implement and Manage an Analytics Solution
Tempting but wrong
Copying the full source each run into staging and MERGEing on the business key keeps an append-only load cheap by skipping unchanged rows.
Why it fails
A MERGE on the key avoids duplicates, but it still reads the entire source every run. That expensive full read is exactly what a watermark removes, so this design keeps the cost the low-overhead requirement is trying to cut.
Ingest and Transform Data
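A rough way to see the cost gap is to count rows read over several runs. The sketch below is a back-of-the-envelope model, not a Fabric measurement: the function names and the figures in the usage example are hypothetical, and it assumes the source grows by a fixed number of rows before each run and that the initial rows were already loaded.

```python
def rows_read_full_copy(initial_rows, new_rows_per_run, runs):
    """Total source rows read when every run copies the whole source before the MERGE."""
    total = 0
    size = initial_rows
    for _ in range(runs):
        size += new_rows_per_run  # the append-only source grows before each run
        total += size             # a full copy reads every row, old and new
    return total


def rows_read_watermark(new_rows_per_run, runs):
    """Total source rows read when every run reads only rows above the watermark."""
    return new_rows_per_run * runs


if __name__ == "__main__":
    # Hypothetical figures: 1,000 rows already loaded, 100 new rows per night, 3 nights.
    print(rows_read_full_copy(1000, 100, 3))  # 1100 + 1200 + 1300 = 3600
    print(rows_read_watermark(100, 3))        # 100 + 100 + 100 = 300
```

The full-copy total keeps climbing as the source grows, while the watermark total depends only on how much new data arrives.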
Key terms
Exam-day rules
- Read the scenario for the read and write pattern and the stated constraint first, then match it to the Microsoft Fabric workload. Distractors are written to sound reasonable; the right answer is the store, engine, and load pattern that fit, not the most powerful tool on offer.
- When a question stresses low operational overhead or incremental only, choose the incremental and idempotent pattern. A high-water-mark watermark beats a truncate-and-reload for an append-only source with a monotonic column.
- When a question stresses maintainability, choose the lowest-code option that still meets the requirement: a Dataflows Gen2 transform over a hand-written notebook when the logic is simple, and a Data Factory pipeline to sequence and parameterise the rest.
- Match the security control to what must be hidden: object-level security to remove a table from a role, row-level security to filter rows, column-level security or dynamic data masking to obscure values. They are not interchangeable.
- Remember the store boundaries: the Lakehouse SQL analytics endpoint reads but does not write, so route UPDATE and DELETE through Spark or a Warehouse, and reach for an Eventhouse and a KQL database for high-volume streaming telemetry.
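As a memory aid for the security rule above, the mapping from "what must be hidden" to the matching control can be written as a small lookup. This is only a study sketch: the enum, dictionary, and function names are hypothetical, and it calls no Fabric API.

```python
from enum import Enum


class Hide(Enum):
    """What the scenario says must be hidden from a role (hypothetical labels)."""
    TABLE = "a whole table"
    ROWS = "some rows"
    VALUES = "column values"


# Mapping taken from the exam-day rule: the controls are not interchangeable.
CONTROLS = {
    Hide.TABLE: ("object-level security",),
    Hide.ROWS: ("row-level security",),
    Hide.VALUES: ("column-level security", "dynamic data masking"),
}


def pick_controls(target):
    """Return the control or controls that fit the thing to be hidden."""
    return CONTROLS[target]


if __name__ == "__main__":
    for target in Hide:
        print(f"To hide {target.value}: {' or '.join(pick_controls(target))}")
```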
Revision schedule
- Day 1: Map the blueprint and book a date
- Week 1: Build the Fabric foundations hands-on
- Weeks 2 to 3: Go deep on ingest and transform
- Week 4: Master security and orchestration
- Week 5: Drill monitoring and optimisation