Microsoft study guide

How to pass Microsoft Fabric Data Engineer (DP-700)

20 min read · 3 domains covered · Free practice, no sign-up

The Microsoft Fabric Data Engineer (DP-700) tests whether you can build, secure, and run a working analytics solution end to end on Microsoft Fabric. The work splits into three jobs: ingesting and transforming data into a curated shape, securing and managing the workspace and its items, and monitoring and optimising what you have built so it stays fast and cheap. The exam is scenario-led. It hands you a read or write pattern, a constraint such as low operational overhead or incremental only, and asks for the design that fits the Fabric workload, not the most powerful tool you could reach for.

It suits practising data engineers, analytics engineers, and ETL or ELT developers who already move data for a living and now work in Fabric. The questions assume you have actually built a Lakehouse and a Warehouse, written PySpark and T-SQL and KQL, orchestrated with a Data Factory pipeline and notebooks, and configured OneLake security. You do not need to be an architect, but you do need real SQL and data-engineering experience, because the distractors are written for people who half know the platform.

What makes DP-700 pass-or-fail is judgement on the right store and engine for a given pattern, plus precision on a handful of documented behaviours. Many items hinge on one exact fact: that the Lakehouse SQL analytics endpoint is read-only so row-level corrections must come from Spark or a Warehouse, that object-level security hides a whole table from a role where row-level security and masking do not, that a high-water-mark watermark loads an append-only source with the least overhead, or that the Monitoring hub lists runs across pipelines, Dataflows Gen2, and notebooks in one view. Knowing roughly how Fabric works is not enough; the exam rewards knowing the rule the documentation actually states and matching it to the workload in front of you.

DP-700 rewards the answer that fits the Microsoft Fabric workload: the right store and engine for the read and write pattern, the most maintainable low-code-versus-code choice, and the incremental, idempotent load. It does not reward the most powerful or hand-rolled option.

Difficulty

Intermediate

Best for

Working data engineers, analytics engineers, and ETL or ELT developers who already build pipelines and transformations and now do that work in Microsoft Fabric, and want an associate credential proving they can ingest, secure, and optimise an analytics solution the way Microsoft documents it.

Prerequisites

None enforced, but DP-700 expects genuine data-engineering and SQL experience. In practice you want real hands-on time building a Lakehouse and a Warehouse, writing PySpark, T-SQL, and KQL, orchestrating with a Data Factory pipeline and notebooks, configuring OneLake security and shortcuts, and designing incremental loads. Comfort with the medallion architecture and dimensional modelling carries much of the ingest domain.

Questions: typically 40 to 60
Time allowed: 100 min
Pass mark: 700 / 1000
Exam cost (USD): $165
Practice questions: 256

How this exam thinks

One habit decides DP-700: read the scenario for the read and write pattern and the stated constraint, then pick the option that fits the Microsoft Fabric workload, not the option that is merely the most powerful or the one you would hand-roll. The distractors are plausible. They reach for full reloads when an incremental watermark is asked for, they pick a write path against a read-only endpoint, or they apply a control at the wrong layer. The right answer is the store, engine, and load pattern Microsoft recommends for that exact pattern.

The exam frames most questions as a small engineering task with a constraint: load only the rows added since the last run with low overhead, hide a sensitive table from a role completely, secure two independent access paths to the same data, or see overnight failures across every item type in one place. Each has a single best-fit answer. When a question stresses low operational overhead, choose the incremental and idempotent pattern over the brute-force reload. When it stresses maintainability, choose the lower-code option that still meets the requirement: a Dataflows Gen2 mapping over a hand-written notebook when the transform is simple. When it describes an operation failing on the wrong engine, recognise the store boundary: the Lakehouse SQL analytics endpoint reads but does not write.

The rest is a set of discriminations the exam leans on, each resolved by one detail. Lakehouse versus Warehouse turns on whether you need read-write T-SQL or read-only over Delta. Object-level versus row-level security versus masking turns on whether you hide structure, filter rows, or obscure values. OneLake shortcut versus copy turns on whether the data stays in place or is duplicated. A high-water mark versus a full reload turns on append-only with a monotonic column. Name what the scenario asks for, then choose the option whose documented behaviour fits the workload exactly.
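The Lakehouse versus Warehouse versus Eventhouse discrimination above can be written down as a tiny decision function. This is only a study aid, a minimal sketch that encodes the boundaries as this guide states them; the function name and its two flags are hypothetical, and real scenarios carry more constraints than two booleans.

```python
def choose_store(needs_tsql_writes: bool, high_volume_events: bool) -> str:
    """Toy encoding of the store boundaries described in this guide.

    Both flags are hypothetical simplifications of an exam scenario.
    """
    if high_volume_events:
        # High-volume streaming or time-series telemetry.
        return "Eventhouse (KQL database)"
    if needs_tsql_writes:
        # Read-write T-SQL, including UPDATE and DELETE.
        return "Warehouse"
    # Delta files with Spark read-write and a read-only SQL analytics endpoint.
    return "Lakehouse"


telemetry_store = choose_store(needs_tsql_writes=False, high_volume_events=True)
corrections_store = choose_store(needs_tsql_writes=True, high_volume_events=False)
files_store = choose_store(needs_tsql_writes=False, high_volume_events=False)
```

Reading a scenario then becomes a matter of deciding which flag it is really describing before looking at the options.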

What each domain tests and how to study it

The DP-700 blueprint is split across 3 domains. The weights shown approximate the official share of the exam, which Microsoft publishes as a range for each domain; see the official exam guide for the authoritative breakdown.

  1. Implement and Manage an Analytics Solution

    34% of exam

    What you must be able to do. Configure workspace and Spark settings correctly, apply the right access control at the right layer for each requirement, and orchestrate a process with the correct mix of Dataflow Gen2, pipeline, and notebook, passing parameters at run time.

    In one sentence. The govern-and-orchestrate core: the right security control at the right layer, and the right low-code-versus-code orchestration with parameters flowing through at run time.

    Recall check: answer these from memory first
    • A role must not see a Salaries table at all, not even in the field list. Which security control achieves that, and why do row-level security and masking not?
    • A pipeline holds a table name in a parameter and must pass it to a notebook at run time without editing code. What two things wire that value into the notebook?
    • Small notebook jobs wait minutes for a Spark session to start. Which workspace Spark setting fixes the cold start without permanently reserving compute?
    • Analysts read the same Lakehouse through direct Spark file reads and the SQL analytics endpoint. Why must each path be secured separately?

    What it tests. The heaviest domain by weight: standing up and governing the solution. It covers configuring workspace settings for Spark, domains, OneLake, and Dataflows Gen2; lifecycle management with version control, database projects, and deployment pipelines; and the full security surface: workspace-level and item-level roles; row-level, column-level, object-level, and folder or file-level access; dynamic data masking; OneLake security; sensitivity labels; item endorsement; and audit logs. It also covers orchestration: choosing between a Dataflow Gen2, a Data Factory pipeline, and a notebook, building schedules and event triggers, and wiring parameters and dynamic expressions through notebooks and pipelines.

    How to study it. This is the largest slice, so it earns the most time, and it splits into security and orchestration. For security, build a mental matrix of the control layers and what each one actually hides: object-level security removes a table or column from a role's metadata, row-level security filters rows but leaves the object visible, column-level security and dynamic data masking change values not structure, and OneLake data access roles secure direct file reads independently of the SQL analytics endpoint. Practise securing the same data across both access paths, because the exam tests that folder roles and endpoint permissions do not inherit each other. For orchestration, drill the choice between a Dataflow Gen2 for low-code transforms, a pipeline for control flow and movement, and a notebook for code-heavy logic, then practise passing a pipeline parameter into a notebook through a tagged parameters cell and the Notebook activity base parameters. Configure the Spark starter pool and learn why pre-warmed nodes cut cold-start latency.
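The parameter hand-off described above can be pictured with a short sketch. It assumes a notebook whose first cell is tagged as the parameters cell and a pipeline Notebook activity that supplies base parameters; the variable names, default values, and the `effective_parameters` helper are hypothetical, and the helper only imitates the override in plain Python rather than reproducing how Fabric injects values.

```python
# Parameters cell: in a Fabric notebook this cell would be tagged as the
# parameters cell, and the values below act only as defaults.
table_name = "sales_default"   # hypothetical default table
load_date = "1900-01-01"       # hypothetical default date


def effective_parameters(defaults: dict, base_parameters: dict) -> dict:
    """Imitate the override: values sent by the pipeline win over defaults."""
    return {**defaults, **base_parameters}


defaults = {"table_name": table_name, "load_date": load_date}

# The pipeline passes only table_name, so load_date keeps its default.
run_values = effective_parameters(defaults, {"table_name": "orders"})
```

The point to carry into the exam is that the notebook code never changes between runs; only the values supplied by the pipeline do.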

    Easy to confuse

    • Object-level security versus row-level security versus dynamic data masking. Object-level security removes a table or column from a role's metadata so it never appears in the field list; row-level security keeps the object present and filters which rows the role sees; dynamic data masking keeps both object and rows and only obscures the displayed values. To make a table truly invisible to a role you need object-level security, not the other two.
    • Dataflows Gen2 versus a Data Factory pipeline versus a notebook for orchestration. A Dataflow Gen2 is the low-code path for shaping and loading data with Power Query; a Data Factory pipeline orchestrates control flow, movement, and other activities including running notebooks; a notebook is the code-first path for Spark logic. Pick the lowest-code option that meets the requirement, and use a pipeline to sequence and parameterise the rest.

    Worked example from the DP-700 bank

    Free sample · Implement and Manage an Analytics Solution · hard

    A Microsoft Fabric semantic model built on a Lakehouse contains a Salaries table that should be completely invisible to a Reporting role, including not appearing in the field list or in any measure dependency, while the role keeps full access to all other tables. Visuals that reference Salaries must surface no value rather than a blank or filtered result. Which control best meets this requirement?

    • A. Apply dynamic data masking to every column in the Salaries table so the Reporting role sees masked values whenever a visual references the table.
    • B. Configure object-level security on the semantic model to hide the Salaries table from the Reporting role, so the table and its columns are not exposed to that role. (Correct)
    • C. Define a row-level security filter on the Salaries table that returns zero rows for the Reporting role while keeping the table present in the model.
    • D. Use column-level security in the underlying Lakehouse to deny the Reporting role access to each Salaries column while leaving the table defined in the model.
    Use object-level security on a semantic model to hide an entire table or column from a role, not just filter rows or mask values. Object-level security operates on the structure of the semantic model, removing a table or column from a role's metadata so it does not appear in the field list and cannot be referenced. Row-level security and masking still leave the object defined and visible, and Lakehouse column-level security does not alter the model definition, so only object-level security achieves true object invisibility.

    Why A is wrong: Dynamic data masking still returns the table and its columns, just with obscured values, so Salaries would remain visible in the field list; the requirement is to hide the object entirely, not mask its contents.

    Why B is correct: Object-level security removes an entire table or column from a role's view of the semantic model, so the Reporting role cannot see Salaries in the field list or through dependencies, which is precisely the object-invisibility the requirement describes.

    Why C is wrong: Row-level security filtering to zero rows leaves the Salaries table and its columns visible in the field list and measure dependencies; the table is still there, which contradicts the need to make the object itself invisible.

    Why D is wrong: Column-level security in the Lakehouse governs source column access, but the semantic model still defines the Salaries table, so it stays listed for the role; denying columns can also raise errors rather than cleanly hiding the object.
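The difference between the three controls in this worked example is easier to remember once you see what each one leaves behind. The following is a toy model in plain Python, not Fabric or semantic-model code: the `MODEL` dictionary, its sample rows, and all three helper functions are hypothetical and exist only to show which part of the data each control touches.

```python
# Hypothetical two-table model with made-up rows.
MODEL = {
    "Sales": [{"Region": "EU", "Amount": 100}, {"Region": "US", "Amount": 250}],
    "Salaries": [{"Employee": "Ana", "Salary": 50000},
                 {"Employee": "Ben", "Salary": 62000}],
}


def field_list_with_ols(model, hidden_tables):
    """Object-level security: the hidden table is absent from the metadata."""
    return [name for name in model if name not in hidden_tables]


def rows_with_rls(rows, allow):
    """Row-level security: the table stays visible, rows are filtered."""
    return [row for row in rows if allow(row)]


def rows_with_masking(rows, masked_columns, mask="XXXX"):
    """Dynamic data masking: table and rows stay, values are obscured."""
    return [{col: (mask if col in masked_columns else val)
             for col, val in row.items()} for row in rows]


ols_fields = field_list_with_ols(MODEL, {"Salaries"})
rls_rows = rows_with_rls(MODEL["Salaries"], lambda row: False)
masked_rows = rows_with_masking(MODEL["Salaries"], {"Salary"})
```

Only the first helper removes `Salaries` from the list of tables; the other two return an empty or obscured result for a table that is still there, which is the distinction options A and C get wrong.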

  2. Ingest and Transform Data

    33% of exam

    What you must be able to do. Choose the right store and engine for a workload, design an incremental and idempotent load with the least overhead, build a conformed dimensional model with correct surrogate keys, and handle duplicate, missing, and late-arriving data correctly.

    In one sentence. The ingest-and-shape core: the right store and engine, an incremental watermark load, a conformed dimensional model, and explicit handling of dirty and late data.

    Recall check: answer these from memory first
    • A nightly load reads an append-only table with an immutable id and an ever-increasing timestamp and must copy only new rows with low overhead. What load pattern fits, and why not a full reload or source change data capture?
    • UPDATE and DELETE statements run against the Lakehouse SQL analytics endpoint fail. What is the correct conclusion about the store, and where do those row-level corrections belong?
    • Two marts each have their own Product dimension and a report must slice both with one product filter so totals reconcile. What modelling change makes that work?
    • A silver transform must keep records whose Region is null and group them under a single Unknown bucket. How should the missing values be handled?

    What it tests. The engineering heart of the exam: getting data in and shaping it correctly. Designing full and incremental loads with watermarking and change detection; preparing data for a dimensional model with slowly changing dimensions, surrogate keys, and conformed dimensions; loading streaming data and choosing a streaming engine; choosing the right store between Lakehouse, Warehouse, and Eventhouse; choosing between Dataflows Gen2, notebooks, KQL, and T-SQL to transform batch data; creating OneLake shortcuts and mirroring external databases; ingesting with pipelines and transforming with PySpark, SQL, and KQL; and handling duplicate, missing, and late-arriving data.

    How to study it. Spend real time building loads rather than reading about them. For incremental loads, implement a high-water-mark pattern against an append-only source with a monotonic column and learn why it beats truncate-and-reload or source change data capture when overhead must stay low. For modelling, drill conformed dimensions and surrogate keys until they are automatic: one physical Product dimension with shared surrogate keys lets a single slicer drive both a sales fact and a returns fact and makes totals reconcile. Learn the store boundaries cold: the Lakehouse SQL analytics endpoint is read-only, so row-level writes go through Spark or a Warehouse, and an Eventhouse with a KQL database is for high-volume event and telemetry data. Practise handling dirty data: substitute an explicit sentinel like Unknown for missing categorical values so records survive and group into one bucket rather than being dropped. Know when a OneLake shortcut leaves data in place versus when a copy or mirror duplicates it.
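The sentinel approach to missing categorical values can be sketched in a few lines. This example uses plain Python dictionaries rather than Spark so it runs anywhere; the sample rows, the `fill_missing` helper, and the column names are hypothetical, and in a real silver transform the same idea would be expressed in PySpark, T-SQL, or a Dataflow Gen2.

```python
from collections import Counter


def fill_missing(records, column, sentinel="Unknown"):
    """Replace a missing or None value with a sentinel; never drop the row."""
    return [
        {**row, column: sentinel if row.get(column) is None else row[column]}
        for row in records
    ]


# Hypothetical bronze rows: two orders arrive without a Region.
bronze_rows = [
    {"OrderId": 1, "Region": "West"},
    {"OrderId": 2, "Region": None},
    {"OrderId": 3, "Region": None},
]

silver_rows = fill_missing(bronze_rows, "Region")
rows_per_region = Counter(row["Region"] for row in silver_rows)
```

All three records survive, and the two with no region group into a single `Unknown` bucket instead of disappearing from the totals.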

    Easy to confuse

    • Lakehouse versus Warehouse versus Eventhouse for a workload. A Lakehouse stores Delta files in OneLake with Spark read-write and a read-only T-SQL endpoint, best for files and code-first transforms; a Warehouse is a read-write T-SQL store for set-based SQL and updates and deletes; an Eventhouse with a KQL database is for high-volume streaming and time-series telemetry. Match the store to whether you need files, read-write T-SQL, or event data.
    • OneLake shortcut versus a copy or mirror. A OneLake shortcut points at data where it already lives so there is no duplication and no copy job to maintain; a copy physically duplicates the data into OneLake, and mirroring continuously replicates an external database into OneLake. Use a shortcut when the data should stay in place and a copy or mirror when you need a local, independent replica.
    • High-water-mark watermark versus full truncate-and-reload. A high-water mark stores the maximum value loaded and copies only source rows above it, which suits an append-only source with a monotonically increasing column and keeps overhead low; truncate-and-reload rewrites the whole table every run regardless of what changed. The watermark is the incremental, low-overhead choice.

    Worked example from the DP-700 bank

    Free sample · Ingest and Transform Data · hard

    A nightly pipeline ingests an append-only sales event table from an on-premises SQL Server source into a bronze Delta table in a Microsoft Fabric Lakehouse. Each event row carries an immutable EventId and an ever-increasing CreatedUtc timestamp, and rows are never updated or deleted at source. The team wants each run to copy only rows added since the previous run while keeping operational overhead low. Which design best meets this requirement?

    • A. Store the highest CreatedUtc loaded so far as a high-water mark, then on each run copy only source rows whose CreatedUtc exceeds that stored value and update the mark. (Correct)
    • B. Truncate the bronze Delta table at the start of every run and reload the entire source table so the destination always matches the source exactly.
    • C. Enable change data capture on the source table and stream the captured insert, update, and delete records into the bronze Delta table on each run.
    • D. Copy the full source each run into a staging table, then MERGE staging into bronze on EventId so unchanged rows are skipped during the upsert step.
    For an append-only source with a monotonically increasing column, use a high-water-mark watermark to load only rows added since the last run. Because rows are only ever inserted and CreatedUtc always increases, the maximum value processed in the previous run uniquely separates old rows from new ones; filtering the source on CreatedUtc greater than the stored mark loads exactly the new rows without reading the whole table or configuring source change tracking.

    Why A is correct: An append-only source with a monotonically increasing timestamp is the textbook case for a high-water-mark watermark; querying rows above the stored mark loads only new rows with minimal overhead and no source-side change tracking.

    Why B is wrong: A full truncate-and-reload guarantees a match but reads the whole source nightly, which scales poorly and ignores the append-only nature; it adds cost and time the stated low-overhead requirement is trying to avoid.

    Why C is wrong: Change data capture is built for sources that change rows; this source is insert-only, so capturing updates and deletes adds source-side configuration and overhead for change types that never occur here.

    Why D is wrong: A MERGE on EventId would avoid duplicates, but it still reads the entire source every run; the expensive full read is exactly what a watermark removes, so this keeps the cost the requirement wants to cut.
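The high-water-mark pattern from this worked example can be rehearsed without a Fabric workspace. The sketch below uses two in-memory SQLite databases as stand-ins for the SQL Server source and the bronze table; the table names `sales_events`, `bronze_sales_events`, and `watermark`, and the `incremental_load` function, are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC so that string comparison matches time order.

```python
import sqlite3


def incremental_load(source, target):
    """Copy only source rows above the stored mark, then advance the mark."""
    row = target.execute(
        "SELECT value FROM watermark WHERE name = 'CreatedUtc'").fetchone()
    mark = row[0] if row else ""  # an empty mark sorts below any timestamp
    new_rows = source.execute(
        "SELECT EventId, CreatedUtc FROM sales_events "
        "WHERE CreatedUtc > ? ORDER BY CreatedUtc", (mark,)).fetchall()
    if new_rows:
        target.executemany(
            "INSERT INTO bronze_sales_events VALUES (?, ?)", new_rows)
        target.execute(
            "INSERT OR REPLACE INTO watermark (name, value) "
            "VALUES ('CreatedUtc', ?)", (new_rows[-1][1],))
        target.commit()
    return len(new_rows)


# Stand-in source with two existing events.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE sales_events (EventId INTEGER PRIMARY KEY, CreatedUtc TEXT)")
source.executemany("INSERT INTO sales_events VALUES (?, ?)",
                   [(1, "2024-01-01T00:00:00Z"), (2, "2024-01-02T00:00:00Z")])

# Stand-in bronze table plus a one-row-per-column watermark table.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE bronze_sales_events (EventId INTEGER, CreatedUtc TEXT)")
target.execute("CREATE TABLE watermark (name TEXT PRIMARY KEY, value TEXT)")

first_run = incremental_load(source, target)    # initial load
second_run = incremental_load(source, target)   # nothing new at source
source.execute("INSERT INTO sales_events VALUES (3, '2024-01-03T00:00:00Z')")
third_run = incremental_load(source, target)    # one new event
bronze_count = target.execute(
    "SELECT COUNT(*) FROM bronze_sales_events").fetchone()[0]
```

Re-running with no new source rows copies nothing, which is the low-overhead, repeatable behaviour the scenario asks for. The strict `>` comparison relies on the scenario's guarantee that CreatedUtc only ever increases.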

  3. Monitor and Optimize an Analytics Solution

    33% of exam

    What you must be able to do. Find the most direct diagnostic artefact for a given failure, read a failed activity's error and metrics without re-running, and choose the right optimisation lever (autoscale, V-Order, compaction, or partitioning) for the stated bottleneck.

    In one sentence. The keep-it-running core: read the most direct diagnostic artefact for the failure, and apply the optimisation lever that targets the actual bottleneck.

    Recall check: answer these from memory first
    • You need one morning view of overnight runs across pipelines, Dataflows Gen2, and notebooks with status and start time. Which Fabric feature gives that, and why not per-item history?
    • A Spark executor was lost during a skewed wide aggregation. Which monitoring artefact gives the most direct evidence of why the executor died?
    • A fixed large Spark pool sits idle by day but a nightly batch needs the peak. What single configuration keeps nightly performance and cuts daytime waste?
    • A Copy activity failed overnight and you need its exact error and rows-read count without re-running. Where do you look first?

    What it tests. Keeping the solution healthy and fast once it runs. Monitoring ingestion and transformation across Fabric items; monitoring semantic model refresh and configuring alerts; diagnosing and resolving errors in pipelines, Dataflows Gen2, notebooks, Eventhouse, Eventstream, T-SQL, and OneLake shortcuts; and optimising performance: V-Order, file compaction, and partitioning for a Lakehouse table; query and Warehouse tuning; and pipeline, Spark, Eventstream, and Eventhouse performance. The recurring theme is reading the right diagnostic artefact and choosing the right optimisation lever for the bottleneck.

    How to study it. Learn where each diagnostic signal lives, because the exam tests precision of source. The Monitoring hub gives one cross-item view of recent pipeline, Dataflow Gen2, and notebook runs with status and start time, ideal for a morning failure sweep. A failed pipeline activity records its error message and rows-read count in the run monitoring detail, readable without re-running. A lost Spark executor is explained by that executor's stderr log and the Spark UI stage and task metrics, not by the driver cell output or the run-level view. Drill the optimisation levers and the bottleneck each one targets: Spark pool autoscale matches node count to demand between a minimum and maximum so a nightly batch scales out while daytime jobs stay small, V-Order and file compaction improve read performance on a Lakehouse table by reducing small files and ordering data, and partitioning helps when queries filter on the partition column. Practise mapping a symptom to the single most direct artefact or lever.

    Easy to confuse

    • Monitoring hub versus per-item run history versus the Capacity Metrics app. The Monitoring hub lists recent runs across pipelines, Dataflows Gen2, and notebooks in one filterable status view, best for spotting overnight failures fast; per-item run history shows one item at a time; the Capacity Metrics app reports capacity consumption, not a cross-item run list. For a cross-item failure sweep, the Monitoring hub is the direct answer.
    • Spark pool autoscale versus dynamic allocation versus a fixed pool. Spark pool autoscale changes the pool node count between a minimum and a maximum as demand changes, so a peak nightly batch scales out while daytime jobs stay small and idle cost drops; dynamic allocation tunes executors within a session but does not shrink a peak-sized pool; a fixed pool reserves its full size whenever attached. Autoscale is the lever that serves both performance and cost.
    • V-Order and file compaction versus partitioning for Lakehouse table optimisation. V-Order and file compaction improve read performance by ordering data and merging many small files into fewer larger ones, helping scans broadly; partitioning splits a table by a column so queries that filter on that column skip irrelevant data. Use compaction and V-Order for the small-files and read-speed problem, and partitioning when queries consistently filter on a partition key.
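The cost argument for autoscale over a fixed peak-sized pool comes down to simple arithmetic. The sketch below is a deliberately simplified model, not how Fabric schedules Spark nodes: it assumes the pool matches demand instantly each hour, and the demand profile, the node limits, and both helper functions are hypothetical.

```python
def autoscale_nodes(demand: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the node count to demand, within the pool's min and max."""
    return max(min_nodes, min(demand, max_nodes))


def node_hours(hourly_demand, min_nodes: int, max_nodes: int) -> int:
    """Total node-hours over a day under the simplified autoscale model."""
    return sum(autoscale_nodes(d, min_nodes, max_nodes) for d in hourly_demand)


# Hypothetical day: 20 quiet hours needing 1 node, 4 batch hours needing 10.
hourly_demand = [1] * 20 + [10] * 4

fixed_pool_node_hours = 10 * len(hourly_demand)            # always peak-sized
autoscale_node_hours = node_hours(hourly_demand, 1, 10)    # follows demand
```

Under these assumptions the nightly batch still gets its 10 nodes, while the quiet hours stop paying for them.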

    Worked example from the DP-700 bank

    Free sample · Monitor and Optimize an Analytics Solution · medium

    A data engineer manages dozens of scheduled Data Factory pipelines, Dataflows Gen2, and Spark notebooks across a single Microsoft Fabric workspace. Each morning they need one place that lists the recent runs of all of these item types together, with status and start time, so they can quickly spot any that failed overnight without opening each item individually. Which Fabric feature should they use?

    • A. Open the run history on each individual pipeline and notebook in turn, reading the per-item activity output to confirm whether the most recent overnight run completed.
    • B. Install the Microsoft Fabric Capacity Metrics app and read its timepoint detail page to see which scheduled items ran and whether any of them failed overnight.
    • C. Create a Data Activator reflex that watches each pipeline and raises an alert, then review the alert history every morning to learn which runs failed.
    • D. Open the Monitoring hub, which lists recent runs of pipelines, Dataflows Gen2, and notebooks together with their status and start time so failures are visible in one view. (Correct)
    Use the Monitoring hub to see recent runs of pipelines, Dataflows Gen2, and notebooks together in one status view. The Monitoring hub is the central location in Microsoft Fabric that collects run activity across item types, so a single filterable list shows the status and start time of pipeline, Dataflow Gen2, and notebook runs; this is why it answers the cross-item morning review that per-item history, capacity metrics, or per-item alerts cannot satisfy as directly.

    Why A is wrong: Per-item run history does show that item's runs, but checking each item separately is exactly the manual, item-by-item effort the requirement rules out and gives no single consolidated view.

    Why B is wrong: The Capacity Metrics app reports compute consumption and throttling against a capacity, not a consolidated success or failure list of individual runs, so it is the wrong tool for spotting failed overnight jobs.

    Why C is wrong: Data Activator can alert on conditions, but building and maintaining a reflex per item is heavier than needed and is forward-looking alerting rather than the consolidated run list the engineer asked to review each morning.

    Why D is correct: The Monitoring hub aggregates run activity across item types in one filterable list with status and timing, which is precisely the single cross-item view needed to spot overnight failures quickly.

A study plan that works

  1. Map the blueprint and book a date

    Day 1

    Read the official DP-700 skills outline and the three domains with their weights. Book a provisional date now, because a fixed date turns open-ended study into a plan and is the strongest predictor of actually sitting. Note the three domains carry near-equal weight, so plan to clear all three rather than betting on one, and account for the security and orchestration depth packed into the implement-and-manage domain.

  2. Build the Fabric foundations hands-on

    Week 1

    Stand up a workspace, a Lakehouse, and a Warehouse, and load data into both. Configure the Spark starter pool and a custom pool and feel the cold-start difference. Create a Data Factory pipeline that runs a notebook, and pass a pipeline parameter into the notebook through a tagged parameters cell and base parameters. This first week is about getting the muscle memory the scenario questions assume.

  3. Go deep on ingest and transform

    Weeks 2 to 3

    This is the engineering heart, so it gets heavy time. Implement a high-water-mark incremental load against an append-only source and contrast it with truncate-and-reload. Build a conformed dimension with surrogate keys shared by two facts, and practise SCD handling. Drill the store boundaries until they are automatic: Lakehouse versus Warehouse versus Eventhouse, and the read-only SQL analytics endpoint. Handle duplicate, missing, and late-arriving data, and use a OneLake shortcut versus a copy or mirror correctly.

  4. Master security and orchestration

    Week 4

    Build the security control matrix until object-level versus row-level versus column-level security versus dynamic data masking is automatic, and practise securing the same Lakehouse across both direct file reads and the SQL analytics endpoint with independent controls. Configure OneLake data access roles, sensitivity labels, and item endorsement. Drill the orchestration choice between a Dataflow Gen2, a pipeline, and a notebook, and wire schedules and event-based triggers.

  5. Drill monitoring and optimisation

    Week 5

    Learn where each diagnostic signal lives: the Monitoring hub for cross-item runs, the run detail for a failed activity's error and rows-read count, and the executor stderr log plus Spark UI metrics for a lost executor. Map each optimisation lever to its bottleneck: Spark pool autoscale for peaky load, V-Order and file compaction for small files and read speed, and partitioning for filtered queries. Practise turning a symptom into the single most direct artefact or lever.

  6. Drill weak domains, then space the review

    Week 6

    Use your per-domain accuracy to attack the domains dragging you down rather than re-reading what you already know. Then space it: revisit each domain's recall prompts after a few days and again a week later. Spaced review generally sticks far better than cramming the night before, and the store-and-engine judgement calls need repetition to become reflex.

  7. Sit a timed mock and calibrate

    Weeks 6 to 7

    Take at least one full timed mock under exam conditions to rehearse pacing and the flag-and-return habit. Treat the score as a per-domain readiness signal, not a single number, and review every missed question, naming the exact workload fit or documented rule you misread, before you book or sit.

Know when you're ready

Readiness for DP-700 is a measured score on scenario questions you have not seen before, not a feeling that Microsoft Fabric is familiar. Those are different things, and the gap is where people fail. Clicking through a workspace all day builds fluency, and fluency feels like knowledge, so confidence rises while precise recall and judgement do not. The fix is to test yourself: if you can read a fresh scenario, name the read and write pattern and the constraint, pick the option whose store, engine, and load pattern fit the workload, and explain why each other option is wrong, you know it; if you can only nod along to an explanation, you do not yet.

Be especially wary of early confidence on the store-and-engine judgement calls and the security layers. Knowing what a Lakehouse or a security control is feels like enough, but the exam tests the exact distinctions (Lakehouse versus Warehouse versus Eventhouse, object-level versus row-level security, a high-water mark versus a full reload), and those are the items people drop. Trust your measured per-domain accuracy over your gut, and set the bar at clearing every one of the three domains comfortably on unseen questions across more than one session before you book.

Exam-day tips

  • Read the scenario for the read and write pattern and the stated constraint first, then match it to the Microsoft Fabric workload. Distractors are written to sound reasonable; the right answer is the store, engine, and load pattern that fit, not the most powerful tool on offer.
  • When a question stresses low operational overhead or incremental only, choose the incremental and idempotent pattern. A high-water-mark watermark beats a truncate-and-reload for an append-only source with a monotonic column.
  • When a question stresses maintainability, choose the lowest-code option that still meets the requirement: a Dataflows Gen2 transform over a hand-written notebook when the logic is simple, and a Data Factory pipeline to sequence and parameterise the rest.
  • Match the security control to what must be hidden: object-level security to remove a table from a role, row-level security to filter rows, column-level security or dynamic data masking to obscure values. They are not interchangeable.
  • Remember the store boundaries: the Lakehouse SQL analytics endpoint reads but does not write, so route UPDATE and DELETE through Spark or a Warehouse, and reach for an Eventhouse and a KQL database for high-volume streaming telemetry.
  • For diagnosis, go to the most direct artefact: the Monitoring hub for a cross-item run sweep, the failed activity's run detail for its error and rows-read count, and the executor stderr log plus Spark UI metrics for a lost executor.
  • Secure each access path explicitly: OneLake folder roles govern direct file reads and SQL analytics endpoint permissions govern endpoint queries, and they do not inherit each other.
  • Flag and move on. Cover every question once before you sink time into a hard one, so you collect the clear marks first and protect the items you actually know.
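
The high-water-mark pattern from the tips above can be sketched in a few lines. This is a minimal, plain-Python illustration rather than Fabric code: the `incremental_load` function, the `id` column, and the in-memory lists standing in for the source and target tables are all hypothetical, and in Fabric the same idea would normally be expressed in a pipeline, a notebook, or T-SQL.

```python
def incremental_load(source_rows, target_rows, watermark):
    """Append only the source rows whose monotonic id exceeds the stored watermark.

    Returns the new watermark, which the caller persists for the next run.
    All names here are illustrative assumptions, not a Fabric API.
    """
    new_rows = [row for row in source_rows if row["id"] > watermark]
    target_rows.extend(new_rows)
    # If nothing new arrived, the watermark stays where it was.
    return max((row["id"] for row in new_rows), default=watermark)


source = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]
target = []

watermark = incremental_load(source, target, watermark=0)  # first run loads ids 1 to 3
watermark = incremental_load(source, target, watermark)    # rerun loads nothing new

source.append({"id": 4, "value": "d"})
watermark = incremental_load(source, target, watermark)    # loads only id 4
```

Because each run reads only rows above the stored watermark, a rerun with no new data changes nothing, which is the idempotent behaviour a truncate-and-reload cannot offer without rewriting the whole table.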

Frequently asked questions

Is the DP-700 hard?

It is an intermediate, associate-level exam, and the difficulty is judgement plus precision rather than breadth. You have to pick the right Microsoft Fabric store and engine for a read and write pattern and recall a handful of documented behaviours exactly, like the read-only Lakehouse SQL analytics endpoint or what object-level security hides. Scenario practice with worked explanations matters far more than re-reading feature lists.

How long should I study for the DP-700?

Most candidates with real data-engineering experience are ready in five to seven weeks of steady study. Less hands-on Fabric time means more weeks on the ingest-and-transform domain and on the security and orchestration depth in the implement-and-manage domain, which is where people who have only used the platform lightly tend to lose marks.

Do I need SQL and coding experience for this exam?

Yes. DP-700 expects genuine data-engineering and SQL experience, and you transform data with PySpark, T-SQL, and KQL across the exam. You need to read and reason about set-based SQL, Spark, and KQL queries and choose between them for a given transform. You do not need to be an expert in all three, but you cannot pass while avoiding code entirely.

How much of the exam is choosing the right store or engine?

A large share. The exam repeatedly asks you to choose between a Lakehouse, a Warehouse, and an Eventhouse, and between Dataflows Gen2, notebooks, KQL, and T-SQL. Learn the boundaries cold: files and code-first transforms point at a Lakehouse, read-write set-based T-SQL at a Warehouse, and high-volume telemetry at an Eventhouse with a KQL database. These judgement calls are the core skill the exam measures.

Do I need to know the medallion architecture and incremental loads?

Yes. The ingest-and-transform domain leans on the medallion architecture, bronze through silver to gold, and on incremental loading with watermarking and change detection. You should be able to design a high-water-mark load for an append-only source, handle slowly changing dimensions and surrogate keys, and deal with duplicate, missing, and late-arriving data without dropping valid records.
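
As a rough illustration of handling duplicate and late-arriving records on the way from bronze to silver, the sketch below keeps one row per business key and lets an older version that arrives late lose to the newer one already seen. It is plain Python under stated assumptions: the `deduplicate_latest` function and the `order_id`, `updated_at`, and `status` columns are hypothetical, and a real Fabric solution would more likely do this in Spark or T-SQL.

```python
def deduplicate_latest(rows):
    """Keep one row per order_id: the one with the greatest updated_at.

    ISO-8601 date strings compare correctly as plain strings, so no date
    parsing is needed for this illustration.
    """
    latest = {}
    for row in rows:
        key = row["order_id"]
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda row: row["order_id"])


bronze = [
    {"order_id": 1, "updated_at": "2024-01-02", "status": "shipped"},
    {"order_id": 2, "updated_at": "2024-01-01", "status": "new"},
    {"order_id": 1, "updated_at": "2024-01-01", "status": "new"},  # older version, arrived late
    {"order_id": 2, "updated_at": "2024-01-01", "status": "new"},  # exact duplicate
]

silver = deduplicate_latest(bronze)
```

Every valid order survives into `silver`; only redundant or superseded versions of the same key are discarded.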

Which domains should I focus on?

All three carry near-equal weight, so none can be left short. The implement-and-manage domain covers the most ground, packing in both the security layers and the orchestration choices, so give it solid time. The ingest-and-transform domain is the engineering heart and rewards hands-on practice. The monitor-and-optimise domain is narrower in scope but precise about which artefact and which lever to use, so do not skim it.

How is the exam scored and what should I aim for?

It is scored on a scale and reported as a pass or fail against the published bar, shown in the facts panel above. Because individual question weights are not visible, aim to clear every one of the three domains comfortably on unseen practice questions rather than chasing one raw figure, and confirm your store-and-engine judgement holds up across more than one session.

How many practice questions should I do before booking?

Enough that every domain clears comfortably on questions you have not seen and you can pace a full timed mock without rushing. Quality of review beats raw volume: on every question, read the explanation and name the workload fit or documented rule that picked the answer, including on the ones you got right, because guessing right is not the same as knowing why.

Is the DP-700 Fabric Data Engineer worth it?

It is well suited to data engineers and ETL developers who are actively building on Microsoft Fabric and want a credential that demonstrates they can design ingestion patterns, secure data across multiple access paths, and keep pipelines healthy in production. The exam is grounded enough in real engineering decisions (store choice, incremental load patterns, security layering) that preparation tends to sharpen practical skills rather than just exam technique. Those who want to round out their Microsoft Fabric credentials may pair it with the DP-600, which covers the analytics engineering and semantic modelling side of the same platform.

Examworthy is not affiliated with or endorsed by Microsoft. This guide is original study material based on the public exam blueprint. We never reproduce live exam items. DP-700 and related marks belong to their respective owners.