The Data Engineer Interview Questions You Will Actually Get in the UK (2026, with Honest Answers)

Last updated: June 2026

Most lists of Data Engineer interview questions are padded with trivia nobody asks and definitions you could recite without understanding. This one is different. These are the questions UK employers actually use in 2026, grouped by the stage of the process they show up in, with answers that show what interviewers are really listening for.

The pattern matters as much as the questions. A UK Data Engineer interview usually runs as a SQL screen, then a technical deep dive on pipelines and modelling, then a system design discussion, and finally a behavioural round. Get a feel for all four and you stop being surprised. If you are earlier in the journey, our Data Engineer roadmap covers how to build the skills these questions test.

The SQL Screen

This is the gate. Fail it and the rest does not happen. UK screens test production-level SQL, not textbook syntax.

1. Write a query to find the second highest salary per department.

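They want window functions. Reaching for DENSE_RANK() partitioned by department shows you think in sets, not loops. Be ready to say why you chose it over ROW_NUMBER(): if two people tie for the top salary, ROW_NUMBER() puts one of them in second place and returns the highest salary again. A subquery with MAX that only handles the global case quietly tells them you have not written much real SQL.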
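Here is a minimal sketch of that answer, run through Python's built-in sqlite3 module so you can try it without a warehouse. The employees table, its columns, and the sample rows are made up for illustration, and it assumes your SQLite build supports window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha',  'Data',    70000),
  ('Ben',   'Data',    70000),  -- tie for the top salary
  ('Chloe', 'Data',    62000),
  ('Dev',   'Finance', 55000),
  ('Ewan',  'Finance', 48000);
""")

# DENSE_RANK gives tied salaries the same rank with no gaps,
# so rank 2 is always the second highest distinct salary.
SECOND_HIGHEST_SQL = """
WITH ranked AS (
  SELECT department,
         salary,
         DENSE_RANK() OVER (
           PARTITION BY department ORDER BY salary DESC
         ) AS salary_rank
  FROM employees
)
SELECT DISTINCT department, salary
FROM ranked
WHERE salary_rank = 2
ORDER BY department;
"""

second_highest = conn.execute(SECOND_HIGHEST_SQL).fetchall()
print(second_highest)  # [('Data', 62000), ('Finance', 48000)]
```

Swap DENSE_RANK for ROW_NUMBER and the Data department comes back as 70000, which is the tie problem interviewers like to probe.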

2. You have duplicate rows in a table. How do you remove them?

Talk through identifying duplicates with a window function ranking on the key, then deleting or filtering everything beyond the first. The honest follow-up they are listening for: why did the duplicates appear, and how do you stop them upstream. Deduplication is a symptom, not a cure.
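A sketch of the ranking approach, again using sqlite3 with a hypothetical events table. It assumes event_id is the business key and that keeping the earliest loaded row is the right rule, which is something you would confirm with the interviewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT, payload TEXT, loaded_at TEXT);
INSERT INTO events VALUES
  ('e1', 'signup', '2026-06-01T09:00:00'),
  ('e1', 'signup', '2026-06-01T09:05:00'),  -- duplicate of e1
  ('e2', 'login',  '2026-06-01T09:10:00');
""")

# Rank rows within each event_id by load time, then delete everything
# ranked after the first. rowid is SQLite's built-in row identifier.
conn.execute("""
DELETE FROM events
WHERE rowid IN (
  SELECT rid FROM (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (
             PARTITION BY event_id ORDER BY loaded_at
           ) AS rn
    FROM events
  )
  WHERE rn > 1
);
""")

remaining = conn.execute(
    "SELECT event_id, loaded_at FROM events ORDER BY event_id"
).fetchall()
print(remaining)
```

In a warehouse without a row identifier you would more often select the rn = 1 rows into a clean table than delete in place.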

3. What is the difference between a left join and a left anti join, and when would you use each?

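A left join keeps every row from the left table and fills in nulls where nothing matches. A left anti join keeps only the left rows that have no match at all. Anti joins come up constantly in data work, usually for finding records that failed to match, such as orders with no customer. If you can explain when you would use NOT EXISTS over NOT IN and why nulls break the latter (a single null in the NOT IN list makes the comparison unknown for every row, so the query returns nothing), you are ahead of most candidates.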
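The null trap is easier to believe once you have seen it. This sketch flips the direction to customers with no orders, because a nullable customer_id on orders (a guest checkout, say) is a realistic way for a null to end up in the NOT IN list. Table names and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1), (2), (3);
-- Order 101 is a guest checkout, so its customer_id is NULL.
INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# Anti join written as LEFT JOIN plus an IS NULL filter.
anti_join = conn.execute("""
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_id
""").fetchall()

# Same result with NOT EXISTS, which is unaffected by the NULL.
not_exists = conn.execute("""
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_id
""").fetchall()

# NOT IN compares against (1, NULL); the NULL makes every
# comparison unknown, so no rows come back.
not_in = conn.execute("""
SELECT customer_id
FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders)
ORDER BY customer_id
""").fetchall()

print(anti_join, not_exists, not_in)
```

The first two queries return customers 2 and 3. The NOT IN version returns an empty list, with no error to warn you.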

If your SQL is not yet at this level, that is the first gap to close before applying. Our SQL beginner guide builds the foundation these questions sit on.

The Data Modelling Round

This is where interviewers find out whether you have actually shipped a warehouse or only read about one.

4. Explain slowly changing dimensions. When would you use Type 1 versus Type 2?

This is the single most revealing modelling question in 2026. Type 1 overwrites the old value and keeps no history. Type 2 adds a new row for each change and keeps the old one, usually with validity dates or a current flag, which you need when the business has to report on what was true at a point in time. Candidates who have built warehouses explain the trade-off without hesitating. Candidates who have not tend to blur the two.
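To make the difference concrete, here is a small sketch that applies the same change (a customer moving city) to a Type 1 and a Type 2 dimension. The table names and the valid_from, valid_to, and is_current columns are a common convention assumed here, not the only way to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer_type1 (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_customer_type2 (
  customer_id INTEGER, city TEXT,
  valid_from TEXT, valid_to TEXT, is_current INTEGER
);
INSERT INTO dim_customer_type1 VALUES (1, 'Leeds');
INSERT INTO dim_customer_type2 VALUES (1, 'Leeds', '2025-01-01', NULL, 1);
""")

def apply_type1(customer_id, new_city):
    # Type 1: overwrite in place, so the old city is gone.
    conn.execute(
        "UPDATE dim_customer_type1 SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )

def apply_type2(customer_id, new_city, change_date):
    # Type 2: close the current row, then add a new current row.
    conn.execute(
        "UPDATE dim_customer_type2 SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer_type2 VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

apply_type1(1, "Bristol")
apply_type2(1, "Bristol", "2026-03-01")

type1_rows = conn.execute("SELECT city FROM dim_customer_type1").fetchall()
type2_rows = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer_type2 ORDER BY valid_from"
).fetchall()

# Point-in-time lookup that only the Type 2 table can answer.
city_in_mid_2025 = conn.execute(
    "SELECT city FROM dim_customer_type2 "
    "WHERE customer_id = 1 AND valid_from <= ? "
    "AND (valid_to IS NULL OR valid_to > ?)",
    ("2025-06-30", "2025-06-30"),
).fetchone()[0]
print(type1_rows, type2_rows, city_in_mid_2025)
```

The Type 1 table only knows about Bristol. The Type 2 table can still tell you the customer was in Leeds in mid-2025.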

5. Walk me through how you would model orders, customers, and products for analytics.

They want dimensional modelling: a fact table for orders, dimension tables for customers and products, sensible grain, and clear keys. The strong answer names the grain first, because choosing the grain wrong is how warehouses quietly rot. Most data quality problems are design problems wearing a technical costume, and this question is where that shows.
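One way to sketch that answer is below. Every table and column name is illustrative, and the grain chosen (one row per order line) is an assumption you would state out loud before drawing anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT
);
CREATE TABLE dim_product (
  product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT
);
-- Grain: one row per order line (one product on one order).
CREATE TABLE fact_order_line (
  order_id INTEGER,
  line_number INTEGER,
  customer_key INTEGER REFERENCES dim_customer (customer_key),
  product_key INTEGER REFERENCES dim_product (product_key),
  order_date TEXT,
  quantity INTEGER,
  line_amount_pence INTEGER,
  PRIMARY KEY (order_id, line_number)
);
INSERT INTO dim_customer VALUES (1, 'Asha', 'North'), (2, 'Ben', 'South');
INSERT INTO dim_product VALUES
  (10, 'Kettle', 'Kitchen'), (11, 'Toaster', 'Kitchen'), (12, 'Lamp', 'Lighting');
INSERT INTO fact_order_line VALUES
  (500, 1, 1, 10, '2026-05-01', 1, 2500),
  (500, 2, 1, 12, '2026-05-01', 2, 3000),
  (501, 1, 2, 11, '2026-05-02', 1, 4000);
""")

# With the grain at order line, revenue by category is a plain join and sum.
revenue_by_category = conn.execute("""
SELECT p.category, SUM(f.line_amount_pence) AS revenue_pence
FROM fact_order_line f
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY p.category
ORDER BY p.category
""").fetchall()
print(revenue_by_category)  # [('Kitchen', 6500), ('Lighting', 3000)]
```

Had the grain been one row per order, the product dimension would have nowhere clean to attach, and that is the kind of trade-off worth naming in the room.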

6. What is the difference between ETL and ELT, and which do you default to?

ELT is the 2026 standard: load raw data into the warehouse, then transform inside it with SQL and dbt, because cloud compute is cheap and keeping the raw layer gives you a safety net. Knowing why the industry shifted - not just the acronym order - is the point.
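A toy version of the ELT pattern, with SQLite standing in for the warehouse and plain SQL standing in for a dbt model. The raw_orders and stg_orders names, and the choice to land everything as text, are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: land the source data as-is, as text, with no cleaning.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, amount_pence TEXT, ordered_at TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("1", "1999", "2026-05-01"),
        ("2", " 500 ", "2026-05-02"),  # stray whitespace
        ("3", "n/a", "2026-05-03"),    # unusable amount
    ],
)

def build_staging(conn):
    # Transform: rebuild the typed staging table from raw, inside the database.
    conn.executescript("""
    DROP TABLE IF EXISTS stg_orders;
    CREATE TABLE stg_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount_pence) AS INTEGER) AS amount_pence,
           ordered_at
    FROM raw_orders
    WHERE TRIM(amount_pence) <> ''
      AND TRIM(amount_pence) NOT GLOB '*[^0-9]*';  -- digits only
    """)

build_staging(conn)
staged = conn.execute("SELECT * FROM stg_orders ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(staged, raw_count)
```

The bad row is filtered out of staging but still sits in raw_orders, so if the cleaning rule turns out to be wrong you can change the SQL and rebuild without re-extracting anything.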

The Pipeline and Tooling Deep Dive

7. How do you make a pipeline idempotent, and why does it matter?

Idempotency means rerunning the same job produces the same result rather than duplicating data. This is a senior signal even in junior interviews. Talk about deterministic keys, merge or upsert logic, and partition overwrites. It tells them you have lived through a failed run that needed replaying.
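A minimal sketch of the upsert approach, assuming order_id is the deterministic key. It uses SQLite's ON CONFLICT clause (available from SQLite 3.24); in a cloud warehouse the equivalent would usually be a MERGE statement. The orders table and load_batch function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, status TEXT, amount_pence INTEGER)"
)

def load_batch(conn, batch):
    # Insert new keys and update existing ones, so replaying a batch
    # leaves the table in the same state instead of duplicating rows.
    conn.executemany(
        """
        INSERT INTO orders (order_id, status, amount_pence)
        VALUES (?, ?, ?)
        ON CONFLICT (order_id) DO UPDATE SET
          status = excluded.status,
          amount_pence = excluded.amount_pence
        """,
        batch,
    )
    conn.commit()

batch = [(1, "paid", 2500), (2, "pending", 4000)]
load_batch(conn, batch)
load_batch(conn, batch)  # simulate replaying the job after a failure

rows_after_replay = conn.execute(
    "SELECT order_id, status, amount_pence FROM orders ORDER BY order_id"
).fetchall()
print(rows_after_replay)  # [(1, 'paid', 2500), (2, 'pending', 4000)]
```

A plain INSERT here would either fail on the second run or, without the primary key, silently double the data.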

8. A pipeline that normally finishes in 20 minutes has been running for two hours. Walk me through your debugging.

They are testing how you think under pressure, not whether you know the answer. Start with what changed: data volume, a schema change upstream, a skewed join, a resource bottleneck. Check the logs and the orchestrator first. The worst answer is jumping straight to a fix without diagnosing.
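If a skewed join is on your list of suspects, one quick check is how much of the data sits on the most common join key. This is a rough sketch in plain Python with a made-up sample of keys. In practice you would run the equivalent GROUP BY against the real table.

```python
from collections import Counter

def top_key_share(keys):
    """Return the most common key and the fraction of rows it accounts for."""
    if not keys:
        raise ValueError("keys must not be empty")
    counts = Counter(keys)
    key, count = counts.most_common(1)[0]
    return key, count / len(keys)

# Hypothetical sample of join keys pulled from the slow stage.
join_keys = ["uk"] * 90 + ["fr"] * 6 + ["de"] * 4
hot_key, share = top_key_share(join_keys)
print(f"{hot_key} holds {share:.0%} of rows")  # uk holds 90% of rows
```

One key holding most of the rows means one worker is doing most of the join, which fits a job that usually finishes quickly and now crawls.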

9. How do you test and monitor data quality in production?

Name concrete things: dbt tests for not null, uniqueness and referential integrity (the built-in not_null, unique and relationships tests), freshness checks, row count anomalies, and alerting that fails loudly. The honest version acknowledges that silent failures are the dangerous ones - a pipeline that breaks visibly is far less costly than one that quietly produces wrong numbers for a month.
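The checks themselves are not complicated. Here is a hand-rolled sketch of the same ideas in Python and SQL, not dbt's own implementation. The table names, check names, and six-hour freshness threshold are all invented for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, loaded_at TEXT);
INSERT INTO customers VALUES (1), (2);
INSERT INTO orders VALUES
  (100, 1,    '2026-06-01T08:00:00'),
  (100, 1,    '2026-06-01T08:00:00'),  -- duplicate order_id
  (101, NULL, '2026-06-01T08:30:00'),  -- missing customer
  (102, 9,    '2026-06-01T09:00:00');  -- customer 9 does not exist
""")

# Each check returns a count of offending rows; zero means it passes.
CHECKS = {
    "order_id_not_null": "SELECT COUNT(*) FROM orders WHERE order_id IS NULL",
    "order_id_unique": (
        "SELECT COUNT(*) FROM (SELECT order_id FROM orders "
        "GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
    "customer_id_not_null": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "customer_exists": (
        "SELECT COUNT(*) FROM orders o WHERE o.customer_id IS NOT NULL "
        "AND NOT EXISTS (SELECT 1 FROM customers c "
        "WHERE c.customer_id = o.customer_id)"
    ),
}

def failing_checks(conn):
    results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
    return {name: count for name, count in results.items() if count > 0}

def is_fresh(conn, as_of, max_age):
    latest = conn.execute("SELECT MAX(loaded_at) FROM orders").fetchone()[0]
    return latest is not None and as_of - datetime.fromisoformat(latest) <= max_age

def assert_quality(conn):
    # Fail loudly: raise so the orchestrator marks the run as failed.
    failures = failing_checks(conn)
    if failures:
        raise ValueError(f"data quality checks failed: {failures}")

failures = failing_checks(conn)
fresh = is_fresh(conn, datetime(2026, 6, 1, 12, 0), timedelta(hours=6))
print(failures, fresh)
```

On this sample three checks fail and the freshness check passes. Calling assert_quality would raise, which is the behaviour you want from a production run.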

The System Design Round

In 2026 these questions have moved on from “design a batch warehouse” toward streaming and cost.

10. Design a pipeline that ingests millions of events a day and serves them for analytics.

There is no single right answer, and that is the point. Talk through ingestion, choosing batch or streaming and why, storage and modelling, transformation, orchestration, and the parts everyone forgets: cost, monitoring, and what happens when a component fails. Interviewers want to hear you reason about trade-offs, not recite one architecture.

11. How would you control cost on a cloud data platform?

This is a grown-up question for a market where cloud bills are under scrutiny. Talk about partitioning and clustering, avoiding full table scans, right-sizing warehouses, killing runaway queries, and dropping data nobody uses. Showing you treat compute as money rather than a free resource sets you apart.

The Behavioural Round

12. Tell me about a time a pipeline you built broke in production. What happened?

They are not looking for a flawless record. They are looking for ownership. Describe what broke, how you found it, how you fixed it, and crucially what you changed so it could not happen the same way again. Honesty about failure reads as competence. A claim that nothing has ever broken reads as inexperience.

The Honest Summary

UK Data Engineer interviews reward depth over breadth and ownership over buzzwords. Strong production SQL gets you through the door, clear data modelling proves you have shipped real work, and calm reasoning about pipelines, system design, and cost gets you the offer.

Prepare for the four stages, be honest about what has broken on your watch, and you will outperform candidates with longer tool lists and shallower understanding. For what the role pays once you are in, see our UK Data Engineer salary guide.

If you want a structured route through the exact skills these questions test, our 18-week Data Engineering programme is built for people learning around a job. Not sure it fits? Take the 4-minute course quiz first.

Frequently asked questions

What is the hardest part of a Data Engineer interview?

For most candidates it is the system design round, because there is no single correct answer and it rewards reasoning about trade-offs rather than memorised facts.

How much SQL do I need for a Data Engineer interview?

Production level: window functions, anti joins, deduplication, and slowly changing dimensions. Textbook syntax alone is not enough.

Do junior Data Engineer interviews include system design?

Often a lighter version, yes. They are less about a perfect architecture and more about whether you can reason about ingestion, storage, and failure.

What do behavioural rounds test for Data Engineers?

Ownership. How you handle a pipeline that broke, what you learned, and what you changed so it would not happen again.