AI Engineer Interview Questions UK 2026


Thirty real AI engineer interview questions from UK employers in 2026 — DeepMind, Anthropic, Wayve and the banks — with model-answer outlines and structure.

The Short Answer

The UK AI engineer interview process in 2026 typically runs four to five stages over three to six weeks: a recruiter screen, a coding or take-home assessment, one or two technical deep-dives, an ML system design round, and a behavioural or culture interview. Below we cover thirty questions our candidates have actually faced this year, grouped by stage — coding and implementation, ML system design, ML fundamentals, LLM-specific scenarios, and behavioural. Top UK employers in this space include Google DeepMind, Anthropic's London office, Wayve, Faculty AI and Cohere, alongside the bank AI labs at HSBC, Barclays and Lloyds. A strong candidate typically combines PyTorch fluency, a working knowledge of transformer internals, the ability to reason about latency and cost trade-offs in production LLM systems, and a track record of shipping something — ideally with metrics attached. Successful candidates this year are typically taking packages of £130,000–£250,000 in London, with research roles at frontier labs running higher.

How UK AI Engineer Interview Processes Are Structured in 2026

The typical UK AI engineer loop in 2026 has stabilised around five stages, though the order and weight vary by employer.

  1. Recruiter screen (30 minutes). Motivation, package expectations, visa status, notice period. Frontier labs increasingly ask about safety views here too.

  2. Technical screen or take-home (60–120 minutes live, or 4–8 hours async). Either a live coding call — typically PyTorch or NumPy implementation rather than pure leetcode — or a take-home such as fine-tuning a small model on a provided dataset and writing it up.

  3. ML deep-dive (60 minutes). A senior engineer probes your knowledge of model internals: attention, optimisers, distributed training, evaluation. Expect to be asked to derive things on a shared whiteboard.

  4. ML system design (60 minutes). Design a RAG system, a recommendation pipeline, or an evaluation harness for an agent. The bar is reasoning about latency, cost, failure modes and monitoring — not just drawing boxes.

  5. Behavioural and culture (45–60 minutes). Past projects, conflict, ambiguity, and — at safety-focused employers — your views on responsible deployment.

DeepMind and Anthropic often add a research discussion round; the banks usually add a stakeholder interview. End-to-end the process typically takes three to six weeks in London and Cambridge.

The 30 Questions, Grouped by Stage

Coding and Implementation (six questions)

1. Implement scaled dot-product attention in PyTorch from scratch. Model answer: compute QK^T, divide by sqrt(d_k), set masked (future) positions to -inf before the softmax, then multiply by V. Mention numerical stability of softmax and where you would use torch.nn.functional.scaled_dot_product_attention in production.

2. Write a custom PyTorch Dataset and DataLoader for streaming a 200GB JSONL file. Discuss IterableDataset, per-worker sharding via torch.utils.data.get_worker_info() so that workers do not replay the same shard, and pin_memory.

3. Implement top-k and top-p (nucleus) sampling. Walk through sorting the logits, taking the cumulative sum of the softmax probabilities, masking tokens beyond the threshold, and renormalising. Mention temperature and why greedy decoding fails for creative tasks.

4. Given a list of token IDs, build a function that batches them into fixed-length sequences with packing. Discuss padding vs packing, attention mask construction, and the impact on throughput.

5. Debug this training loop that produces NaN losses after 200 steps. Typical answer: check learning rate, gradient clipping, mixed-precision loss scaling, division-by-zero in custom losses, and whether the data contains malformed inputs.

6. Implement a simple LRU cache for prompt prefixes. Useful for KV cache discussions later in the loop. Mention collections.OrderedDict or a doubly-linked list plus hashmap.
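The sampling steps outlined in question 3 can be sketched in plain Python so the arithmetic stays visible. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and in an interview you would more likely express the same steps with tensor operations such as torch.sort and torch.cumsum.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(logits, top_k=0, top_p=1.0, temperature=1.0):
    """Return {token_id: prob} after top-k, then top-p filtering, renormalised."""
    probs = softmax(logits, temperature)
    # Sort token ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        # Keep the token that crosses the threshold, then stop.
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, top_k=0, top_p=1.0, temperature=1.0, rng=random):
    """Draw one token id from the filtered, renormalised distribution."""
    dist = filter_top_k_top_p(logits, top_k, top_p, temperature)
    token_ids = list(dist)
    weights = [dist[i] for i in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]
```

With probabilities of 0.5, 0.3, 0.15 and 0.05 and top_p=0.7, the first two tokens survive and are renormalised to 0.625 and 0.375. Keeping the token that crosses the threshold is the usual convention and is worth stating aloud, because it guarantees at least one token always remains.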

ML System Design (six questions)

7. Design a RAG system for a UK bank's customer support. Cover document ingestion, chunking strategy, embedding model choice, vector store (FAISS, pgvector, or managed), retrieval with hybrid BM25 plus dense, reranking, prompt template, guardrails, and an evaluation set. Discuss FCA traceability requirements.

8. Design an evaluation harness for an LLM agent that books trains. Talk about offline golden sets, LLM-as-judge with bias mitigation, replayable browser traces, and online A/B with safety metrics.

9. Design a feature store for a fraud detection model serving 50,000 requests per second. Cover online vs offline parity, point-in-time correctness, and a sub-50ms latency budget.

10. Design a system to fine-tune a 70B model on customer data without leaking PII. Cover differential privacy, PII redaction pipelines, LoRA vs full fine-tune trade-offs, and per-tenant adapters.

11. Design a recommendation system for a streaming platform with a cold-start problem. Two-tower retrieval, embedding-based content fallback, multi-armed bandit for exploration.

12. Design a monitoring system for a production LLM. Discuss latency percentiles, token cost per request, drift detection on input distribution, hallucination flagging via citation checking, and user feedback loops.
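For the hybrid BM25 plus dense retrieval step in question 7, one common way to merge the two result lists is reciprocal rank fusion. The sketch below is an assumption about how a candidate might combine them, not something the question prescribes; the document ids are made up, and k=60 is simply the commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; each doc scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a BM25 index and a dense (embedding) index.
bm25_hits = ["faq_12", "terms_3", "faq_7"]
dense_hits = ["faq_7", "faq_12", "rates_1"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because the fusion uses ranks rather than raw scores, it avoids having to calibrate BM25 scores against cosine similarities, which is a useful trade-off to mention before moving on to reranking.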

ML Fundamentals (six questions)

13. Explain gradient checkpointing and when you would use it. Trade compute for memory by recomputing activations during the backward pass. Useful when training large models on limited VRAM; typically a 20–30% slowdown for 60–70% memory saving.

14. Walk through how AdamW differs from Adam. Decoupled weight decay applied directly to weights rather than through the gradient. Mention why this matters for transformers and the typical hyperparameters.

15. What is the difference between layer norm and RMS norm? RMS norm drops the mean centring; it is faster and used in LLaMA-family models. Discuss numerical behaviour.

16. Explain how rotary position embeddings (RoPE) work. Rotate query and key vectors by a position-dependent angle in pairs of dimensions, so that the query-key dot product depends only on the relative offset between the two positions. Mention why this generalises better than absolute embeddings.

17. What is the difference between data, tensor, pipeline and FSDP parallelism? A short table answer. Mention when you would combine them: typically FSDP, which is itself a sharded form of data parallelism, for most fine-tuning today.

18. How would you evaluate whether a model is overfitting on a small fine-tuning dataset? Train-validation loss gap, held-out probes, evaluating on adjacent capabilities to check for catastrophic forgetting.
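The rotation described in question 16 can be checked numerically in a few lines. This is a framework-free sketch under stated assumptions: it pairs adjacent dimensions and uses a base of 10000, whereas some implementations pair dimension i with i + d/2 instead, and the helper names are hypothetical.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate each adjacent pair of dimensions by a position-dependent angle.

    The pair starting at even index i uses the angle position * base ** (-i / d).
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]

# Both pairs of positions are two apart, so the attention scores should match.
score_near = dot(apply_rope(q, 5), apply_rope(k, 3))
score_far = dot(apply_rope(q, 12), apply_rope(k, 10))
print(abs(score_near - score_far) < 1e-9)
```

The two scores agree because rotating q by an angle proportional to m and k by an angle proportional to n leaves a dot product that depends only on m - n. That relative-offset property is the point to make when comparing RoPE with absolute position embeddings.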

LLM-Specific Scenarios (six questions)

19. How would you reduce hallucination in a production agent? Retrieval grounding with citations, constrained decoding for structured outputs, self-consistency sampling, post-hoc verification with a second model, and clear refusal training.

20. A customer says the chatbot is "leaking" training data. How do you investigate? Reproduce the prompt, check whether it is regurgitation or confabulation, measure with canaries, and discuss membership inference if relevant.

21. Design a prompt evaluation pipeline that costs less than £500 per release. Sampling strategy, judge-model choice, caching previous evaluations, and use of cheap classifiers as first-pass filters.

22. When would you choose fine-tuning over prompting? Weigh the volume of examples available, latency budget, format consistency, and cost per token at scale. Fine-tuning typically pays off with over 1,000 high-quality examples and clear format requirements.

23. How does speculative decoding work? A small draft model proposes tokens; the larger model verifies in parallel. Discuss acceptance rates and typical 2–3x speedups.

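24. Walk through KV caching and its memory footprint for a 70B model at 8k context. Calculate roughly: 2 (K and V) × num_layers × num_kv_heads × head_dim × seq_len × batch × dtype_size in bytes. Note that models using grouped-query attention cache fewer key/value heads than query heads, which shrinks the footprint accordingly. Mention paged attention and vLLM.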
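The KV cache arithmetic in question 24 is worth rehearsing with concrete numbers. The configuration below is an assumption in the style of Llama 2 70B (80 layers, head dimension 128, 64 query heads, 8 key/value heads, fp16 cache); substitute the figures for whichever model the interviewer names.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch=1, dtype_bytes=2):
    """Bytes needed to cache keys and values for every layer and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes


GIB = 1024 ** 3

# Assumed 70B-class configuration; 8192 tokens of context, batch size 1.
gqa = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=8192)
mha = kv_cache_bytes(num_layers=80, num_kv_heads=64, head_dim=128, seq_len=8192)

print(f"Grouped-query attention, 8 KV heads: {gqa / GIB:.1f} GiB per sequence")
print(f"Full multi-head attention, 64 KV heads: {mha / GIB:.1f} GiB per sequence")
```

Under these assumptions the cache comes to 2.5 GiB per sequence with grouped-query attention and 20 GiB with one key/value head per query head. The footprint scales linearly with batch size, which is the usual lead-in to paged attention.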

Behavioural and Culture (six questions)

25. Tell me about a time you shipped an ML system that failed in production. Strong answers name the failure mode (distribution shift, label noise, edge case), describe how you detected it, and what you changed about your process — not just the system.

26. Describe a time you disagreed with a colleague about a modelling approach. Focus on how you ran a cheap experiment to resolve it rather than arguing on priors.

27. Why do you want to work on safety / on capabilities / at a bank? Tailored per employer. Anthropic and DeepMind will probe your views on alignment seriously; banks want to hear about responsible deployment in regulated contexts.

28. Tell me about a paper you read recently that changed how you think. Pick something from the last three months, summarise the claim, and — crucially — say what you would do differently because of it.

29. How do you decide what to work on when given an ambiguous problem? Look for evidence of structured triage: cheapest experiment that disconfirms the riskiest assumption first.

30. Where do you want to be in three years? Concrete is better than grand. "I want to be the person who owns evaluation for a production agent" lands better than "I want to lead an AI org."

What Top UK AI Employers Specifically Look For

  • Google DeepMind (London, King's Cross). Expect a research-flavoured loop with at least one paper discussion. Engineers are pushed on distributed training, JAX, and the ability to read and critique recent work. Packages typically range £180,000–£400,000+ for senior research engineers.

  • Anthropic (London). Heavy emphasis on safety thinking, mechanistic interpretability familiarity, and honest reasoning under uncertainty. Strong written-communication bar. Compensation is among the highest in the UK market.

  • Wayve (London). Self-driving end-to-end models. Expect questions on multimodal architectures, video data pipelines, simulation and real-world evaluation. Strong PyTorch and CUDA bias.

  • Faculty AI (London). Applied consulting-style work across regulated sectors. The loop emphasises stakeholder reasoning, evaluation rigour and the ability to scope a project under cost constraints.

  • Cohere (London). Production LLM serving and enterprise RAG. Expect deep questions on inference optimisation, retrieval quality and multilingual evaluation.

  • HSBC AI Labs, Barclays, Lloyds. Bank AI labs in London, Edinburgh and increasingly Manchester emphasise model risk management, explainability, FCA and PRA compliance, and pragmatic deployment over frontier research. Packages typically £110,000–£180,000 with significant bonus.

Frequently Asked Questions: AI Engineer Interviews UK

How long does interview prep typically take?

Candidates we speak to typically spend four to eight weeks of focused prep alongside a current job, longer if they are pivoting from software engineering into ML. A reasonable split is 40% implementing things in PyTorch, 30% system design practice, 20% paper reading, 10% behavioural rehearsal.

What's the typical take-home assessment?

Usually a small fine-tuning or evaluation task with a write-up: "fine-tune a small model on this dataset, report metrics, discuss what you would do with more compute." Expected effort is typically four to eight hours. The write-up is often weighted more heavily than the code itself.

Do they ask leetcode-style questions?

Less often than at tier-one tech companies, but not never. The banks and Faculty AI are most likely to include a leetcode-style round. DeepMind and Anthropic typically prefer ML-flavoured implementation problems (write attention, write a sampler, write a custom loss) over pure algorithmic puzzles.

How important is paper-reading?

Very important at DeepMind, Anthropic, Cohere and Wayve; less so at the banks. A reasonable cadence is two to three papers per week, with at least one you can discuss in depth in any given interview. Pick papers relevant to the team you are interviewing with.

Do they hire without PhDs?

Yes. DeepMind and Anthropic hire strong engineers without PhDs, particularly for engineering-leaning roles. The signal they look for is equivalent depth — typically shown through serious open-source work, published evaluations, or a track record of shipping production ML. PhDs remain more common in pure research roles.

What's the typical salary outcome?

In London and Cambridge in 2026, successful candidates typically take packages of £130,000–£250,000 for senior AI engineer roles, with frontier labs and staff-level positions reaching £300,000–£500,000+ once equity is included. Bank AI lab roles typically sit at £110,000–£180,000 with cash bonuses. Remote-friendly roles outside London typically pay 10–20% less.

Summary

UK AI engineer interviews in 2026 are demanding but predictable: five stages, a stable set of question patterns, and clear differences between frontier labs, scale-ups and bank AI labs. The candidates who do best treat preparation as a portfolio exercise — implementation fluency, system design reasoning, paper familiarity and honest behavioural stories — rather than grinding any single axis. Tailor your prep to the employer: research framing for DeepMind and Anthropic, production rigour for Cohere and Wayve, regulated-deployment thinking for the banks. Start practising on a shared whiteboard early; the format catches more people out than the content does.

Looking for your next AI engineering role? Browse current openings at artificialintelligencejobs.co.uk.
