Platform engineer, MLOps (UK)

writer.com
London
1 week ago

About this role
As a Platform engineer, MLOps, you will deploy and manage the infrastructure that underpins our AI/ML operations. You will collaborate with AI/ML engineers and researchers to develop a robust CI/CD pipeline that supports safe and reproducible experiments. You will also set up and maintain monitoring, logging, and alerting systems for extensive training runs and client-facing APIs. You will ensure that training environments are available and efficiently managed across multiple clusters, and you will improve our containerization and orchestration systems using tools such as Docker and Kubernetes.
This role demands a proactive approach to maintaining large Kubernetes clusters, optimizing system performance, and providing operational support for our suite of software solutions. If you are driven by challenges and motivated by the continuous pursuit of innovation, this role offers the opportunity to make a significant impact in a dynamic, fast-paced environment.
Your responsibilities:
Work closely with AI/ML engineers and researchers to design and deploy a CI/CD pipeline that ensures safe and reproducible experiments.

Set up and manage monitoring, logging, and alerting systems for extensive training runs and client-facing APIs.

Ensure training environments are consistently available and prepared across multiple clusters.

Develop and manage containerization and orchestration systems utilizing tools such as Docker and Kubernetes.

Operate and oversee large Kubernetes clusters with GPU workloads.

Improve the reliability, quality, and time-to-market of our suite of software solutions.

Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.

Provide primary operational support and engineering for multiple large-scale distributed software applications.

Is this you?
You have professional experience with:
Model training

Hugging Face Transformers

PyTorch

vLLM

TensorRT

Infrastructure as code tools like Terraform

Scripting languages such as Python or Bash

Cloud platforms such as Google Cloud, AWS or Azure

Git and GitHub workflows

Tracing and Monitoring

You are familiar with high-performance, large-scale ML systems.

You have a knack for troubleshooting complex systems and enjoy solving challenging problems.

You are proactive in identifying problems, performance bottlenecks, and areas for improvement.

You take pride in building and operating scalable, reliable, secure systems.

You are comfortable with ambiguity and rapid change.

Preferred skills and experience:
Familiarity with monitoring tools such as Prometheus or Grafana

5+ years building core infrastructure

Experience running inference clusters at scale

Experience operating orchestration systems such as Kubernetes at scale
Benefits & perks (UK full-time employees):
Generous PTO, plus company holidays

Comprehensive medical and dental insurance

Paid parental leave for all parents (12 weeks)

Fertility and family planning support

Early-detection cancer testing through Galleri

Competitive pension scheme and company contribution

Annual work-life stipends for:
Home office setup, cell phone, internet

Wellness stipend for gym, massage/chiropractor, personal training, etc.

Learning and development stipend

Company-wide off-sites and team off-sites

Competitive compensation and company stock options

#LI-Remote
