Senior HPC Engineer (Multi-GPU/TPU)


Job details
  • European Tech Recruit
  • London
  • 1 week ago

Senior HPC Engineer


This position is with a leading start-up at the forefront of LLMs and AI safety, working on machine learning and HPC engineering. The company is dedicated to engineering cutting-edge AI systems poised to revolutionize industries worldwide.


As an HPC Engineer, you will play a crucial role in developing a robust framework for rapid training of, and experimentation with, large language models on multi-GPU systems. You will develop the core inference engine that deploys large machine learning models to customers at scale and across distributed systems. You will also contribute significantly to the automated pipeline, optimizing for high-throughput training runs and rapid experimentation while achieving top hardware efficiency.


Qualifications:

We are seeking candidates with exceptional ML engineering skills, evidenced by:

  • Experience in creating and managing high-performance computing clusters across GPU/TPU, preferably in PyTorch.
  • Proficiency in efficient serving of large machine learning models at scale, including quantization and distributed computing, leveraging libraries such as DeepSpeed.
  • Strong software engineering acumen with expertise in software design/architecture, particularly in Python.
  • Any cloud experience working with AWS, GCP or Azure is a plus.
  • Understanding of the latest AI research and the ability to implement it efficiently.
  • Prior experience at a leading machine learning company (OpenAI, DeepMind, Meta, Anthropic, Hugging Face, etc.).


Keywords: Machine Learning / LLM / Large Language Model / PyTorch / High Performance Computing / HPC / GPU / TPU / DeepSpeed / AI / OpenAI / Distributed Systems


By applying to this role, you understand that we may collect your personal data and store and process it on our systems. For more information please see our Privacy Notice https://eu-recruit.com/wp-content/uploads/2020/12/Privacy-Notice.pdf
