About the AI Security Institute
The AI Security Institute is the world's largest and best-funded team dedicated to understanding advanced AI risks and translating that knowledge into action. We’re in the heart of the UK government with direct lines to No. 10 (the Prime Minister's office), and we work with frontier developers and governments globally.
We’re here because governments are critical for advanced AI going well, and UK AISI is uniquely positioned to mobilise them. With our resources, unique agility and international influence, this is the best place to shape both AI development and government action.
About the Team
The Cyber and Autonomous Systems Team (CAST) is looking to research and map the evolving frontier of AI capabilities and propensities to inform critical security decisions that reduce loss-of-control risks from frontier AI. We focus on preventing harms from high-impact cybersecurity capabilities and highly capable autonomous AI systems.
Our team is a blend of high-velocity generalists and technical staff from organisations such as Meta, Amazon, Palantir, DSTL and Jane Street. Our recent work has included building model evaluation suites such as Replibench, the world’s most comprehensive evaluation suite for understanding the risk of a model autonomously replicating itself over the internet. We also regularly test the cyber and other relevant capabilities of frontier models before they are released, to understand their risks.
As AI systems become more advanced, the potential for misuse of their cyber capabilities may pose a threat to the security of organisations and individuals. Cyber capabilities also form common bottlenecks in scenarios across other AI risk areas, such as harmful outcomes from biological and chemical capabilities and from autonomous systems. One way to better understand these risks is to conduct robust empirical tests of AI systems that show how capable they currently are at performing cybersecurity tasks. In this role, you'll join a strongly collaborative team to help create new kinds of capability and safety evaluations for frontier AI systems as they are released.
About the Role
This is a cybersecurity engineer position focused on building environments and challenges to benchmark the cyber capabilities of AI systems. You'll design cyber ranges, CTF-style tasks, and evaluation infrastructure that allows us to rigorously measure how well frontier AI models perform on real-world cybersecurity tasks.
This work belongs inside UK government because understanding AI cyber capabilities is critical to national security, and robust empirical testing requires coordination across government, industry, and international partners to inform policy decisions on AI safety.
You'll work closely with research engineers, infrastructure engineers, and machine learning researchers across AISI. As a small, fast-moving team building first-of-its-kind evaluation infrastructure, you'll be able to influence research directions, own whole pieces of work, and bring your ideas to the table.
Core Responsibilities
- Evaluation Design & Development (60%)
  - Design cyber ranges and CTF-style challenges for automatically grading AI system performance on cybersecurity tasks
  - Build agentic scaffolding to evaluate frontier models, equipping them with tools such as network packet capture utilities, penetration testing frameworks, and reverse engineering/disassembly tools
  - Design metrics and interpret results of cyber capability evaluations
- Infrastructure Engineering (30%)
  - Work alongside other engineers to ensure evaluation environments are robust and scalable
- Research & Communication (10%)
  - Write reports, research papers and blog posts to share findings with stakeholders
  - Keep up to date with related research taking place in other organisations
  - Contribute to AISI's broader understanding of AI cyber risks
Example Projects
- Onboard and integrate new cyber ranges into our evaluation pipeline
- Conduct agent research to improve the cyber capabilities of our agents
- Improve grading and scoring methodologies for automated evaluation tasks
- Integrate defensive telemetry and simulated users into ranges to increase their realism
- Collaborate with government partners on joint research publications
Impact
Your work will directly shape the UK government's understanding of AI cyber capabilities, inform safety standards for frontier AI systems, and contribute to the global effort to develop rigorous evaluation methodologies. The evaluations you build will help determine how advanced AI systems are assessed before deployment.
What we are looking for
We're flexible on the exact profile and expect successful candidates will meet many (but not necessarily all) of the criteria below.
Essential
- Strong Python skills with experience writing scripts for automation or security tooling
- Proven experience in at least one of the following areas of cybersecurity red-teaming:
  - Penetration testing
  - Cyber range design
  - Competing in or designing CTFs
  - Developing automated security testing tools
  - Bug bounties, vulnerability research, or exploit discovery and patching
- Strong interest in helping improve the safety of AI systems
Preferred
- Familiarity with virtualisation technologies such as Proxmox VE and with infrastructure-as-code approaches for rapidly spinning up reproducible test environments
- Ability to communicate the outcomes of cybersecurity research to a range of technical and non-technical audiences
- Familiarity with cybersecurity tools such as network packet capture utilities, penetration testing frameworks, and reverse engineering/disassembly tools
- Active in the cybersecurity community with a track record of keeping up to date with new research
- Previous experience building or measuring the impact of automation tools on cyber red-teaming workflows
Example backgrounds
- Penetration tester with 1+ years of experience; has designed CTF challenges or cyber ranges; strong Python skills; interested in AI safety
- Content engineer at a cybersecurity training platform; experienced in building vulnerable machines, CTF challenges, and automated deployment infrastructure
- Security researcher with experience in vulnerability research or bug bounties; familiar with penetration testing frameworks and reverse engineering tools; has communicated findings to mixed audiences
Core requirements
- This is a full-time role.