Come and work with me on Anthropic's Frontier Red Team

03 Feb 2025

I work at Anthropic on the Frontier Red Team. Our mission is to find out whether AI models possess critical, advanced capabilities, and to help the world to prepare. We’re hiring AI researchers and engineers in the US and UK, and if that describes you then we’d love to talk.

We explore questions like: can models design bioweapons, or accelerate vaccine research? Can they orchestrate massive cyberattacks, or defend our critical infrastructure? Can they self-improve, or build a business, or fly drones? Even if they can’t do these things right now, when might they be able to?

We’re a technical research team embedded inside both Anthropic’s policy and research organizations. We work with frontier models; top experts in every domain that we cover; and key national security and policy actors. This Wall Street Journal article explains a bit more.

Below are the people we need to help us go faster. If any of them sound like you then please get in touch:

People to build and run evaluations

We need people to design and run the evaluations for Anthropic’s Responsible Scaling Policy (RSP). These people:

Work with internal domain leads and external domain experts to design and build cutting-edge evaluations across cybersecurity, biosecurity, AI R&D, and more
Build sandbox environments to safely run these evaluations
Build automated analysis frameworks to make sense of all the results
Build agentic frameworks to squeeze the best possible performance from models

I think this is a great job for anyone, and the best job in the world for generalist engineers who are also interested in AI. You get to spend half your time building bulletproof systems for novel technical problems, and the other half thinking earnestly about how many tokens it might take for an AI model to build a GPU cluster to copy itself onto. If you work in computer infrastructure anywhere else then I’m convinced that you would have more fun working here instead.

People to conduct deep research into critical domains

We believe that the most important capabilities that models are likely to develop are in autonomy, biosecurity, and cybersecurity. We have small sub-teams that lead research into each of these domains. People on these teams:

Carry out threat modeling. What would it even mean for a model to be dangerous in this domain? How could it cause catastrophic harm? What capabilities would it need?
Design evaluations that test for these capabilities
Run large-scale experiments on these evaluations; understand what capabilities current models have and don’t have; predict when these capabilities might emerge
Build safe demonstrations of dangerous capabilities to show to experts and stakeholders

The people on these teams get to work on projects from reinforcement learning to understanding the biggest threats and opportunities facing society. Never a dull moment.

Logistics

This is the best and weirdest job I’ve ever had. The team is full of brilliant people who care a lot about their jobs and each other. Most of the team is on the West Coast of the US, with some on the East Coast and a couple in London. I work from London and my schedule is great, even with two kids. I go into the London office once or twice a week and to SF a couple of times a year.

Apply now

If you’d like to work with us then apply using the links above and below. There’s no specific deadline and we’ll keep looking until we find the right people, but we have a lot of work to do and need people to help us do it as soon as possible.

Those links again:

RSP Evaluations - Research Engineer, San Francisco / Seattle; Research Engineer, London
Autonomy - Research Scientist, US Remote-Friendly; Research Scientist, London
Biosecurity - Research Engineer, San Francisco; Research Scientist, San Francisco
Cyber - Research Scientist, San Francisco / Seattle

Robert Heaton