Back to NewsArtificial Intelligence

Anthropic Publishes Internal Data: Claude Now Writes 80% of Anthropic's Code, Path to Recursive Self-Improvement

6/11/20266 min read

Anthropic Publishes Internal Data: Claude Now Writes 80% of Anthropic's Code, Path to Recursive Self-Improvement

Anthropic published a rare document today. Not a model, not a product announcement, but a direct, named warning about its own trajectory. In a piece titled "When AI builds itself" authored by Jack Clark and Marina Favaro of the Anthropic Institute, the company laid out internal data showing that Claude is now meaningfully accelerating the development of Claude.

Their own framing: the data shows Claude is accelerating AI development — a possible path to recursive self-improvement, or AI autonomously building a more capable successor — and it is happening faster than they thought.


What the Data Shows

80% of Code Is Now Written by Claude

As of May 2026, more than 80% of the code merged into Anthropic's production codebase was authored by Claude. Before Claude Code launched in research preview in February 2025, that number was in the low single digits. Leadership has publicly estimated the figure is 90% or higher when scripts and experimental code are included.

Engineers Ship 8x More Code

For Anthropic's first four years, code merged per engineer stayed flat. It began climbing in 2025 when Claude started running code instead of just suggesting it, then steepened again in 2026 as models began working autonomously over longer horizons. The typical engineer now merges roughly 8 times as much code per day as in 2024.

Training Speedup: 3x to 52x in One Year

Every time Anthropic releases a model, it runs the same test: give Claude code that trains a small AI model and ask it to make that code run as fast as possible. A skilled human researcher needs four to eight hours to hit a 4x speedup.

DateModelSpeedup
May 2025Claude Opus 4~3x
April 2026Mythos Preview~52x
Human baselineSkilled researcher~4x (4–8 hours)

In one slice of research work, Claude went from helpful to superhuman in under a year.

Research Judgment: AI Beating Humans

Anthropic created a test using real research sessions where a human took a wrong turn. They showed Claude only the work up to that point and asked what to do next. A separate Claude that could see how things actually turned out judged whose call was better.

DateModelBeat Human Choice
November 2025Previous best51%
April 2026Mythos Preview64%

The caveat is that these were deliberately chosen moments where the human choice had room to improve, so it is not a clean head-to-head. But the direction is the point.

Vulnerability Discovery

Anthropic's partners, plus its own teams, have found more than 10,000 high or critical-severity vulnerabilities in essential software. The company's public coordinated disclosure dashboard, updated May 22, lists 1,596 specific disclosures across 281 open-source projects, with 97 patched so far.

One case in numbers: in April 2026, Claude shipped over 800 fixes that reduced a class of API errors by a factor of a thousand. The engineer overseeing it estimated a human would have needed four years.


The Limit That Remains

Anthropic is careful to state what today's data does not show: that Claude can choose what to build. Claude can take an underspecified engineering problem and figure out the method. It can match or beat skilled humans at running a well-defined experiment. What it cannot reliably do yet is exercise research taste — deciding which problems are worth working on, which results to trust, and when an approach is a dead end.

The company describes the current division of labor as: Claude is doing the building. Humans are still doing the directing.

The unsettling part is the line they add next: research taste might just be another capability AI fails at for a while and then learns, the way it eventually learned to explain why a joke is funny. They do not know if that gap closes. They are saying it might, sooner than most institutions are prepared for.


Why Anthropic Would Publish This

Two honest readings, both probably true at once.

Mission. Anthropic was founded on AI safety. Publishing internal evidence that AI is accelerating AI development, alongside a call for the ability to slow down or pause if needed, is consistent with that founding purpose. The piece ends with a proposal: that the world should build the capacity to verify whether frontier labs have actually slowed or paused, so that a coordinated slowdown would not just hand the lead to whoever cheats.

Positioning. "Our model is so capable it is starting to build its successor" is the most powerful enterprise sales and recruiting message a frontier lab could send at this moment. Both things can be true. The data is real, the safety concern is real, and it also happens to be a strong statement of capability at a strategically valuable moment.


What It Means

For engineers: The value moves from writing code to specifying and reviewing it. Anthropic states that once AI and human code quality reach parity, which they expect within a year, humans stop writing code and shift to reviewing it. The new bottleneck becomes how fast humans can review what the AI produces.

For teams and companies: A 100-person company is starting to be able to do the work of a 1,000-person one, because each person sits atop a stack of agents. The competitive question for the next two years is not whether you adopt AI — it is whether you reorganize around it faster than your competitors do.

For the frontier: If the labs that best automate their own research compound fastest, then the lead goes to whoever has the best models and the most compute to run them in a loop — exactly the position Anthropic, OpenAI, and Google are spending hundreds of billions to secure.