Anthropic's C compiler - implemented in Rust

Interesting post - Anthropic's Claude Opus 4.6 built a C compiler.

tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V

The repo (rust) is here. 100k LOC.

Anyone tried it or looked at the code?

I've read about it. It's definitely not a production-ready compiler (its optimized code is apparently slower than GCC's output with optimizations disabled). For somebody outside Anthropic (or its competitors), reproducing this result would cost about $20,000 (expensive for what amounts to a POC). The agents were constantly running up against their context limits, and at some point fixing bugs started breaking other parts of the code.

The developer also had to hand-hold the entire process: writing test harnesses, and even using gcc to get past a point where all the agents were looping. That makes the whole thing much better suited to solving well-defined problems than to "generate me an X" and calling it a day.

So, plenty of things would need improving if the goal is a commercially viable product or service, but it's a clear demonstration that these agents are genuinely improving relative to even a year ago.

3 Likes

People now have a fever to rewrite everything in Rust. Unfortunately, Rust solves only a specific niche of problems, leaving others behind. I am also rewriting my build scripting tool in Rust, but it still lags behind the original tool. What I see as good in the idea is that a true AI won't use any programming language and will generate the resulting code directly as machine instructions. So this work of Anthropic's is really valuable, because they could be the first company to develop a real AI.

Interesting idea!

There are a lot of disadvantages, though. First, platform differences force the AI to learn 100+(?) variants of assembly. Second, AI is already trained to write in programming languages, and training it to write machine code requires significant additional effort. Third (a personal view): machine code is just worse at expressing clear control-flow logic and the like, though with enough money thrown at it, an AI might get through that.

Did I misunderstand anything? The C compiler is written in Rust, not machine code.

Correct, but you are still thinking in human categories. 100+ different machine codes is much simpler than 100 languages x 100 different implementations, plus thousands of hard-to-trace bugs produced by humans. A machine will generate code in one consistent way and deal only with its own problems. More than that: machines will reduce the number of processor variations over time.

It's a clear bootstrapping problem. When I compiled my first Rust program, I used the Java scripting language until I could implement it in Rust. The exact same thing will happen with AI. The first versions will be implemented by humans using Rust. The second generation of AI will generate machine code directly, with no more Rust. Rust is a human thing, for humans who make mistakes. Machines don't make mistakes, and they don't need Rust.

LLMs "make mistakes" all the time, often similar to humans. Rust is going to continue to be valuable because it provides useful static guarantees, encourages good documentation, etc.

6 Likes

it's a clear demonstration that those agents are actually improving relative to even a year ago.

Are you sure about that? It's a well-known fact that LLMs work well on things that have already been done once, and don't work well on things outside their training set.

IMO, all these things we are getting now are less showcases of "amazing agent abilities" and more "brown paper bag" developments of the last 20 or 30 years.

Just why do we need to write so much code to create something like a "to do" program or a compiler? Why couldn't we simply use high-level tools? Why have we switched from visual development, which could easily be done by people without a software development degree, to coding tools that now need LLMs to create something usable?

I'm not sure what will happen with LLMs in the near future, but long term it looks as if we are facing a wall: instead of making things simpler and easier to understand, we are making them ever more complex, way past the point where the ROI has become negative.

It's as valuable as The Mother of All Demos, which showed us, in 1968, how things would be done in 1998.

Similarly here: we won't be getting "real AI" for the next 10 or 30 years, that's for sure. When someone who has to be optimistic (because it's literally their job) says AGI is still 5-10 years away… I tend to multiply that by π (not sure why π and not something else; that heuristic has worked so well for so many years that I stopped questioning it).

Is this some kind of joke? Have you actually read the article? Not only do machines make mistakes more often than humans, they have no idea how to fix them (that's why "all the agents were looping" and a human had to unblock them). These 5-10 years (Demis Hassabis's estimate), which would become 15-30 years in reality (the π adjustment), are just to make these mistakes less dangerous, to bring them down to human scale. AI that "doesn't make mistakes" is unlikely to arrive in the 21st century (if it ever arrives at all).

2 Likes

To me this experiment seems quite flawed. The purpose is to test having multiple AIs cooperate to build a large project.

But they chose to create a compiler, which exists many times over in the training data. The theory of how compilers are architected is in the training data too. So is C itself. Looking a bit at the code, we see (at least in the file names) the standard parts of "normal" compilers.

Conceptually, when an agent hears "lexer for a C compiler in Rust", it doesn't need any more context to create the actual implementation, apart from mapping its output types to the input of the next stage.

One of the first things I saw was an enum mapped to a similar enum.
So these AIs regurgitated small parts of compilers and then glued them together. But this only works because the agents don't need any context on what they are implementing: every agent in that project had an implicit understanding of the entire architecture, because it is in the training data.
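For illustration, the "enum glued onto a similar enum" pattern described above might look something like this (a hypothetical sketch; the type and function names are invented for this post, not taken from the actual repo):

```rust
// Hypothetical sketch of agent-glued compiler stages: each stage
// defines its own token enum, and the "integration" work is a
// mechanical one-to-one mapping between near-identical types.
#[derive(Debug, PartialEq)]
enum LexToken {
    Ident(String),
    IntLit(i64),
    Plus,
}

#[derive(Debug, PartialEq)]
enum ParseToken {
    Identifier(String),
    Integer(i64),
    BinOpPlus,
}

// The only cross-stage knowledge an agent needs: map its output
// type onto the next stage's input type.
fn lower(tok: LexToken) -> ParseToken {
    match tok {
        LexToken::Ident(s) => ParseToken::Identifier(s),
        LexToken::IntLit(n) => ParseToken::Integer(n),
        LexToken::Plus => ParseToken::BinOpPlus,
    }
}

fn main() {
    let lexed = vec![
        LexToken::Ident("x".into()),
        LexToken::Plus,
        LexToken::IntLit(1),
    ];
    let parsed: Vec<ParseToken> = lexed.into_iter().map(lower).collect();
    println!("{parsed:?}");
}
```

Each stage is internally coherent because the overall compiler architecture is implicit in the training data; the glue code itself carries no design decisions.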

A fun question is: did they choose Rust for safety, or could they just not use C++, because then we'd see tons of one-to-one copies from GCC in the code?

7 Likes

I just had a quick look at the code. I confess I am not entirely sure what happened here, I have some suspicion that Claude actually translated an existing compiler (or compilers) into Rust, but I guess using Rust did help with getting this "thing" to work!

In a way, I think it shows how powerful Rust is, rather than how powerful AI is.... if it compiles, it works... that is what everyone says! Even a stupid AI can write a Rust program!

2 Likes

Well let's put it in reverse: a year ago even this result wouldn't have been possible at all. So yeah, that's progress.

As for an answer to questions like that, I guess the ultimate goal (for which AGI is definitely needed) is to be able to farm out arbitrarily difficult work to AIs.
Look at it this way: it's not that we can't use high-level tools (we can!), it's that AI in general is aiming to become the ultimate high-level tool.
We need to write all that stuff around it exactly because the tools are still flawed. But that doesn't mean they can't be useful at all; rather, it means we need to be judicious about what we use them for.

From a certain POV I view this experiment as a way to gather more data at some significant scale (in terms of size of the code base) on where the boundary lies between what can be farmed out to an AI, and what can't.

Worse: they would have very quickly arrived at the point where their creations started "looping", because C++ demands a deep understanding of what you are doing, given the hundreds of tricky instances of UB that must be avoided.

With Python or Java they would have had a chance, but Rust makes it easier to detect errors: you don't need tests, you just need to run the compiler.

That's not entirely true; tests played a large role, too. But you need a predictable language for that, and that means C and C++ are not suitable.

We don't really know. We know that a year ago no one had the resources to do this on top of GPT-4.5, the last model that showed significant improvements from scaling alone: it was too expensive for a similar experiment.

What we have got in the year that followed are models that are cheaper than that, but I'm not sure we have got any significant improvements in capabilities; it's mostly "cheats" of different forms, after the "scale is all we need" adage stopped working.

But that simply wouldn't work with the current approach: these models are atrociously bad at doing things that are not described in their training set! They are essentially doing what, decades ago, was called the "Stack Overflow development model": stick together random pieces of code that you find on sites like Stack Overflow and hope that the end result somehow works.

That's only helpful if we insist on creating impenetrable piles of shit in the name of satisfying the "design ideas" of various people, while inventing nothing actually new in the process.

Note how we only need these piles of code that combine various independently written pieces (those conversions between near-identical enums that people discovered in the sources, and things like that) simply because, instead of trying to work together (as was envisioned in the 1970s or 1980s), people started reinventing the wheel 100500 times.

Can you show me a tool that allows me to build a simple CRUD app for Android or iOS the same way I could in Visual Basic 35 years ago? I'm not sure how many there are. And even if you find them and use them, they would immediately be laughed out, because they create something that doesn't comply with the ideas of Material 42 or Liquid Adamantium that Apple introduced two months ago.

Not really. We need to write rewrites of these exact same things that have already been written many times, again and again, because otherwise we would have nothing to sell.

This remains to be seen. For the last ~20-30 years we have been spending the majority of resources doing useless things to satisfy pointless requirements. There has been some actual progress, too (the Rust language, or new codecs like AV1, have genuine advantages over what we had before), but it's a well-known fact that MS Office 2007 was radically redesigned because they simply couldn't find anything new to add to Office 2003, so they started rearranging things in the hope of making them easier to use. And the same thing was happening at all other software development companies.

IOW: so far we know that AI helps with work that exists solely because humans invented it, and which doesn't solve any real problems that already-written code couldn't solve.

Whether AI may ever be helpful for anything else (and when would it be helpful for anything else) is an open question.

And the busywork that doesn't really achieve anything except an "ooh… it's new… shiny" vibe for the marketing department can only exist in a world that has a lot of resources to simply burn away without getting anything in return. We are quickly arriving at the end of that world.

If they wanted that, they would have picked some entirely artificial task, one that's not explored and not described on the web… but we all know the LLM would fail pretty disastrously there; that's why they haven't tried. Plus, that's not really needed in a world where the only thing happening is the rewriting of already-written solutions in a new fashion. The experiment is actually important and impressive for a world that is busy writing code for already-solved tasks, to produce things that already exist… but how long will that world even exist?

6 Likes

Last year Anthropic, OpenAI and Google all had the resources to do this. This year it was an Anthropic engineer who did this, with the backing of Anthropic (presumably they let him run those agents without the engineer himself having to pay for it out of pocket).
We also know that plenty of people tried to build larger projects and got outcomes significantly worse than this. So by any measure this is progress.

I never claimed it would. You asked a bunch of questions, I simply attempted to answer them. So please don't move your measuring stick while the bus is driving, so to speak.

You mean like most human developers out there? Building a website, or a platform of some sort, has long since lost its novelty, but that doesn't mean it doesn't deliver value. Otherwise those people wouldn't be doing that.
Novelty isn't the goal. Creating value is.

Well if you want a simple CRUD app, coding agents will more than suffice for that. By your own reasoning, there's a lot of that in the training set of the LLMs backing those agents.

I was referring to things like test harnesses, helping those coding agents along if/when they get stuck, etc.

Perhaps that's true for you. I have already seen that they can deliver value, as well as the fact that they have limitations.

I'd say that for the people for whom this is true, they'd probably be well off being a bit more discerning on what they spend their time on. But it certainly isn't true for me (I can only speak for myself here).

It feels like your problem here is more with the economic model behind all of that activity then. Because that's ultimately what value creation is: something that the purchaser is happy paying real money for. The rest follows from there, since where there is demand there will be supply, and the fact that that doesn't seem to match your novelty criteria is kind of irrelevant there.

It's also something that as you yourself say is something that humans have been doing since way before LLMs came along, so you can't really blame that tech for that.

There are probably undiscovered use cases of LLM-backed AI, yeah. But that doesn't mean it doesn't have any today.

For example, with tools like v0 you can do quite rapid prototyping of UI mockups. Whether or not that delivers workable code isn't really the point; rather, the point is to get to an MVP much more quickly than before.

I'm sure that'll come along at some point. But it's an illogical place to start such experiments - instead it's a much better approach to first do the basics, and only then branch out. Building a known quantity in the form of a compiler makes a lot of sense as an early step in full-program synthesis.

What I know is that earlier approaches in using the tech would have failed. But the entire field is developing quite rapidly, and people are learning, and figuring out quickly how to get more out of this tech.
Will it ever be perfect without fully-fledged AGI? Probably not. But that doesn't mean it's completely useless in the meantime. Like I said, it's a matter of judicious use of the technology (as contrasted to a shotgun approach).

Once again, it sounds like you have beef with the predominant economic model (i.e capitalism) more than anything else. People are doing it because it delivers value for someone, who is quite clearly willing to pay for it.
Novelty was always beside the point when it comes to delivering value.

2 Likes

Why a joke? AI can't make mistakes, by definition. You use the Rust compiler a lot, right? When you see something like this:

error[E0382]: borrow of moved value: `s`
  --> /media/exhdd/Dev/modu/color/color.rs:11:5
   |
 8 |     let mut s: String = String::from("alpha");
   |         ----- move occurs because `s` has type `String`, which does not implement the `Copy` trait
 9 |     println!("{}", s.clone().bright().yellow().on().black());
10 |     consume(s);
   |             - value moved here
11 |     s.replace_range(1..3, "==beta==");
   |     ^ value borrowed here after move

You don't say "it's the compiler's mistake again, I did everything right!" Instead, you just follow the compiler's recommendation:

note: consider changing this parameter type in function `consume` to borrow instead if owning the value isn't necessary
  --> /media/exhdd/Dev/modu/color/color.rs:4:15
   |
 4 | fn consume(s: String) {
   |    -------    ^^^^^^ this parameter takes ownership of the value
   |    |
   |    in this function
help: consider cloning the value if the performance cost is acceptable
   |
10 |     consume(s.clone());
   |              ++++++++

You see an actual compiler bug very rarely, and even then it was introduced by the human who programmed the compiler.
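For completeness, here is a minimal reconstruction of that example with the fix the compiler suggests applied (the color-crate calls from the original snippet are dropped, and `consume` is changed to borrow instead of taking ownership, which is the other option the note mentions):

```rust
// `consume` now borrows, so `main` keeps ownership of `s`.
fn consume(s: &str) {
    println!("{s}");
}

fn main() {
    let mut s: String = String::from("alpha");
    consume(&s); // borrow instead of move: E0382 goes away
    s.replace_range(1..3, "==beta=="); // `s` is still valid here
    println!("{s}"); // prints "a==beta==ha"
}
```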

Why can't a machine make mistakes? Because it's a deterministic mechanism that just follows instructions precisely. Why are mistakes a human thing? Because you can be distracted by different sources and fail to follow instructions precisely. Ultimately, you are affected by quantum behavior. Do you think differently?

Interesting! What this means is that LLMs are now pretty capable of translating between programming languages.

Without the source code of every existing C compiler in human history that it stole, an LLM wouldn't even know what a C compiler is. How could it possibly write the code from a simple prompt? Through the specification alone?

I haven't had luck with LLM-generated code for C++ templates and multi-threaded synchronization.

People are doing things that they are paid to do, you know. Whether that thing that they are doing delivers value or not is not for them to decide.

But what even is that "value"? What do people get when they replace Vue.js with React (or the other way around)? Do we even know whether all these endless rewrites bring money to anyone except the managers who propose them?

The modern programming industry is like the fashion industry in a sense: it tries to make people buy something they don't need, because using something "old-fashioned" is considered passé… but can we even afford that much work being spent on that?

And if AI can do nothing except things that no one really needs or wants (but that aggressive marketing convinces people they need), then where would the trillions of dollars of real wealth come from that are needed to sustain all that activity?

Sure… if someone else funds their development. But who says "someone else" will be there in a world where "free money" is no longer available to burn on that task?

And, worse: why do we replace something that people can do sustainably with something that can only be done if someone else pays for all that activity while incurring only losses?

Limitations we can all see. Value… not so much. Anthropic tells us this work cost $20,000, but has it accounted for the fact that Anthropic sells its models at about 10% of cost (and thus massively loses money, the same as all other LLM-related companies)? Would there be any value delivered after the LLM megabubble pops? We don't know that yet.

We only know that AI companies are trying to find a way to break even… and losing that race, so far. With LLMs being, most likely, a dead end (ironically, other forms of AI that actually deliver value may be thrown out too, when the bubble pops).

Partially. But also with the sustainability of the whole thing. If LLMs are only good for things that have zero intrinsic value, and would thus evaporate once the "free money" craze finally runs out… would there be anything left after that?

The bets being made on LLMs are so big that they need to deliver AGI or something close to it, but if AGI is, as we've been told, 5-10 years away (or, even worse, 15-30 years away), then we will face another AI Winter instead.

Sure – if someone else would pay. But time is running out.

I'm not sure an economic model where people exchange valuable resources for vague promises of repayment in the future can be called "capitalist", but that's not even my point. The point is that this ability is coming to an end. The open loop "create MVP → impress investors → throw the code away because it was never useful in the first place" that LLMs are designed for can only work in a world where investors are not people who own something or want something, but people who are close to the source of "freshly printed money"… and with the world's increasing reluctance to accept money that isn't backed by anything at all… this model will end.

And if LLMs can offer nothing else, then the whole industry built around them will collapse, too.

You have a very strange definition of AI, then.

Yes. Because humans make mistakes, yet they can also fix them. LLMs make mistakes just as often, yet they are currently much worse at fixing them.

That's not how LLM works, sorry.

That's hardly surprising if you recall that LLMs were invented to translate human languages. Programming languages are languages too; of course LLMs are able to translate them.

Unfortunately, LLMs are poor translators: with human languages they rely on the reader's ability to "fix" a broken translation in their head, and computers can't do that. Some crutches helped the LLMs finally produce something working… which is kinda-sorta impressive, if you forget that normally "it's now written in Rust" is never the real goal.

If you took a C or C++ program, translated it to Rust, and got something more buggy and more error-prone… was that quest even valuable?

Exactly! No need to be over-amazed.

I read the source code. And when I see folders and files like "lexer", "tokens", "parser", etc., I keep tipping my hat to old acquaintances.

Note that those files were automatically generated by ANTLR. The whole thing is just funny.

1 Like

You don't need to be sorry. Perhaps you know something new and are keeping it a secret. I filed a patent in the LLM area and am currently working on several algorithms around LLMs. Everything is very deterministic. Sure, you may hold some knowledge beyond that, but probability doesn't make an algorithm non-deterministic.

I assume that an entire OS like Linux is written in some subset of C. That's how the human brain works: you don't use all of Rust's features to solve your task, you prefer a limited subset. If you created a limited C compiler for your AI, that's very beneficial. And you can make sure that your AI doesn't use more C features than exist in the compiler. So you are protected from "human mistakes".

The Rust compiler is also very deterministic, and yet it contains bugs. The fact that ChatGPT is deterministic doesn't help it avoid bugs when its output depends on petabytes of garbage consumed from the web.

Not if you feed it the internet to produce output in the "training process". To guarantee that there would be no "human" mistakes, you would have to somehow teach an LLM to produce something useful using only a small amount of data that is free of bugs.

That's not even close to how LLMs work.

4 Likes

It used to be that LLMs were better at generating C code than Rust and other languages with fewer examples - but I guess they've moved beyond that point.

I think they actually rely on gcc's libc - because there isn't one in the repo, and the standard header files (stdio.h, ...) are missing.

The project compiles very fast - there is a ccc-arm for macOS, but without libc there isn't much you can do. I wanted to see how fast the compiled code was compared to clang's.