Using AI to generate Rust code

Hello,

I'm looking for advice, and to hear about people's experiences with generating Rust code using AI.

I'm part of a team that has become very enthusiastic about generating code using AI. We are a research group that includes computational researchers and software engineers. Projects range from smaller pieces of Python or R code to implement novel analysis methods for research to larger pieces of infrastructure in Java, Scala or (if I'm responsible for the choice) Rust, or JavaScript, mostly Vue, for the WebUI.

Last week, OpenAI published o3, and our PI (i.e. principal investigator, i.e. our boss), entirely by prompting o3, was able to take a Python function that implemented a complex mathematical algorithm we invented, turn it into a description of the algorithm, turn that into a more advanced version of the algorithm, and turn it back into a Python function.

So now, obviously, he believes we should generate most (if not all) of our code using AI, instead of writing it ourselves.

So I'm wondering: how successful have people been in generating Rust code with AI?

My guess would be that it is not feasible to generate a project that includes dozens of functions or types with a single prompt. Or is it?

So the main problem seems to be how to use AI to modify an existing project. What are the possible ways to do that?

I've been using JetBrains' RustRover with the GitHub Copilot plugin for a few months, mostly for its autocomplete suggestions, which range from the rest of a line to entire functions. It's been useful, but mixed. Sometimes it suggests exactly what I want. Sometimes it suggests code that works, but is sub-optimal. Sometimes it suggests code that looks right at first glance but turns out to be incorrect. Sometimes the suggestions are silly, like suggesting a function that already exists. Surprisingly often, the suggested code actually compiles, and if it doesn't, it is often because it suggested a function that doesn't exist but could plausibly be expected to exist.

GitHub Copilot also allows prompting, but I haven't used that much, mostly because I'm not sure how that would integrate with the rest of the code.

RustRover now also comes with its own AI. This one allows prompting to add code at a specific location. I have not yet had much of a chance to try it out. My very limited experience with it makes me feel it is comparable to GitHub Copilot, but I need to use it more to tell.

So what do other people do? Thanks!

1 Like

In my experience, the current LLMs can generate decent small programs (100-1000 lines), but get lost and start running in circles in larger programs. This is true for all programming languages.

There's also a mundane problem of LLMs having a bad UI for code editing — updating the program by re-generating the entire code is slow and may introduce random unrelated changes. Making changes via diffs gives lower-quality results, and sometimes diffs don't apply cleanly. And you need the ability to edit and refactor, because it's extremely unlikely that you will get a large program correct on the first try.

Context window size is another problem. If you can't fit your whole codebase in the context, any code changed or added may reinvent existing bits or not fit the program's architecture and requirements. Cutting-edge models are getting context windows large enough to be useful, but currently taking advantage of that is expensive. Models runnable locally can barely fit toy programs in the context.

Rust-specific problems:

  • Hallucinated dependencies, and models trained only on old versions of libraries.
  • The quality is inconsistent – sometimes you get noob code with .clone() and .unwrap(), and you need to instruct the LLM to fix that (see the sketch after this list). You also need to guide the LLM to do large-scale planning and modelling of data first, otherwise it will improvise as it goes along and paint itself into a corner with poorly designed structs, nearly duplicate enums, etc.
  • There's no tooling for abbreviating code to fit the limited context window, e.g. you'd want struct definitions and APIs, shortened doc comments, but not waste tokens on function bodies.
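
A purely illustrative contrast (my own example, not actual model output) of what that "noob code" point looks like in practice: the first function is the clone-and-unwrap style you often get, the second is the version you usually have to ask for explicitly.

// Allocates a String per line and panics on empty input.
fn longest_line_naive(text: String) -> String {
    let lines: Vec<String> = text.lines().map(|l| l.to_string()).collect();
    lines
        .iter()
        .max_by_key(|l| l.len())
        .unwrap() // panics if `text` has no lines
        .clone()
}

// Borrows instead, allocates nothing, and returns None for empty input.
fn longest_line(text: &str) -> Option<&str> {
    text.lines().max_by_key(|l| l.len())
}

fn main() {
    let text = "short\na much longer line\nmid";
    assert_eq!(longest_line_naive(text.to_string()), "a much longer line");
    assert_eq!(longest_line(text), Some("a much longer line"));
}

Both versions compile and pass, which is exactly why the first kind slips through if you don't review the suggestions.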
21 Likes

TL;DR: AI is good as "an assistant to a coder", not "replacing a coder".

Now, on to the wall of text!


My client has been using Claude Code to write a program in Rust. In their workflow, the AI writes the code with human feedback. Unfortunately, their experience sounds very frustrating. However, progress does get made and the program basically works.

In my personal experience, I had a number of problems when I tried having ChatGPT write a benchmark in Python for varying solutions to the Nth Fibonacci number (which it also wrote). A lot of the problems involved getting it to use arbitrary-precision libraries correctly.

It would hallucinate method names and arguments. Feeding the exception messages back in to get it to rewrite the code only worked about half of the time. I often had to consult library documentation and make corrections by explicitly telling it what the method was actually called. And it would invariably change unrelated code for no reason (echoing kornel's experience with diffs and merge conflicts).

These are the kinds of problems you really don't want. I could have written this dumb benchmark much faster without the AI...


On the other hand, using AI for everything except writing the code is quite productive. I'm able to have ChatGPT troubleshoot and debug my code (especially if it is familiar with the algorithms and library interfaces) rather than actually writing any of the code itself. It can instead analyze what exists and make suggestions on how to improve it. It can also provide ideas for optimizations, and recommendations for things you might not even be aware of.

I was successful in training my own neural network from scratch while knowing practically nothing about neural networks. ChatGPT provided me with some options for which architecture to choose, and it generated some Rust code for inference using only ndarray. I haven't used the code yet, but it was simple enough to seem practical, so I went down the rabbit hole of learning how to train and optimize a model with PyTorch.
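
For a sense of scale, here is a minimal sketch of what "inference using only ndarray" can look like: a two-layer MLP forward pass with placeholder weights. This is not the code ChatGPT generated for me, just an illustration of the shape; a real model would load its weights from files exported by the training script.

use ndarray::{Array1, Array2};

struct Mlp {
    w1: Array2<f32>,
    b1: Array1<f32>,
    w2: Array2<f32>,
    b2: Array1<f32>,
}

impl Mlp {
    fn forward(&self, x: &Array1<f32>) -> Array1<f32> {
        // hidden = relu(W1 x + b1)
        let hidden = (self.w1.dot(x) + &self.b1).mapv(|v| v.max(0.0));
        // logits = W2 hidden + b2 (apply softmax if you need probabilities)
        self.w2.dot(&hidden) + &self.b2
    }
}

fn main() {
    // Tiny placeholder model: 4 inputs, 8 hidden units, 3 outputs.
    let model = Mlp {
        w1: Array2::zeros((8, 4)),
        b1: Array1::zeros(8),
        w2: Array2::zeros((3, 8)),
        b2: Array1::zeros(3),
    };
    let input = Array1::from(vec![0.1, 0.2, 0.3, 0.4]);
    println!("{:?}", model.forward(&input));
}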

There were numerous bugs along the way, and ChatGPT was able to identify the root cause of all of them. I made the fixes to the code myself. The LLM provided only code snippets occasionally when relevant. This kind of workflow was wildly successful, in my opinion. I understand all of the code (with exception of most of the PyTorch API) so I can easily make changes and refactor it to better fit my needs. ChatGPT mostly acts as a reviewer and rubber duck through the whole process. I can bounce ideas off of it, ask for advice on what to do next or how to solve new problems as they arise.

One thing is for sure: GPT-4o knows a lot about machine learning. It is able to teach a noob like me to invent their own models. And FWIW, my model actually works pretty well! I spent three days on this project, with lots of trial and error. And just when it was looking like it would be a colossal failure, my model started working! So at least I have renewed confidence, and I can say that coding with an AI assistant can be pretty awesome.

It was definitely not "let the AI do everything", though. I think that's still an unrealistic expectation. At least right now.

For small scripts like my model training script (less than 1,000 lines), the whole thing can be uploaded and analyzed in one go. For large projects, you need a way to distill the important "knowledge details" embedded in the code to fit within the model's context window.

Larger context windows will be more important, but I don't know the capabilities of these models, myself. Llama 4's 10 million token context window is intriguing. Can it really keep the thread of logic throughout that entire window? (Honest question, if anyone has experience with it.)

As I understand it, this is the big problem with having AI work on large code bases and it's a very active area of research.

I haven't tried any AI-enabled code editor, yet. I don't know how well they work or what problems you're likely to hit. But your experience reflects what I have heard from my client with Claude Code.

9 Likes

I did a few tests, though it was 6-8 months ago, with Copilot (like you, but in VSCode) and with RustRover's AI assistant. I also did a few tests with ChatGPT 4.

At the time, JetBrains' AI was terrible. The feature that seemed the most interesting was querying the AI to find methods and traits in the standard library, or general algorithms to complete a task, so in "chat" mode. Writing a draft of code documentation was OK, too. But writing code was just not worth the trouble. I haven't tried it recently.

Copilot was less bad (I won't say "better" because of its positive connotation), but the generated code needed a fair amount of editing and refactoring before being sometimes acceptable. It wasn't well integrated with the rest of the little project I tested it on. Querying the AI about the project was worrying: most of the time, the AI couldn't even give any information about items that were in other files (that part must have been a problem with VSCode's integration). Otherwise, asking general questions and writing doc drafts were OK.

Those tools are based on LLMs, as said above. LLMs are not thinking machines; they only give the best guess for completing a series of symbols, based on their training (literally). It's a direct "reflex" response from a neural network, without iteration or verification, unless we count the iteration through the output it progressively generates, which is used as a sort of context. So it's no surprise that the results have the properties we know: inconsistencies, hallucinations, inability to reason iteratively, shortsightedness, poor coherency with the code in the context of the project, etc.

I'd have to retrieve the links, but I read a couple of studies of the active use of Copilot in projects over several months. The conclusion was that the AI seemed to help somewhat, but in the mid- to long term it significantly increased the need for refactoring in the code base compared to the earlier months when no AI was used. There was also evidence that, if the requirements were close to something the AI had learned yet different in some ways, the AI would stick to what it had learned and ignore the differences entirely.

For me, it's a definite no. Don't use it. That sort of AI isn't meant to be used that way, and the time you may feel you gain will at best be matched by the technical debt you happily add to the project. I think it's irresponsible to use those tools, not to mention the dependency it may create in developers, or the risk that management could be tempted to hire lower profiles to only feed and monitor the AI. Or the time and power it consumes. Or the long-term feedback-loop issues (when AI models train on data mostly generated by their earlier generations).

9 Likes

Currently, AI is very dependent on third-party code. Unfortunately, for Rust that means crates.io, and code quality there is awful, because most solutions are direct conversions to Rust from C code. If you can live with that, you are lucky. I can't, and need to rewrite code in a Rust way, which is why I do not use crates.io. However, AI is quite good at generating small solutions in Rust. It also gives at least 3 choices, so I currently can't imagine Rust coding without AI. I think you can't do quality vibe programming in Rust yet, but the situation will soon change in a good direction. Stay tuned.

1 Like

In a recent talk, I heard Eric Schmidt claim that 10-20% of OpenAI's research code is AI generated. Don't know where he got that number from, but he seems well informed.

I ask myself the OP's question every few months or so. So far, what I have read is similar to what other posters have shared here. Thus, I have avoided using LLMs to write code (I use them to draft simple letters). There is the occasional post (e.g. on Reddit) that claims that LLMs are ready to do the coding for you.

Last week, I updated VSCode and, to my surprise, Copilot was enabled by default. I decided to give it a try. The autocomplete suggestions were horrible, perhaps because I was working on no_std code that requires a rather niche crate. Or maybe I have gotten used to rust-analyzer's autocomplete (which is amazing and correct every time). Or maybe I am using Copilot "wrong". In any case, I deactivated it after several minutes. Just my 2 cents.

5 Likes

I never use tools like Copilot or autocomplete.

But I used AI to get started with EGui and Bevy, and it was very helpful. It created a first draft and helped fix errors. But the fact is that I was not really interested in learning EGui and Bevy at the time. For GUI, I am still waiting for Xilem, and for Bevy -- well, I am not that much interested in game development. At the time I just wanted to get a simple GUI for my chess engine fast.

For grammar and spelling corrections AI is great and a big help for non-native speakers like me. The strange thing is that the new o3 model from OpenAI now just deletes most of my writing, instead of fixing errors or improving the text. I have no real idea why.

3 Likes

We have been using GitHub Copilot at work for a couple of months.
I never actively let it generate code via prompt, because I never ran into a scenario where I did not know how to implement an algorithm or feature.
I mostly use it as a better code completion, which most of the time works pretty well.
Unfortunately it sometimes hallucinates and wants to auto-complete bogus code.

4 Likes

Yes, that's one topic I occasionally like to use AI for. They're language models, so it's normal, but they're surprisingly good at explaining grammar, analyzing sentences, translating, finding locutions and collocations, etc. It's overkill, of course, and not as effective when applied to source code, since the underlying logic is quite different.

1 Like

I have not used AI seriously for anything yet, but over the last year I have toyed with the free AI available in the Warp terminal and in Phind https://www.phind.com/search/cm9h7srww00003b67pvy1x5cm

So far:

Sometimes, for smallish problems of a few hundred lines of code, the AI comes up with good-looking solutions. In one case it translated a C module to Rust, including a couple of hairy C macros.

Often those good-looking solutions fail to compile; they have silly little mistakes that are easily fixed manually.

Sometimes the solutions are really broken. Prompting the AI to fix the compile errors then just makes things worse. The more I prompt for fixes, the worse it gets, leading to the thing using non-existent APIs and so on.

The good news is that the Rust compiler is so fussy about types, lifetimes and so on that if the AI-generated code compiles, one can have a lot more confidence in it.

Just now, just for fun, I asked Warp for a doubly linked list in Rust that adhered to the rules of Crust ("Introducing Crust. Like C/C++ but C/Rust") and provided the typical methods on a linked list. The generated code looks really good: it compiled immediately and passed all the AI-generated tests. Only 200 lines of code. Who said linked lists in Rust were hard? Rust Playground

2 Likes

LLMs can indeed be used for writing simple programs and legacy code. However, the code generated by LLMs is often outdated and unable to make use of new libraries. Additionally, LLMs may misjudge code like the following as being unable to compile.

use std::any::Any;

// Node<u32> can be unsized to Node<dyn Any> (only the last field mentions T),
// so this program does compile, even though an LLM may insist that it can't.
struct Node<T: ?Sized> {
    next: Option<Box<Node<dyn Any>>>,
    data: T,
}

fn main() {
    let x: Node<u32> = Node { next: None, data: 7 };
    let xbox: Box<Node<dyn Any>> = Box::new(x);
    let _y: Node<String> = Node { next: Some(xbox), data: "aaa".into() };
}
1 Like

It missed the Drop impl, or rather the Crust equivalent I suppose, so Miri reports that the nodes are leaked, but that's still a pretty decent result. Really though, it's only really intrusive doubly linked lists with a safe API that are actually hard in Rust; the meme is more about how there are a lot of bad ways to do it if you are trying to do things "the Rust way" - doing the C translation works fine.

Yes. I've used AI quite a bit for Rust and other languages and as an IDE assistant. It can be useful for generating small segments and explaining problems. It can still be helpful even when it gets things slightly wrong. I also tried getting one to translate some C# to Rust and it was surprisingly good and accurate.

Yeah, I noticed that. The core::ops::Drop trait requires a reference to self, so it is not really a Crust thing unless we relax the rules a bit.

I fixed the Miri complaint by ensuring all items are popped off the list at the end of the tests.
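
For anyone curious what the "C translation" style looks like, here is a minimal sketch (my own, not the playground code) of a doubly linked list built on raw pointers and Box, with a Drop impl that pops every remaining node so nothing is leaked - the piece Miri was complaining about.

use std::ptr;

struct Node<T> {
    prev: *mut Node<T>,
    next: *mut Node<T>,
    data: T,
}

struct List<T> {
    head: *mut Node<T>,
    tail: *mut Node<T>,
}

impl<T> List<T> {
    fn new() -> Self {
        List { head: ptr::null_mut(), tail: ptr::null_mut() }
    }

    fn push_back(&mut self, data: T) {
        // Allocate the node and leak it into a raw pointer; we own it manually now.
        let node = Box::into_raw(Box::new(Node { prev: self.tail, next: ptr::null_mut(), data }));
        unsafe {
            if self.tail.is_null() {
                self.head = node;
            } else {
                (*self.tail).next = node;
            }
            self.tail = node;
        }
    }

    fn pop_front(&mut self) -> Option<T> {
        if self.head.is_null() {
            return None;
        }
        unsafe {
            // Reclaim ownership so the node is freed when this Box drops.
            let node = Box::from_raw(self.head);
            self.head = node.next;
            if self.head.is_null() {
                self.tail = ptr::null_mut();
            } else {
                (*self.head).prev = ptr::null_mut();
            }
            Some(node.data)
        }
    }

    fn pop_back(&mut self) -> Option<T> {
        if self.tail.is_null() {
            return None;
        }
        unsafe {
            let node = Box::from_raw(self.tail);
            self.tail = node.prev;
            if self.tail.is_null() {
                self.head = ptr::null_mut();
            } else {
                (*self.tail).next = ptr::null_mut();
            }
            Some(node.data)
        }
    }
}

// Popping everything on drop frees the remaining nodes, so Miri has nothing to report.
impl<T> Drop for List<T> {
    fn drop(&mut self) {
        while self.pop_front().is_some() {}
    }
}

fn main() {
    let mut list = List::new();
    list.push_back(1);
    list.push_back(2);
    list.push_back(3);
    assert_eq!(list.pop_front(), Some(1));
    assert_eq!(list.pop_back(), Some(3));
    // The remaining node is freed by Drop.
}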

Thanks for all the responses, this is very useful.

I've been wondering whether anyone has worked on a tool that does something like the following:

  • Prompt a user for a description of desired changes to a project
  • Send that prompt to the AI, together with the existing project files and a request to create a change set
  • Upon obtaining the change set, verify that:
  1. The change set is valid
  2. It passes the compiler
  3. It passes the linter
  4. All tests still pass
  5. It passes optional user-provided checks (for example, not to change unrelated parts of the code)
  • If any of the above checks fail, complain to the AI and ask for a new change set
  • Repeat

The above seems comparatively straightforward and very useful, at least once AI tools become good and cheap enough, which some people hope will happen soon. Below is a rough sketch of what I have in mind.
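
The sketch assumes two hypothetical helpers, request_change_set and apply_change_set, standing in for the AI call and for patching the working tree; the cargo invocations are real commands, everything else is a placeholder.

use std::process::Command;

// Run a command and report whether it exited successfully.
fn passes(cmd: &str, args: &[&str]) -> bool {
    Command::new(cmd)
        .args(args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    let user_request = "description of the desired changes goes here";
    let mut feedback = String::new();

    for attempt in 1..=5 {
        // Hypothetical: send the prompt, the project files and any previous
        // feedback to the AI and get a change set back.
        let change_set = request_change_set(user_request, &feedback);

        // Hypothetical: apply the change set; reject it if it does not apply cleanly.
        if !apply_change_set(&change_set) {
            feedback = "the change set did not apply cleanly".to_string();
            continue;
        }

        // The checks from the list above, in order: compiler, linter, tests.
        // (Optional user-provided checks would slot in as further branches.)
        if !passes("cargo", &["check"]) {
            feedback = "cargo check failed".to_string();
        } else if !passes("cargo", &["clippy", "--", "-D", "warnings"]) {
            feedback = "cargo clippy reported warnings".to_string();
        } else if !passes("cargo", &["test"]) {
            feedback = "the test suite failed".to_string();
        } else {
            println!("change accepted after {attempt} attempt(s)");
            return;
        }
        // Otherwise: complain to the AI (via `feedback`) and repeat.
    }
    eprintln!("giving up: no acceptable change set within the attempt limit");
}

// Stubs standing in for the AI integration; a real tool would call a model API
// here and apply the returned diff to the project.
fn request_change_set(_request: &str, _feedback: &str) -> String { String::new() }
fn apply_change_set(_change_set: &str) -> bool { true }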

Is anyone aware of such efforts? Thanks!

I have been using LLMs in my Rust (and other languages) coding since the early days of GPT-3.

I do understand the desire to place them at the centre of tools; I feel it too.

But I think, now, it is the Wrong Thing.

These days I have become a firm adherent of the "enthusiastic bright young intern" model (from the early days): I ask the LLM to do things for me and I interpret the answers. For this, the current web-based UIs are best suited. I work up my question (often in a text editor - the current interfaces are not perfect) and copy the results.

The problem with placing an LLM right at the centre is that you would spend as much time checking what it did (please do that) as anything else.

Also, LLMs are quite non-deterministic. It is hard to build tools around non-deterministic APIs.

I do not think LLMs will change what we do, or how we do it. We will be doing the exact same things, building the exact same artifacts, but better and faster, because we have a better tool.

LLMs will not be doing it for us, any more than a nailgun can build a house.

5 Likes

I'm sure that 10-20% of OpenAI's code (like all code) is also semi-trivial boilerplate and scaffolding that a trained monkey could write just as well.

1 Like

Of course not! AI doesn't, currently, have any logic, thus there is nothing to keep.

If we were to describe AI research as “an attempt to create a creature in the image and likeness of a human”… then we are building it “from the outside in”.

If you have read Isaac Asimov's work (I, Robot, e.g.), then you know how people imagined the creation of such a thing would happen: first robots would learn to hear humans, then they would gain consciousness and be able to act, and then, finally, they would learn to talk.

In reality… everything happened in almost the inverted order. Speech was conquered on 8-bit devices. After 40 years, they have learned to create intelligent-sounding sequences of words. After maybe another 20 or 40 years, they would learn to “keep the thread of logic”.

You can read a very fascinating article that explains what is happening these days: AI models have managed to collect a crazily impressive number of tricks, like the ones that circus-style human mental calculators employ. Those calculators weren't entirely useless, just like AI models are not entirely useless… but none of them ever discovered math theorems or wrote any scientific articles… and that's not something that would lead to the ability to “keep the thread of logic” any time soon.

More of an “area of very active wishful thinking”. We have no idea how long it would take for AI models to learn even primitive mental models, but they sure as hell have learned many tricks to fake and cover up their inability to think… just like many humans do. And they can “remember” so many of the things they consumed during their creation… no human can beat that – but then Google Search couldn't be beaten by a human either, yet for many years no one expected it to write code for you.

Depends on your goals. If you are explicitly making your code part of a “pump and dump” scheme of some sort, then it could be pretty lucrative. The trick is, of course, to cash in the gains and bail out of such a company before the whole house of cards collapses.

So… maybe in 20 years?

There are lots of efforts, yet not that many useful results. Please read what others have written: existing AI is entirely useless for what you want. It generates decent code (with hallucinations and mistakes, but the core is good), then you start pushing it with “it has to pass the compiler”, “it has to pass the linter”, “the tests should still pass”, and you can watch, in real time, how code that started out half-decent slowly turns into a pile of goo… the loop that you describe would never produce anything good, because as you continue to push the AI it produces worse and worse code… how can you expect it to produce a good result if the intermediate steps all go from bad to worse?

4 Likes

One aspect that I consider very important is regularly missing from these discussions:

Even if you don’t want to call it a revolution, one should acknowledge that the development over just about 7 years has been breathtaking and is still completely mind-blowing today, if anything even crazier than before.

  • Attention Is All You Need was published about 7 years ago.
  • I’ve been using an AI assistant in my editor for about 4 years now. Initially, it was completely useless for Rust, but it has become really impressive today.
  • ChatGPT has been publicly available for just 2.5 years and has improved enormously.
  • Today, you can download models onto your local PC and run them locally, offering performance relative to their size that was completely unthinkable just 4 years ago.

It’s unlikely that this development will suddenly stall, and there’s a good chance that the next few years will bring even more excitement.

1 Like