How hard would it be to build a Rust transpiler that translates Rust to C?

Probably a little too hard, but you never know. I am an experienced software developer, but my knowledge of compilers is pretty poor.
I have been impressed by the simplicity of projects like the c4 compiler (rswier/c4 on GitHub, "C in four functions"), which is not really full C but is still impressive, and the Tiny C Compiler (TCC), and I am getting the idea that I might start playing with languages.
So I was wondering how hard it would be to build something that takes a more complicated language like Rust and translates it into C, which, I guess, would be easier than targeting something like LLVM bitcode.
Maybe with an LSP and DAP server too, or am I asking too much? :stuck_out_tongue:

Like mrustc?

Quite interesting, thank you. It looks a little limited, but it gives me an idea.

It's able to bootstrap rustc. It doesn't have a borrow checker implementation and assumes the Rust code is correct in that respect.
But other than that, it is a pretty complete implementation.

Just out of interest, there is also the reverse: c2rust. There was quite an interesting interview with c2rust's author on the Rustacean Station podcast.

It probably wouldn't make it any easier.

The hard part in compiling Rust is not emitting LLVM IR. Emitting LLVM IR is trivial. LLVM has an API that provides high-level functions and types for emitting the IR and converting it to binary bitcode or the textual, human-readable format.
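To make the "emitting IR is trivial" point concrete, here is a minimal sketch that produces textual LLVM IR by plain string formatting. It is a toy under stated assumptions: `emit_add_function` is a hypothetical name, and real compilers (rustc included) drive LLVM's builder API rather than formatting strings, but the output is the same kind of human-readable IR that LLVM accepts.

```rust
/// Hypothetical toy emitter: builds the textual LLVM IR for a function that
/// adds two 32-bit integers. Real compilers use LLVM's builder API instead;
/// this only shows how simple the output is once the hard analysis is done.
fn emit_add_function(name: &str) -> String {
    let mut ir = String::new();
    // `{{` escapes a literal brace in format strings.
    ir.push_str(&format!("define i32 @{}(i32 %a, i32 %b) {{\n", name));
    ir.push_str("entry:\n");
    ir.push_str("  %sum = add i32 %a, %b\n");
    ir.push_str("  ret i32 %sum\n");
    ir.push_str("}\n");
    ir
}

fn main() {
    print!("{}", emit_add_function("add"));
}
```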

The hard part in compiling Rust is semantic analysis and intermediate optimizations, which run well before any LLVM IR is emitted. Borrow checking, the trait solver (generics are Turing complete!), pattern matching, desugaring of async: these all require some pretty ingenious algorithms, data structures, and/or formulations. Once you have a sufficiently low-level, borrow-checked, generic-instantiated, desugared intermediate representation (which is what MIR is), converting it to LLVM IR is the smallest part of the task.
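As a small, concrete instance of desugaring (using a `for` loop rather than async, since it is the simplest case), the two functions below should behave identically; the second spells out roughly what the compiler turns the first into. The function names are made up for the example, and the real desugaring has more details than shown here.

```rust
// An ordinary `for` loop.
fn sum_for(v: &[i32]) -> i32 {
    let mut total = 0;
    for x in v {
        total += x;
    }
    total
}

// Roughly what the compiler rewrites the loop above into during lowering,
// before type checking and MIR building (simplified: the real desugaring
// also involves lang items and hygiene).
fn sum_desugared(v: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(v);
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_for(&v), sum_desugared(&v));
    println!("{}", sum_desugared(&v));
}
```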

Emitting fully functional, compilable, and correct C is actually harder. C wasn't designed to be as portable as LLVM is; it has several kinds of undefined behavior that don't match Rust's memory model; and emitting anything remotely intelligible and easy for a C compiler to optimize would require reverse-engineering information such as types and control flow from the desugared MIR. This is very annoying to do, even if not technically infeasible, and thus there is zero interest in doing so.
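A minimal sketch of the naive approach, to show why the result is hard to read and optimize: it lowers a toy control-flow graph of basic blocks (loosely modeled on MIR, but every type and name here is hypothetical) to C, using one label per block and a `goto` for every edge. The output compiles, but the original `while` loop is gone, and recovering it would mean reverse-engineering the control flow. The comment on `example` also points at one undefined-behavior mismatch: Rust defines what happens on integer overflow (a panic or wrapping, depending on build settings), while signed overflow in C is undefined.

```rust
// Hypothetical, heavily simplified MIR-like IR: a function body is a list of
// basic blocks, each holding straight-line statements plus one terminator.
enum Terminator {
    Goto(usize),
    // Branch on a local: nonzero goes to the first block, zero to the second.
    If(&'static str, usize, usize),
    Return(&'static str),
}

struct BasicBlock {
    stmts: Vec<&'static str>, // statements already lowered to C syntax
    term: Terminator,
}

// Emit one C function. Every block becomes a label and every edge a `goto`,
// so the loops and `if`s of the original source are no longer visible.
fn emit_c(name: &str, locals: &[&str], blocks: &[BasicBlock]) -> String {
    let mut out = String::from("#include <stdint.h>\n\n");
    out.push_str(&format!("int32_t {}(void) {{\n", name));
    for local in locals {
        out.push_str(&format!("    int32_t {} = 0;\n", local));
    }
    for (idx, block) in blocks.iter().enumerate() {
        out.push_str(&format!("bb{}:\n", idx));
        for stmt in &block.stmts {
            out.push_str(&format!("    {}\n", stmt));
        }
        let term = match &block.term {
            Terminator::Goto(t) => format!("goto bb{};", t),
            Terminator::If(c, t, e) => format!("if ({}) goto bb{}; else goto bb{};", c, t, e),
            Terminator::Return(v) => format!("return {};", v),
        };
        out.push_str(&format!("    {}\n", term));
    }
    out.push_str("}\n");
    out
}

// Roughly `let mut i = 0; let mut sum = 0; while i < 10 { i += 1; sum += i; } sum`
// after lowering. `sum = sum + i` on int32_t would be UB in C if it overflowed;
// a real backend would have to avoid that (e.g. by computing in uint32_t).
// This toy ignores it because the values here stay small.
fn example() -> Vec<BasicBlock> {
    vec![
        BasicBlock { stmts: vec!["i = 0;", "sum = 0;"], term: Terminator::Goto(1) },
        BasicBlock { stmts: vec!["cond = i < 10;"], term: Terminator::If("cond", 2, 3) },
        BasicBlock { stmts: vec!["i = i + 1;", "sum = sum + i;"], term: Terminator::Goto(1) },
        BasicBlock { stmts: vec![], term: Terminator::Return("sum") },
    ]
}

fn main() {
    print!("{}", emit_c("sum_to_ten", &["i", "sum", "cond"], &example()));
}
```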

Well, I guess that assuming the code is correct means no error messages.
Isn't the borrow checker a significant part?

46 contributors. Holy cow.
Actually, 4837 commits is quite impressive too, but somehow it's the number of contributors that strikes me.

Okay, maybe I will abandon the idea of playing with languages.

I didn't even know what IR means :slight_smile:
And I think I read about an LLVM API some time ago, but I forgot.

Thanks for the info, interesting.

The borrow checker only accepts or rejects programs; it never affects how the program is compiled. Therefore, if a program would compile with the normal Rust compiler, then it can also be compiled by a compiler that lacks a borrow checker. The only difference is that the latter compiler will also compile unsound Rust programs, rather than reporting the error.
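A toy model of this separation, under loudly stated assumptions: the IR, the `check` pass (a simple use-after-move check standing in for the borrow checker), and `codegen` are all hypothetical and far simpler than anything in rustc. The point is only structural: the checker returns diagnostics and nothing else, so skipping it cannot change the output for a program it would have accepted.

```rust
use std::collections::HashSet;

// Hypothetical toy IR: each op introduces, moves, or uses a named variable.
#[derive(Clone, Copy, Debug)]
enum Op {
    Let(&'static str),
    Move(&'static str),
    Use(&'static str),
}

// Analysis-only pass, standing in for the borrow checker: it reports errors
// (here, use after move) and produces nothing that code generation consumes.
fn check(ops: &[Op]) -> Vec<String> {
    let mut moved: HashSet<&'static str> = HashSet::new();
    let mut errors = Vec::new();
    for op in ops {
        match *op {
            Op::Let(v) => {
                moved.remove(v); // a fresh binding is usable again
            }
            Op::Use(v) => {
                if moved.contains(v) {
                    errors.push(format!("use of moved value `{}`", v));
                }
            }
            Op::Move(v) => {
                // `insert` returns false if `v` was already moved.
                if !moved.insert(v) {
                    errors.push(format!("use of moved value `{}`", v));
                }
            }
        }
    }
    errors
}

// Code generation reads only the IR.
fn codegen(ops: &[Op]) -> String {
    ops.iter()
        .map(|op| match op {
            Op::Let(v) => format!("init {}", v),
            Op::Move(v) => format!("move {}", v),
            Op::Use(v) => format!("load {}", v),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn compile(ops: &[Op], run_checker: bool) -> Result<String, Vec<String>> {
    if run_checker {
        let errors = check(ops);
        if !errors.is_empty() {
            return Err(errors);
        }
    }
    Ok(codegen(ops))
}

fn main() {
    let good = [Op::Let("s"), Op::Use("s"), Op::Move("s")];
    let bad = [Op::Let("s"), Op::Move("s"), Op::Use("s")];
    // Accepted program: identical output with or without the checker.
    assert_eq!(compile(&good, true), compile(&good, false));
    // Rejected program: only the checker notices; codegen runs fine without it.
    println!("{:?}", compile(&bad, true));
    println!("{:?}", compile(&bad, false));
}
```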

Here's my usual "what LLVM looks like and how it works" talk recommendation, if you'd like an intro to it: "Understanding Compiler Optimization" by Chandler Carruth, the opening keynote of Meeting C++ 2015 (on YouTube).

It's still a useful and fun thing to play around with, but you just need to scope your goals down significantly.

If this is something that interests you, I can't recommend Crafting Interpreters (a book, but available online for free) enough. It walks you through the design and implementation of a relatively simple toy language.
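For a taste of where the book starts: its first interpreter (written in Java) evaluates the syntax tree directly. Below is a minimal Rust sketch of that tree-walking idea for arithmetic only; the `Expr` and `BinOp` types are hypothetical, and there is no scanner or parser, which is where the book actually begins.

```rust
// Hypothetical AST for arithmetic expressions.
enum Expr {
    Num(f64),
    Neg(Box<Expr>),
    Bin(Box<Expr>, BinOp, Box<Expr>),
}

enum BinOp {
    Add,
    Sub,
    Mul,
    Div,
}

// Tree-walking evaluation: evaluate the children, then apply the operator.
fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
        Expr::Bin(lhs, op, rhs) => {
            let (a, b) = (eval(lhs), eval(rhs));
            match op {
                BinOp::Add => a + b,
                BinOp::Sub => a - b,
                BinOp::Mul => a * b,
                BinOp::Div => a / b, // IEEE semantics: dividing by 0.0 gives inf or NaN
            }
        }
    }
}

// Small constructors to keep the example readable.
fn num(n: f64) -> Expr {
    Expr::Num(n)
}

fn bin(l: Expr, op: BinOp, r: Expr) -> Expr {
    Expr::Bin(Box::new(l), op, Box::new(r))
}

// (1 + 2) * -3
fn example() -> Expr {
    bin(bin(num(1.0), BinOp::Add, num(2.0)), BinOp::Mul, Expr::Neg(Box::new(num(3.0))))
}

fn main() {
    println!("{}", eval(&example()));
}
```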

Like any domain, you won't enjoy doing things from-scratch much if you jump in and try to handle a full-fat system from the beginning; you want to work your way up the fractal complexity involved in large systems.

(If you just want to hack on compilers rather than implement a full one, though, hacking on rustc is impressively approachable for a project of its size and longevity.)

I've just started reading that myself. It has long been an area where I've been (privately) sensitive about my lack of knowledge. It's a beautifully laid-out book, amongst no doubt other virtues.

Haven't decided whether to code along in its own languages (Java, then C) or do something different; perhaps I will try writing the second interpreter in Rust.

Quite a long video. I watched about half of it. Interesting.
I thought "I will watch the video and then stick with Rust-to-C compilation". Instead, LLVM actually looks pretty cool.
Thanks for sharing.

Oh...
Well, I guess this answers my question.
Obviously the idea is to start small, but I would like to grow. If I can't make anything useful, I don't see the point.
Ideally I would like a chance to add things that are not in Rust (yet), maybe just being discussed, like dependent types.
So I guess maybe I was asking too much, after all.

Oh yes, I have heard about this book. It gets pretty good reviews.
I obviously know the dragon book, and it seems that "Engineering a Compiler" is pretty good too.
Maybe I will check this one out too. Thanks

The dragon book has its legacy for a reason, but it focuses much more on the "boring" parts of compilers than the actual interesting parts of a modern compiler. This is just due to age; when the book was written, compilers and languages were much simpler things. What it discusses is still important, but the interesting parts of a compiler are the intelligence engine(s), which the dragon book doesn't really get into.

There's also the fact that modern language servers are moving away from the batch compilation structure, which was the only model that existed when the dragon book was written and is therefore the one it assumes. More dynamic query engines and "compiler as a service" architectures are considered much more useful today than they used to be, or at least more worth the implementation effort.
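To give a rough feel for the "query engine" idea, here is a minimal sketch of demand-driven, memoized computation with invalidation. rustc's query system and the salsa framework used by rust-analyzer are real examples of this design, but the `Db` type and its methods below are hypothetical and far simpler than either: a derived result is computed only when asked for, and recomputed only if its input changed since it was last computed.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical toy query database. Inputs are stamped with the revision at
// which they last changed; derived results are memoized with the revision at
// which they were computed, and recomputed on demand only if the input is newer.
struct Db {
    revision: u64,
    sources: HashMap<String, (String, u64)>, // file -> (text, changed_at)
    line_count_memo: RefCell<HashMap<String, (usize, u64)>>, // file -> (value, computed_at)
    recomputations: Cell<usize>, // instrumentation, to show memoization working
}

impl Db {
    fn new() -> Self {
        Db {
            revision: 0,
            sources: HashMap::new(),
            line_count_memo: RefCell::new(HashMap::new()),
            recomputations: Cell::new(0),
        }
    }

    // Setting an input bumps the global revision, as an editor keystroke might.
    fn set_source(&mut self, file: &str, text: &str) {
        self.revision += 1;
        self.sources.insert(file.to_string(), (text.to_string(), self.revision));
    }

    // A derived query. Panics if `file` was never set (a real system would report an error).
    fn line_count(&self, file: &str) -> usize {
        let (text, changed_at) = &self.sources[file];
        if let Some(&(value, computed_at)) = self.line_count_memo.borrow().get(file) {
            if *changed_at <= computed_at {
                return value; // input unchanged since we computed it: reuse
            }
        }
        self.recomputations.set(self.recomputations.get() + 1);
        let value = text.lines().count();
        self.line_count_memo
            .borrow_mut()
            .insert(file.to_string(), (value, self.revision));
        value
    }
}

fn main() {
    let mut db = Db::new();
    db.set_source("main.rs", "fn main() {\n}\n");
    println!("{}", db.line_count("main.rs"));
    println!("{}", db.line_count("main.rs")); // served from the memo
    db.set_source("main.rs", "fn main() {\n    println!();\n}\n");
    println!("{}", db.line_count("main.rs")); // recomputed after the edit
}
```

In an editor, `set_source` would run on each edit and queries would run only for what the user is currently looking at, which is the main advantage over re-running a whole batch pipeline.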

If you want to understand "full stack" how a compiler works, I'd probably suggest making your own toy "mini rust" language and compiling that, either following Crafting Interpreters or Engineering a Compiler's path for adding complexity.

If the end goal is contributing to existing compilers, though, you don't need to understand the entire stack. There is no single person who is familiar with everything rustc does. Obviously, there are people who understand the whole architecture at a high level, as well as people who could drop in and get familiar with any component, but by design, nobody has to hold the entire compiler in their head.

Writing your own compiler in order to be able to contribute to compilers is similar in many ways to writing your own web browser in order to contribute to a web browser. It's a large task with questionable payoff.

If the goal is to work within an existing large system, you don't build a separate version of the system that accomplishes a similar task less well. You pick an interesting part of the system, familiarize yourself enough with that part, and work outward from there. The advice basically anyone in open source will give to people asking is to pick an open issue that you care about and just try to fix it, asking questions as you go. The Rust development Zulip has channels specifically for onboarding questions, and most of the developers spend some time answering them.

It's perhaps a bit circular, but honestly, the best way to start contributing to the rustc compiler is to just jump in and contribute. If you're worried about biting off more than you can chew, look for issues with a “mentor” and/or an implementation sketch, and ask on Zulip.

When adding new things or making larger refactors, support from a team member is required and you should probably get it before doing significant implementation work, if only to avoid conflicting with something someone else is doing, but doing smaller, targeted improvements and/or fixes is always appreciated and a great way to familiarize yourself.

You won't be able to add dependent types to Rust by transpiling it to C. Instead, it would require teaching the type checker about the concept of dependent types. By the time code generation happens, type information is largely erased, and that certainly includes high-level abstractions where dependent typing would belong.
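For contrast, the closest thing stable Rust has today is const generics: a type such as `[i32; N]` may depend on a compile-time constant, and the type checker resolves that long before code generation. A type that depends on a runtime value, which is what dependent types would allow, cannot be written. `first_n` below is a hypothetical helper for illustration.

```rust
// Hypothetical helper: copy the first N elements into an array whose length
// is part of its type. Returns None if the slice is too short.
fn first_n<const N: usize>(v: &[i32]) -> Option<[i32; N]> {
    if v.len() < N {
        return None;
    }
    let mut out = [0; N];
    out.copy_from_slice(&v[..N]);
    Some(out)
}

// What would need real dependent types, and is rejected by today's compiler:
//     fn first(v: &[i32], n: usize) -> [i32; n] { ... }
// `n` is a runtime value, so it cannot appear in the type.

fn main() {
    let pair: Option<[i32; 2]> = first_n(&[10, 20, 30]);
    println!("{:?}", pair);
}
```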

It is also the case that if you have never written a compiler before, then perhaps it's too ambitious to start by trying to add a whole, multi-man-month project to a language that is already pretty large.

That's not easy by itself, but more importantly, Rust lacks dependent types not because nobody else wanted to implement them, but because:

  1. it's not even clear how a dependent type system would interact with all of the other features of Rust that aren't to be found in traditional dependently-typed languages;
  2. this part of the type system isn't even remotely designed yet (which would be a necessary condition for implementing it); and
  3. it's not clear whether extending the type system in such a way would even be a net benefit (e.g., in my opinion the added complexity wouldn't be worth it, given how little real use dependent types have outside of theorem proving). This doesn't help drive design efforts, either.

So a more realistic goal you could set yourself would be, e.g., to 1. implement a very small compiler in order to get a feeling for the general issues you will encounter, and then 2. help with the implementation and/or bugfixing of a feature that has, at the very minimum, already been designed, so that at least you don't have to undertake that (huge) part of the work.

Very interesting points, thank you. I guess the dragon book is a little less important now.

Actually my goal is not really the compiler, it's the features of the language.
I like features that give power to the language, especially if they are supported by the beauty of mathematics.
I like to understand them, and most of all I like to use them.
Many people complain about complexity, but I don't mind complexity if it is well structured and exists for a reason. For me, complexity is the first thing in software, and software is the career I chose. I don't know why so many developers are afraid of it when they are supposed to deal with it on a daily basis.
But don't get me started about this :slight_smile:

Given what I said, I think it should also be clear that I like to be a little more independent, hence my attempt to do things on my own.

Oh, I know that; the transpiling idea was an attempt to make things easier.
Now, after reading some of the contributions to this thread, I think I will give LLVM some serious thought.

That's a shame.
Okay, I don't know much about dependent types, but I do like the idea.
That's why I would like to try them. I started with Java at version 1.4, which had no generics because many people didn't want them. Coming from C++, I never found that a great idea.
The fact that something isn't widely used doesn't mean it's useless to me. That's why I would like to try it on my own.
Otherwise I will have to learn Ada :slight_smile:

On the other hand... If anyone reading would like to build a compiler they can understand all the way through, from lexical analysis to executable code generation, without spending months on it, I suggest Jack Crenshaw's series of articles "Let's Build a Compiler".

It's a good read about programming language features, tradeoffs, and history anyway (from a 1980s perspective). With it, I managed to build a compiler for a C-like language that generated code for x86 and for the Propeller MCU from Parallax Inc. Terrible, unoptimised code, but I was chuffed with myself that I could do it at all, having considered compilers a "black art" for so many years.
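To give an idea of how little machinery the approach in those articles needs, here is a minimal Rust sketch of the same style: a recursive-descent parser that emits code while it parses, with no syntax tree. The original series is written in Pascal and emits 68000 assembly; the stack-machine instructions (`PUSH`, `ADD`, `MUL`, ...) and the `Translator` type here are hypothetical stand-ins.

```rust
// Crenshaw-style single-pass translator: recursive descent that emits code for
// a hypothetical stack machine while parsing, without building an AST. Like the
// early chapters of the series, it accepts only single-digit numbers, the four
// arithmetic operators, and parentheses, with no whitespace.
struct Translator<'a> {
    input: &'a [u8],
    pos: usize,
    code: Vec<String>,
}

impl Translator<'_> {
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    fn expect(&mut self, c: u8) -> Result<(), String> {
        if self.peek() == Some(c) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected '{}' at position {}", c as char, self.pos))
        }
    }

    // expression := term (('+' | '-') term)*
    fn expression(&mut self) -> Result<(), String> {
        self.term()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            self.term()?;
            // Both operands are now on the stack; emit the operator after them.
            self.code.push(String::from(if op == b'+' { "ADD" } else { "SUB" }));
        }
        Ok(())
    }

    // term := factor (('*' | '/') factor)*
    fn term(&mut self) -> Result<(), String> {
        self.factor()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            self.factor()?;
            self.code.push(String::from(if op == b'*' { "MUL" } else { "DIV" }));
        }
        Ok(())
    }

    // factor := digit | '(' expression ')'
    fn factor(&mut self) -> Result<(), String> {
        match self.peek() {
            Some(b'(') => {
                self.pos += 1;
                self.expression()?;
                self.expect(b')')
            }
            Some(d) if d.is_ascii_digit() => {
                self.pos += 1;
                self.code.push(format!("PUSH {}", d as char));
                Ok(())
            }
            _ => Err(format!("expected digit or '(' at position {}", self.pos)),
        }
    }
}

fn translate(src: &str) -> Result<Vec<String>, String> {
    let mut t = Translator { input: src.as_bytes(), pos: 0, code: Vec::new() };
    t.expression()?;
    if t.pos != t.input.len() {
        return Err(format!("unexpected character at position {}", t.pos));
    }
    Ok(t.code)
}

fn main() {
    match translate("2+3*4") {
        Ok(code) => code.iter().for_each(|line| println!("{}", line)),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Operator precedence falls out of the grammar: `term` handles `*` and `/` before `expression` sees `+` and `-`, so `2+3*4` emits the multiplication first.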

One day I want to redo that exercise in Rust...

Implementing a compiler for the language is one way to ensure you understand all of the edge cases, but it is a significant amount of work beyond understanding the language. Implementing a compiler requires not just understanding the language's semantics in all edge cases[1], but also understanding all of the mechanisms involved in implementing those semantics.

Implementing a complex system is a completely separate skill set from understanding the use of the system, even though the former hopefully implies the latter.


  1. ... or, technically, just letting edge cases be edge cases and getting whatever behavior they end up with.
