The big picture of compilation in Rust

Is there any documentation out there which describes the process of digesting Rust source code and emitting the executable binary?

I know there are several steps involved (lexer, AST, lints, testing, benchmarking, compiler, linker, MIR, cross-compilation, optimizations, etc.) and that several (non-Rust) components are required/recommended (cargo, LLVM, clang(?)). But I don't know whether this list is complete, how those components interact, what can be configured (and how), etc.

Rather than jumping from one piece of the puzzle to the next, I'd like to see some kind of big picture showing how all those gears play together, along with a glossary that possibly links to the respective project website, source code, etc.

"The OpenGL Machine" is an elegant example of how complex machinery can be boiled down to a single sheet. It doesn't necessarily have to be that low-level, but it gives you a good idea of what I'm looking for.

As a bonus: it would be nice to see which components are pure Rust :slight_smile:

We don't have a single document that describes this.

Generally speaking, the required tools are rustc, and some kind of linker. On top of that is Cargo, driving rustc. That's really it: Cargo provides the proper rustc invocations, and rustc will invoke the linker.
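To make that division of labour a bit more concrete, here is a minimal sketch of how a Cargo-like driver might assemble a rustc command line. The `CrateBuild` struct and the `rustc_args` function are hypothetical names invented for illustration and are not Cargo's real data model. The flags themselves (`--crate-name`, `--crate-type`, `--extern`, `-C opt-level`, `-C linker`) are real rustc flags, but an actual Cargo invocation passes many more.

```rust
/// Hypothetical description of one crate to build (not Cargo's real data model).
struct CrateBuild {
    name: String,
    /// Path to the crate root, e.g. `src/main.rs`.
    root: String,
    /// Value for `--crate-type`, e.g. "bin" or "rlib".
    crate_type: String,
    /// Passed on as `-C opt-level=N`.
    opt_level: u8,
    /// External linker that rustc should invoke, if overridden.
    linker: Option<String>,
    /// Dependencies as (crate name, path to its .rlib).
    externs: Vec<(String, String)>,
}

/// Assemble the argument list a Cargo-like driver would hand to rustc.
/// The driver never calls the linker itself; it only tells rustc which one to use.
fn rustc_args(build: &CrateBuild) -> Vec<String> {
    let mut args = vec![
        "--crate-name".to_string(),
        build.name.clone(),
        "--crate-type".to_string(),
        build.crate_type.clone(),
        "-C".to_string(),
        format!("opt-level={}", build.opt_level),
    ];
    if let Some(linker) = &build.linker {
        args.push("-C".to_string());
        args.push(format!("linker={}", linker));
    }
    for (name, path) in &build.externs {
        args.push("--extern".to_string());
        args.push(format!("{}={}", name, path));
    }
    // The crate root goes last; rustc finds the other source files from there.
    args.push(build.root.clone());
    args
}

fn main() {
    let build = CrateBuild {
        name: "hello".to_string(),
        root: "src/main.rs".to_string(),
        crate_type: "bin".to_string(),
        opt_level: 2,
        linker: None,
        externs: vec![("dep".to_string(), "target/libdep.rlib".to_string())],
    };
    println!("rustc {}", rustc_args(&build).join(" "));
}
```

The sketch only builds the argument vector and does not spawn a process, so it runs without a particular project layout on disk.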

The more interesting bit is rustc's internals, which is a lot of what you're talking about: it uses LLVM internally, has the phases you mention, etc. The source code has README.md files of varying quality which describe each part.

A "big picture" image (like the OpenGL machine you linked) would be very helpful for people that want to start hacking the compiler.

I drew a thing by hand! (Apologies for my poor drawing skills :-). Hopefully it gives you an idea of what Cargo and rustc do.

some notes:

  • AST === Abstract Syntax Tree
  • HIR === High-level Intermediate Representation
  • MIR === Mid-level Intermediate Representation
  • llvm-ir === LLVM Intermediate Representation
  • obj === Object file (ELF on Linux)
  • The rustc pipeline may be a little off. The image in this blog post is more accurate.
  • The parser will "follow" mod foo items and parse other/external files as needed. The parser returns the AST of the whole crate.
  • As my drawing denotes, I'm not quite sure which phase the metadata comes from.
  • The metadata is formatted as RBML (Really Bad Markup Language). "RBML was originally based on the Extensible Binary Markup Language" (according to the source code).
  • You probably already know this, but LLVM is an external dependency: it's used as a library and it's written in C++, not Rust.
  • The archiver used to be an external command (e.g. ar), but today we use an in-memory archiver that comes with LLVM. That's the new default; you can still use an external archiver if you want.
  • Currently, the linker is always an external command. On most platforms we use a C compiler (e.g. gcc) as a linker. Speculation: In the future, we may use an in-memory lld instead of an external command.
  • It's not explicitly shown, but if e.g. crate A depends on crate B, rustc will load libB.rlib and use its metadata to type check crate A. rustc may also take a generic function from the libB.rlib metadata, "monomorphize" it, and include the result in libA.rlib (library) or ./A (executable).
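The "monomorphize" step in the last note can be illustrated with a small sketch. The function name `largest` is made up for the example, and everything lives in one file here; in the cross-crate case described above, the generic definition would come from the dependency's metadata and the concrete copies would be generated in the crate that uses it.

```rust
/// A generic function. On its own it produces no machine code;
/// rustc generates one specialised copy per concrete `T` it is used with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    // An empty slice has no largest element.
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

fn main() {
    // This call makes rustc instantiate `largest::<i32>`.
    let biggest_int = largest(&[3, 7, 2]);
    // This call makes rustc instantiate a separate `largest::<f64>`.
    let biggest_float = largest(&[1.5, 0.25]);
    println!("{:?} {:?}", biggest_int, biggest_float);
}
```

Each call site with a new concrete type adds another copy of the function body to the output, which is why generic code from a dependency can end up compiled into the crate that uses it.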

Oh, and if someone wants to digitize and improve my drawing, feel free -- I give you permission/license to do so.

Nice drawing! :smiley:

Nit: rustc, not cargo, invokes the linker.

Right, that linker block should actually be inside /src's rustc block.

Here you go! :slight_smile:

I can't publish the svg here, but let me know where I should upload it so it doesn't get lost and can be used for further improvement, documentation or whatever.

@Azerupi probably submit a PR to rust-lang/rust-forge (information useful to people contributing to Rust).

They don't have a "compiler internals" page on the forge. I could create it but I have almost no knowledge of what happens inside the compiler. So besides that diagram, that page would be pretty much blank.. :wink:

Also I am not sure what the policy is for svg files in git repos? Personally I have found svg files unfriendly to version control when you modify them.

Typeck is performed on the HIR, not the MIR.
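A toy model of that ordering, as a sketch only: the `Repr` enum, `lowered`, `pipeline` and `TYPECK_RUNS_ON` are names invented for this example and do not correspond to anything in rustc's source. It deliberately skips steps such as macro expansion and name resolution.

```rust
use std::iter::successors;

/// Forms a crate passes through, in pipeline order (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Repr {
    Source,
    Ast,
    Hir,
    Mir,
    LlvmIr,
    Object,
    Binary,
}

impl Repr {
    /// The next form in the pipeline, or `None` once the binary is linked.
    fn lowered(self) -> Option<Repr> {
        match self {
            Repr::Source => Some(Repr::Ast),    // parsing
            Repr::Ast => Some(Repr::Hir),       // lowering to HIR
            Repr::Hir => Some(Repr::Mir),       // lowering to MIR
            Repr::Mir => Some(Repr::LlvmIr),    // code generation
            Repr::LlvmIr => Some(Repr::Object), // LLVM optimisation and emission
            Repr::Object => Some(Repr::Binary), // linking
            Repr::Binary => None,
        }
    }
}

/// Type checking runs on the HIR, i.e. before the MIR exists.
const TYPECK_RUNS_ON: Repr = Repr::Hir;

/// Every form from source to binary, in order.
fn pipeline() -> Vec<Repr> {
    successors(Some(Repr::Source), |r| r.lowered()).collect()
}

fn main() {
    for repr in pipeline() {
        let note = if repr == TYPECK_RUNS_ON { "  <- type checking" } else { "" };
        println!("{:?}{}", repr, note);
    }
}
```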

@azerupi Awesome!

I agree with @jethrogb. The forge would be a good place for this. Pinging @brson. @brson, thoughts on this comment?

I can help writing an overview of the compiler but I want some member of the compiler team to review it before merging it. My compiler knowledge is from before HIR was a thing :smiley:

Thanks! I will correct that :slight_smile:

That would be awesome! I am willing to make whatever diagrams are needed to make that page as easy to understand.

Forge sounds great, and it doesn't need to be perfect. I'd say just create a link directly to the image in the index, with a useful description.

Hello! I apologize for resurrecting this thread, but I'd had questions about this exact same thing in a thread over here, and when I saw the images @japaric and @Azerupi made, it really helped everything click for me. Looking at the forge documentation, it looks as though no one ever uploaded it there. Am I mistaken or can we push forward with getting a PR merged in some way?

I believe this doc covers most of it.
https://rust-lang.github.io/rustc-guide/about-this-guide.html