Is there any documentation out there which describes the process of digesting Rust source code and emitting the executable binary?
I know there are several steps involved (lexer, AST, lints, testing, benchmarking, compiler, linker, MIR, cross-compilation, optimizations, etc.) and several (non-Rust) components are required or recommended (cargo, LLVM, clang(?)). But I don't know whether this list is complete, how those components interact, what can be configured (and how), etc.
Rather than jumping from one puzzle piece to the next, I'd like to see some kind of big picture showing how all those gears work together, along with a glossary that links to each project's website, source code, etc.
"The OpenGL Machine" is an elegant example of how complex machinery can be boiled down to a single sheet. It doesn't necessarily have to be that low-level, but it gives you a good idea of what I'm looking for.
As a bonus, it would be nice to see which components are pure Rust.
We don't have a single document that describes this.
Generally speaking, the required tools are rustc, and some kind of linker. On top of that is Cargo, driving rustc. That's really it: Cargo provides the proper rustc invocations, and rustc will invoke the linker.
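To make "Cargo provides the proper rustc invocations" concrete, here is a minimal Rust sketch that assembles the kind of argument list Cargo hands to rustc for a binary with one library dependency. `--crate-name`, `--crate-type`, `-L dependency=` and `--extern` are real rustc flags, but the helper `rustc_args`, the crate names and the paths are hypothetical, and Cargo's real invocation carries many more flags (`cargo build -v` prints the actual commands). The sketch only prints the command instead of running it.

```rust
use std::path::Path;

/// Hypothetical helper: build a rustc argument list similar to what Cargo
/// produces for a binary crate. Flag names are real; values are made up.
fn rustc_args(crate_name: &str, entry: &Path, deps_dir: &Path, deps: &[(&str, &str)]) -> Vec<String> {
    let mut args = vec![
        "--crate-name".to_string(),
        crate_name.to_string(),
        entry.display().to_string(), // the crate root, e.g. src/main.rs
        "--crate-type".to_string(),
        "bin".to_string(),
        "-L".to_string(),
        format!("dependency={}", deps_dir.display()), // where dependency rlibs live
    ];
    // One --extern per direct dependency: crate name mapped to its compiled .rlib.
    for (name, rlib) in deps {
        args.push("--extern".to_string());
        args.push(format!("{}={}", name, deps_dir.join(rlib).display()));
    }
    args
}

fn main() {
    let args = rustc_args(
        "app",
        Path::new("src/main.rs"),
        Path::new("target/debug/deps"),
        &[("mylib", "libmylib.rlib")],
    );
    // Print rather than execute, so no toolchain is needed to run this sketch.
    println!("rustc {}", args.join(" "));
}
```

rustc then runs the whole pipeline for that one crate and, for a binary, calls the linker itself.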
The more interesting bit is rustc internals, which is a lot of what you're talking about: it uses LLVM internally, and has the phases you're talking about, etc. The source code has README.md files of varying quality which describe each part.
A "big picture" image (like the OpenGL machine you linked) would be very helpful for people who want to start hacking on the compiler.
I drew a thing by hand! (Apologies for my poor drawing skills :-). Hopefully it gives you an idea of what Cargo and rustc do.
AST === Abstract Syntax Tree
HIR === High-level Intermediate Representation
MIR === Mid-level Intermediate Representation
llvm-ir === LLVM Intermediate Representation
obj === Object file (ELF in Linux)
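The glossary also encodes an order: source is parsed into an AST, lowered to HIR, then to MIR, then to LLVM IR, and LLVM finally emits object files for the linker. The toy enum below only records that order; `Stage` and `pipeline` are made-up names for illustration, not rustc's internal types, and the real compiler does a lot of work between these stages (macro expansion and name resolution on the AST, type checking on HIR, borrow checking on MIR).

```rust
/// Toy model of the stages named in the glossary. Not rustc's real types;
/// it only records the order in which the stages happen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Source, // .rs files on disk
    Ast,    // Abstract Syntax Tree
    Hir,    // High-level Intermediate Representation
    Mir,    // Mid-level Intermediate Representation
    LlvmIr, // LLVM IR, handed to LLVM
    Obj,    // object file, handed to the linker
}

impl Stage {
    /// The stage that follows this one; `None` once the object file exists.
    fn next(self) -> Option<Stage> {
        match self {
            Stage::Source => Some(Stage::Ast),
            Stage::Ast => Some(Stage::Hir),
            Stage::Hir => Some(Stage::Mir),
            Stage::Mir => Some(Stage::LlvmIr),
            Stage::LlvmIr => Some(Stage::Obj),
            Stage::Obj => None,
        }
    }
}

/// Walk the pipeline from source to object file.
fn pipeline() -> Vec<Stage> {
    let mut current = Stage::Source;
    let mut stages = vec![current];
    while let Some(next) = current.next() {
        stages.push(next);
        current = next;
    }
    stages
}

fn main() {
    let names: Vec<String> = pipeline().iter().map(|s| format!("{:?}", s)).collect();
    // Source -> Ast -> Hir -> Mir -> LlvmIr -> Obj
    println!("{}", names.join(" -> "));
}
```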
The rustc pipeline may be a little off. The image in this blog post is more accurate.
The parser will "follow" mod foo items and parse other/external files as needed. The parser returns the AST of the whole crate.
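For an out-of-line `mod foo;` declared in a crate root (or in a `mod.rs` file), the parser looks for the module's source in one of two places next to the declaring file: `foo.rs` or `foo/mod.rs`. A minimal sketch of that lookup, with hypothetical helper names; it ignores `#[path]` attributes and the newer rule that lets a non-`mod.rs` file such as `bar.rs` keep its submodules in `bar/`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the two files checked for `mod <name>;` when the
/// declaring file (crate root or mod.rs) lives in `dir`.
fn module_file_candidates(dir: &Path, name: &str) -> [PathBuf; 2] {
    [
        dir.join(format!("{}.rs", name)),  // e.g. src/foo.rs
        dir.join(name).join("mod.rs"),     // e.g. src/foo/mod.rs
    ]
}

/// Return the first candidate that exists on disk, if any.
fn resolve_module(dir: &Path, name: &str) -> Option<PathBuf> {
    module_file_candidates(dir, name)
        .iter()
        .find(|p| p.is_file())
        .cloned()
}

fn main() {
    let dir = Path::new("src");
    for p in module_file_candidates(dir, "foo").iter() {
        println!("candidate: {}", p.display());
    }
    println!("resolved: {:?}", resolve_module(dir, "foo"));
}
```

The parsed contents of whichever file is found are spliced into the crate's AST in place of the `mod foo;` item, which is how a single AST for the whole crate comes out of the parser.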
As my drawing denotes, I'm not quite sure which phase the metadata comes from.
The metadata is formatted as RBML (Really Bad Markup Language). "RBML was originally based on the Extensible Binary Markup Language" (according to the source code).
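RBML's details are not covered here, but the basic idea it inherited from EBML is a tag-length-value layout: each element is a tag, the length of its payload, then the payload bytes, so a reader can skip elements it doesn't need. The toy encoder below shows only that idea. It uses a fixed 1-byte tag and a 4-byte little-endian length for simplicity, whereas EBML uses variable-width integers, and it is not RBML's actual byte layout; the tag numbers are hypothetical.

```rust
/// Append one element: tag, payload length, payload.
/// Fixed-width fields for simplicity (EBML itself uses variable-width ints).
fn encode(tag: u8, payload: &[u8], out: &mut Vec<u8>) {
    out.push(tag);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Decode a flat sequence of elements; returns None on truncated input.
fn decode(mut buf: &[u8]) -> Option<Vec<(u8, Vec<u8>)>> {
    let mut items = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 5 {
            return None; // not enough bytes for tag + length
        }
        let tag = buf[0];
        let len = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
        let rest = &buf[5..];
        if rest.len() < len {
            return None; // payload cut short
        }
        items.push((tag, rest[..len].to_vec()));
        buf = &rest[len..]; // the length lets us jump straight to the next element
    }
    Some(items)
}

fn main() {
    let mut buf = Vec::new();
    encode(1, b"mycrate", &mut buf); // hypothetical tag 1: crate name
    encode(2, b"0.1.0", &mut buf);   // hypothetical tag 2: version
    for (tag, payload) in decode(&buf).unwrap() {
        println!("tag {}: {}", tag, String::from_utf8_lossy(&payload));
    }
}
```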
You probably already know but LLVM is an external dependency, it's used as a library and it's not written in Rust (it's written in C++).
The archiver used to be an external command (e.g. ar), but today we use an in-memory archiver that comes with LLVM. That's the new default, but you can still use an external archiver if you want.
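The archive that bundles object files (a `.a` file, and also the container rustc uses for `.rlib` files) is the simple Unix `ar` format, which is why it can be produced in memory instead of by shelling out. A minimal writer sketch, not LLVM's implementation: it writes the global `!<arch>` header and one 60-byte header per member, but skips the symbol table and long-name table that real archivers emit, so a linker may need `ranlib` to add an index before using the result. The function name `write_ar` is hypothetical.

```rust
/// Minimal in-memory writer for the common Unix `ar` format.
/// Sketch only: no symbol table, names must fit in 15 bytes,
/// and mtime/uid/gid are zeroed.
fn write_ar(members: &[(&str, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"!<arch>\n"); // global header
    for (name, data) in members {
        assert!(name.len() <= 15, "long names need an extended name table");
        // 60-byte member header:
        // name(16) mtime(12) uid(6) gid(6) mode(8, octal text) size(10) magic(2)
        let header = format!(
            "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
            format!("{}/", name), // GNU-style '/' terminator
            0,
            0,
            0,
            644,
            data.len()
        );
        debug_assert_eq!(header.len(), 60);
        out.extend_from_slice(header.as_bytes());
        out.extend_from_slice(data);
        if data.len() % 2 == 1 {
            out.push(b'\n'); // member data is padded to an even offset
        }
    }
    out
}

fn main() {
    let archive = write_ar(&[("a.o", &b"hello"[..]), ("b.o", &b"hi"[..])]);
    println!("{} bytes", archive.len()); // 136 bytes
}
```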
Currently, the linker is always an external command. On most platforms we use a C compiler (e.g. gcc) as a linker. Speculation: In the future, we may use an in-memory lld instead of an external command.
It's not explicitly shown, but if e.g. crate A depends on crate B, rustc will load libB.rlib and use its metadata to type check crate A. rustc may also take a generic function from the libB.rlib metadata, "monomorphize" it and include that in libA.rlib (library) or ./A (executable).
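Monomorphization in a nutshell: a generic function is not compiled once for all types; each concrete type it is used with gets its own specialized copy of the machine code. In this small sketch (the function `largest` is just an example), calling it with `i32` and with `f64` produces two separate instantiations. When such a function lives in a library crate, the dependent crate that calls it typically generates those copies from the generic definition stored in the library's metadata.

```rust
/// A generic function, like one a library crate might export.
/// Returns None for an empty slice instead of panicking.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let mut max = iter.next()?;
    for item in iter {
        if item > max {
            max = item;
        }
    }
    Some(max)
}

fn main() {
    // Two instantiations: the compiler emits largest::<i32> and
    // largest::<f64> as two distinct functions.
    println!("{:?}", largest(&[3, 7, 2]));  // Some(7)
    println!("{:?}", largest(&[1.5, 0.5])); // Some(1.5)
}
```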
Oh, and if someone wants to digitize and improve my drawing, feel free -- you have my permission/license to do so.
They don't have a "compiler internals" page on the forge. I could create it, but I have almost no knowledge of what happens inside the compiler, so apart from that diagram the page would be pretty much blank.
I'm also not sure what the policy is for SVG files in git repos. Personally, I have found SVG files unfriendly to version control when you modify them.
Hello! I apologize for resurrecting this thread, but I had questions about this exact same thing in a thread over here, and when I saw the images @japaric and @Azerupi made, it really helped everything click for me. Looking at the forge documentation, it looks as though no one ever uploaded it there. Am I mistaken, or can we push forward with getting a PR merged in some way?