The big picture of compilation in Rust

A "big picture" image (like the OpenGL machine you linked) would be very helpful for people that want to start hacking the compiler.

I drew a thing by hand! (Apologies for my poor drawing skills :-). Hopefully it gives you an idea of what Cargo and rustc do.

Some notes:

  • AST === Abstract Syntax Tree
  • HIR === High-level Intermediate Representation
  • MIR === Mid-level Intermediate Representation
  • llvm-ir === LLVM Intermediate Representation
  • obj === Object file (ELF on Linux)
  • The rustc pipeline may be a little off. The image in this blog post is more accurate.
  • The parser will "follow" mod foo items and parse other/external files as needed. The parser returns the AST of the whole crate.
  • As my drawing denotes, I'm not quite sure which phase the metadata comes from.
  • The metadata is formatted as RBML (Really Bad Markup Language). "RBML was originally based on the Extensible Binary Markup Language" (according to the source code).
  • You probably already know this, but LLVM is an external dependency: it's used as a library and it's not written in Rust (it's written in C++).
  • The archiver used to be an external command (e.g. ar) but today we use an in-memory archiver that comes with LLVM. That's the new default; you can still use an external archiver if you want.
  • Currently, the linker is always an external command. On most platforms we use a C compiler (e.g. gcc) as a linker. Speculation: In the future, we may use an in-memory lld instead of an external command.
  • It's not explicitly shown, but if e.g. crate B depends on crate A, rustc will load libA.rlib and use its metadata to type check crate B. rustc may also take a generic function from the libA.rlib metadata, "monomorphize" it and include the result in libB.rlib (library) or ./B (executable).
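
To make the last note a bit more concrete, here is a rough sketch of what "monomorphizing" a generic function from a dependency looks like from the user's side. Everything in it is made up for illustration: the module crate_a is only a stand-in for a separate crate A (in a real project it would be its own crate, compiled to libA.rlib), and largest is a hypothetical function, not something from rustc or the standard library.

```rust
// Stand-in for crate A. In a real build this would be a separate crate and
// its generic functions would be made available to dependents via the
// metadata stored in libA.rlib.
mod crate_a {
    /// Hypothetical generic function: returns the largest item of a slice.
    /// No machine code can be produced for it until `T` is known.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut iter = items.iter().copied();
        let first = iter.next()?;
        Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
    }
}

// Stand-in for crate B (the dependent crate, here an executable).
fn main() {
    // Each call uses a different concrete `T`, so the compiler generates one
    // specialized copy of `largest` per type (i32 and f64). Those copies end
    // up in the artifact of the crate that instantiates them.
    let biggest_int = crate_a::largest(&[3, 7, 2]);
    let biggest_float = crate_a::largest(&[1.5, 0.5]);
    println!("{:?} {:?}", biggest_int, biggest_float);
}
```

With two real crates the source would look the same apart from the mod wrapper; the point is only that the specialized copies are generated where the concrete types become known, i.e. in the dependent crate.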

Oh, and if someone wants to digitize and improve my drawing, feel free -- I give you permission/license to do so.
