Compile Rust dependencies independently for large projects?

I have a large Rust project (11M LoC across 14K crates) that is compiling slowly. I have already asked for and received advice on Reddit, but that was focused on how I could generate Rust that is faster to compile.

Assuming I am already generating optimal code, I would like to optimise how I am compiling the project. A reproduction is available at https://github.com/fmckeogh/rust_stack_repro. The foo binary is an example of how I intend to use the generated code, and it takes a very long time to compile.

I am seeing a crate's compile time grow in proportion to the size of its dependencies. I.e., with one function per crate, the small function foo calls the large function decode_execute, yet foo takes a long time to compile even after Cargo has already printed "Compiling decode_execute".

If Cargo has already compiled decode_execute, why would foo's compile time be affected by anything other than the size of decode_execute?

Is it possible to configure Cargo/rustc/LLVM to not perform any cross-crate optimisation, and only compile each crate once then link all artefacts (ideally statically) together?

That is a lot of crates. I wouldn't be surprised if the issue here is having to load crate metadata for all those dependencies, and name resolution may not like that many dependencies either. And finally the linker is going to have a hard time churning through all those object files.

I am fairly certain cross-crate optimisation isn't even done at all here. For MIR inlining the bodies are both too big, and I assume you haven't enabled LTO.

If you want to know exactly what takes so long, you may want to try the self-profile option of rustc. The Inside Rust blog post "Intro to rustc's self-profiler" is a decent introduction to it. Just be aware that -Zself-profile nowadays generates a single .mm_profdata file rather than the separate .events, .string_data and .string_index files it produced back when that blog post was written.


@bjorn3 Thanks! I'll try that :)

No, I have not enabled LTO.

Is the crate metadata/name resolution occurring in cargo or rustc? Just if I wanted to try profiling/optimising that logic.

I'm using mold so the final binary linking is still very fast, it's just the few top-level crates that have the most dependencies that take up to 20 minutes for a few hundred lines.

Appreciate the help! :)

Rustc.

If you want to only profile the compilation of foo, you can use something like cargo +nightly rustc -p foo -- -Zself-profile=$(pwd) to get a .mm_profdata profile in your current working directory, which you can then read with measureme tools such as:

  * summarize, which outputs a table of all queries/compilation passes and the time they took;
  * crox, which outputs a profile that can be loaded in the Chrome profiler;
  * flamegraph, which directly generates a flamegraph as an SVG you can open in your browser.
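For reference, a sketch of how one might install those tools and read the resulting profile. The install invocation and the output filename pattern are assumptions based on the measureme repository; check its README for the current instructions.

```shell
# Install the measureme analysis tools from the repository
# (assumed invocation; the binaries live in the measureme repo).
cargo install --git https://github.com/rust-lang/measureme summarize crox flamegraph

# Profile only the `foo` crate, writing a .mm_profdata file to $PWD.
cargo +nightly rustc -p foo -- -Zself-profile=$(pwd)

# Summarize queries/passes by time spent
# (the crate-pid filename pattern is an assumption).
summarize summarize foo-*.mm_profdata
```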


Note that Cargo uses parallel, pipelined compilation, and “Compiling” means it started a compilation job, but nothing is printed when they finish. Use cargo build --timings for concrete information about how long each crate takes and how they overlap.
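For example, the report can be generated like this (the output path reflects current Cargo behavior and may differ by version):

```shell
# Generate an HTML report of per-crate compile times and pipeline overlap.
cargo build --timings
# Cargo prints where it wrote the report; typically:
#   target/cargo-timings/cargo-timing.html
```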

Do you mean https://github.com/fmckeogh/rust_stack_repro/blob/main/u__DecodeExecute/src/lib.rs?

pub fn u__DecodeExecute<T: Tracer>(...

That's a generic function, so when its crate is compiled, it is compiled only to MIR, and machine code is only generated when it is used with some concrete T type. If you can afford dynamic dispatch at run time, then consider replacing &T with &dyn Tracer, so that the function is non-generic.
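A minimal sketch of the difference, using a hypothetical Tracer trait and trivial bodies standing in for the real project's code:

```rust
// Hypothetical stand-in for the project's Tracer trait.
trait Tracer {
    fn trace(&self, event: &str);
}

struct NoopTracer;
impl Tracer for NoopTracer {
    fn trace(&self, _event: &str) {}
}

// Generic version: only MIR is produced when its crate is compiled;
// machine code is generated per concrete T, in each crate that uses it.
fn decode_execute_generic<T: Tracer>(tracer: &T) -> u32 {
    tracer.trace("decode");
    42
}

// Dynamic-dispatch version: compiled to machine code exactly once in its
// own crate; callers pay one vtable indirection per call instead.
fn decode_execute_dyn(tracer: &dyn Tracer) -> u32 {
    tracer.trace("decode");
    42
}

fn main() {
    let t = NoopTracer;
    assert_eq!(decode_execute_generic(&t), 42);
    assert_eq!(decode_execute_dyn(&t), 42);
}
```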

It still might be inlined into the calling function and crate; you can add #[inline(never)] to stop that and have each rustc invocation only deal with its own crate's code, but of course that means there will be the run-time cost of more function calls and no optimizations based on inlining.
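A sketch of where the attribute would go (names and bodies are placeholders, not the real project's code):

```rust
// In the dependency crate: keep this body out of callers' codegen.
#[inline(never)]
pub fn decode_execute(input: u32) -> u32 {
    // ...imagine a very large body here...
    input.wrapping_mul(3).wrapping_add(1)
}

// In the top-level crate: compiles to a plain call instruction,
// regardless of how large decode_execute's body is.
pub fn foo(input: u32) -> u32 {
    decode_execute(input)
}

fn main() {
    assert_eq!(foo(7), 22);
}
```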

Also consider that “one function per crate” is likely imposing costs of its own — costs of compiler startup, saving and loading rlib files, and Cargo processing the dependency graph. I assume you're doing this for testing the compilation behavior, but if not, really consider having far more than one function per crate. rustc's incremental compilation feature means that recompiling one function in a crate is still cheaper than recompiling the crate from scratch.


@fmckeogh

Not sure if it is still relevant, but at this scale you may consider building your project with Bazel, vendoring all dependencies, and using Remote Build Execution (RBE) and a Remote Cache on BuildBuddy to build on a cluster. BuildBuddy has a free tier and some open-source offerings, so just talk to them.

Vendoring means you declare all dependencies, download their sources into one folder (usually called thirdparty), pre-compile all of them, and sleep well at night. This solves a number of pesky problems:

  1. No more slow compile times; I guess that is what you want.
  2. No more build failures when a dependency cannot be downloaded. It is rare, but it still happens.
  3. No more subtle bugs when a dependency somehow changes and overwrites an existing version number. This hit me only once, and it's pretty insane to debug and identify.

I think Cargo also supports vendoring, so you may try that first before looking into Bazel.
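Cargo's built-in mechanism is the `cargo vendor` command, which downloads all dependency sources into a local `vendor/` directory and prints a configuration snippet roughly like the following to add to `.cargo/config.toml` so builds use the local copies instead of the network:

```toml
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```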

I have "only" 70 crates, but my build time dropped significantly once I moved to Bazel, thanks to parallel compilation and caching that actually works. A full build for me is a release build, unit tests, integration tests, cross-compilation to two platforms, and packaging & pushing multi-arch Docker images. All this with Bazel, RBE and Remote Cache is usually done in around 1 to 2 minutes for incremental builds. A full rebuild from scratch, which I just had yesterday, takes about 8 minutes thanks to vendoring and parallel execution on the 80-core BuildBuddy cluster. I loosely remember GitHub Actions taking about an hour or so, mostly due to building the multi-arch containers, but GH is long gone...

I understand that you can't easily convert 14k crates to Bazel, but when I looked at the repo you linked, a lot of the crates there look fairly straightforward, so you could use some clever scripts to parse each Cargo.toml and generate the matching BUILD.bazel file. rules_rust for Bazel does support loading Cargo.toml with some macros, but believe me, this already bogged down at just 50 crates because macro expansion takes too long, so you really want to vendor everything from the get-go since Bazel works best with vendored deps.

One word about static linking in Rust: this will blow up your compile time immediately and, frankly, there is nothing you can do about it other than building multiple artifacts with disjoint dependency sub-graphs. Consider static linking a last-resort option, and measure your compile time carefully on a subset before making a decision.


@marvin-hansen Thank you for your advice, I'll definitely give Bazel a try:)

You're welcome.
There are a bunch of Rust Examples with documentation that should get you started.

https://github.com/bazelbuild/examples/blob/main/rust-examples/README.md