Increasing the Number of `rustc` Commands Generated by Cargo for Distributed Compilation

I'm currently developing a system for distributed compilation, where rustc commands are sent to remote workers for execution.

However, I've noticed that Cargo seems to generate only one rustc command per crate. This significantly limits the level of parallelism I can achieve, since the number of rustc commands directly correlates with the number of tasks that can be concurrently processed.

My question is: is there a way to increase the number of rustc commands that Cargo generates, or to otherwise split the compilation of a single crate into multiple, smaller compilation tasks?

Any insights or alternative suggestions to effectively utilize distributed compilation for Rust projects would be greatly appreciated.

No, as far as I know rustc compiles a whole crate at a time, and there’s no way to split up this task. This is also a reason why many large projects (e.g. rustc itself) are split up manually into several crates, which can improve compilation speeds due to better parallelization (even ignoring the potential for any distributed approach), and speed of re-compilation due to the caching of unmodified dependencies.

rustc also parallelizes some of its compilation steps internally, and since compilation speed is a known issue with Rust, there’s slow but ongoing work to improve this further… but as long as better parallelization within rustc for a single crate is still a work in progress, any prospect of splitting that work across independent machines is still very far-fetched.

And that's intentional: rustc compiles the whole crate at once. It may split the work internally into multiple codegen units (and spread these over the CPU cores), but the "one crate, one process" model is essentially unavoidable, since the Rust compiler has to use the whole crate in its analysis.

just FYI, I happened to come across this video the other day:

I appreciate the responses. I'm considering developing a tool that would essentially partition a crate with multiple .rs files into several distinct crates, each containing a single .rs file. Following this, I would manually generate rustc commands for each of these newly formed crates.

The idea is to transform the large crate into smaller compilation units, potentially facilitating distributed and parallel compilation.
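
To make the "manually generate rustc commands" step concrete, here is a minimal sketch of what such a generator might look like. The `SplitCrate` struct, the `rustc_args` function, and the paths are hypothetical names for illustration; only the `rustc` flags themselves (`--edition`, `--crate-type`, `--crate-name`, `--out-dir`, `--extern`) are real. It assumes every split crate is built as an rlib into one shared output directory.

```rust
/// Hypothetical description of one crate produced by the splitting tool.
struct SplitCrate {
    name: String,
    /// Path to the crate's root .rs file.
    root: String,
    /// Names of the other split crates this one depends on.
    deps: Vec<String>,
}

/// Builds the argument list for one `rustc` invocation that produces an rlib.
/// Assumes all split crates are written to the same `out_dir`, where rustc
/// names the output `lib<crate-name>.rlib`.
fn rustc_args(c: &SplitCrate, out_dir: &str) -> Vec<String> {
    let mut args = vec![
        "--edition".to_string(),
        "2021".to_string(),
        "--crate-type".to_string(),
        "rlib".to_string(),
        "--crate-name".to_string(),
        c.name.clone(),
        "--out-dir".to_string(),
        out_dir.to_string(),
    ];
    // Each dependency has to be passed explicitly as `--extern name=path`.
    for dep in &c.deps {
        args.push("--extern".to_string());
        args.push(format!("{dep}={out_dir}/lib{dep}.rlib"));
    }
    // The crate root source file goes last.
    args.push(c.root.clone());
    args
}
```

The resulting vector could be handed to `std::process::Command::new("rustc").args(...)` on a worker. Note that a command can only start once the outputs of its dependencies exist, so the shape of the dependency graph, not the number of files, bounds how much of this can run in parallel.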

My main question here is whether this approach could indeed help increase the compilation speed. Are there potential drawbacks or obstacles I should be aware of when splitting a crate in this manner?

Would love to hear your thoughts and insights on this.

If it was that easy to split up a crate, then the compiler could do the optimization itself.

Off the top of my head: Within a crate

  • associated impl blocks can be in a separate file from where the type is defined, which doesn’t work across crate boundaries
  • different orphan rules apply than across crate boundaries
  • pub(crate) visible items can be accessed, whereas this visibility breaks across crate boundaries
  • different modules/files may depend on each other’s types/functions/etc. in a cyclic manner,
    whereas inter-crate dependencies must be acyclic
  • [probably more…]
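
A small sketch of three of these points in one place. Each `mod` stands in for a separate .rs file of the same crate, and all names (`shapes`, `geometry`, `report`, `Circle`) are made up for illustration. This compiles as a single crate, but would not survive turning each module into its own crate unchanged.

```rust
mod shapes {
    // The type is defined in this "file"...
    pub struct Circle {
        // Visible anywhere in this crate, but not from other crates.
        pub(crate) radius: f64,
    }

    // ...and this function calls into `report`, which itself uses `Circle`:
    // a cyclic dependency between two "files".
    pub fn describe(c: &Circle) -> String {
        super::report::format_area(c)
    }
}

mod geometry {
    use super::shapes::Circle;

    // Inherent impl in a different "file" from the type definition.
    // Fine within one crate; rejected if `Circle` came from another crate.
    impl Circle {
        pub fn area(&self) -> f64 {
            std::f64::consts::PI * self.radius * self.radius
        }
    }
}

mod report {
    use super::shapes::Circle;

    // Reads the `pub(crate)` field and calls the method from `geometry`.
    pub fn format_area(c: &Circle) -> String {
        format!("circle r={} area={:.2}", c.radius, c.area())
    }
}
```

Here `shapes` and `report` reference each other, `geometry` adds methods to a type it does not define, and `report` relies on crate-level visibility, so no assignment of these three modules to three separate crates works without rewriting the code.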

The orphan rules could be adjusted with special compiler support to “break up” crates, so this isn’t a fundamental problem (for a compiler), though it’s not something you can easily circumvent either. Both trait implementations and especially inherent impls in files separate from the type aren’t too common, so maybe the problem can be avoided without too much effort. You may need to keep a handful of groups of files together, though.

pub(crate) visibility can be modified easily to full pub, so it’s almost a non-issue.

Most likely, the possibility of cyclic interdependencies is the hardest obstacle you’re going to face. Even in cases without cycles, you will need to figure out the graph of interdependencies somehow.
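
Assuming you have already extracted which module references which (that extraction is the hard part and is not shown here), finding the groups that must stay together is a standard strongly-connected-components problem. The following is a rough sketch using Tarjan's algorithm; `crate_groups` and the module names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Bookkeeping for Tarjan's strongly-connected-components algorithm.
struct Tarjan<'a> {
    deps: &'a BTreeMap<&'a str, Vec<&'a str>>,
    index: BTreeMap<&'a str, usize>,
    lowlink: BTreeMap<&'a str, usize>,
    stack: Vec<&'a str>,
    on_stack: BTreeSet<&'a str>,
    next_index: usize,
    components: Vec<Vec<String>>,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, node: &'a str) {
        self.index.insert(node, self.next_index);
        self.lowlink.insert(node, self.next_index);
        self.next_index += 1;
        self.stack.push(node);
        self.on_stack.insert(node);

        let deps = self.deps; // copy the shared reference so `self` stays free
        if let Some(targets) = deps.get(node) {
            for &target in targets {
                if !self.index.contains_key(target) {
                    // Unvisited dependency: recurse, then propagate its lowlink.
                    self.visit(target);
                    let low = self.lowlink[node].min(self.lowlink[target]);
                    self.lowlink.insert(node, low);
                } else if self.on_stack.contains(target) {
                    // Back edge into the component currently being built.
                    let low = self.lowlink[node].min(self.index[target]);
                    self.lowlink.insert(node, low);
                }
            }
        }

        // `node` is the root of a component: pop its members off the stack.
        if self.lowlink[node] == self.index[node] {
            let mut component = Vec::new();
            while let Some(member) = self.stack.pop() {
                self.on_stack.remove(member);
                component.push(member.to_string());
                if member == node {
                    break;
                }
            }
            component.sort();
            self.components.push(component);
        }
    }
}

/// Groups modules so that mutually dependent ones end up in the same group.
/// `deps` maps a module to the modules it references. Groups are returned
/// dependencies-first, which is also a valid build order.
fn crate_groups<'a>(deps: &'a BTreeMap<&'a str, Vec<&'a str>>) -> Vec<Vec<String>> {
    let mut t = Tarjan {
        deps,
        index: BTreeMap::new(),
        lowlink: BTreeMap::new(),
        stack: Vec::new(),
        on_stack: BTreeSet::new(),
        next_index: 0,
        components: Vec::new(),
    };
    for &node in deps.keys() {
        if !t.index.contains_key(node) {
            t.visit(node);
        }
    }
    t.components
}
```

For a graph where `app` uses `parser` and `types`, `parser` uses `types` and `ast`, and `ast` uses `parser`, this yields the groups `[types]`, `[ast, parser]`, `[app]`: `ast` and `parser` form a cycle and would have to share a crate. In a real codebase with many cross-references, it is quite possible that most modules collapse into one large group, which would leave little to distribute.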

Edit: On second thought, there’s also the problem that most .rs files sit somewhere deep in the module tree, which means the paths used to access things change once they cross crate boundaries, and there are more visibility problems to address than just the pub(crate) cases.


As someone who once refactored a decently sized Rust codebase from a single monolithic crate into many smaller ones, this was the hardest thing.

I ended up having to spend a lot of time building constructs to dynamically attach subsystems together and refactor a lot of code to work with them. I cannot imagine how you could do this automatically given that different places in the code required different solutions due to the specific details of what was going on. That's not even mentioning the nightmare caused by borrowing going from plain all-pub structs to trait objects.
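
For readers wondering what "dynamically attach subsystems" can look like, here is one possible shape, not necessarily what was done in that refactor. Each `mod` stands in for a separate crate, and `api`, `audio`, `core_engine`, `Subsystem`, and `Engine` are hypothetical names. The core only knows a trait from a small shared interface crate, so it never has to depend on the subsystem crates.

```rust
/// Stand-in for a small "interface" crate that both sides depend on.
mod api {
    pub trait Subsystem {
        /// Reacts to an event, optionally producing a message.
        fn handle(&mut self, event: &str) -> Option<String>;
    }
}

/// Stand-in for a crate implementing one subsystem.
mod audio {
    use super::api::Subsystem;

    #[derive(Default)]
    pub struct Audio {
        pub played: usize,
    }

    impl Subsystem for Audio {
        fn handle(&mut self, event: &str) -> Option<String> {
            if event == "beep" {
                self.played += 1;
                Some(format!("audio played {} sound(s)", self.played))
            } else {
                None
            }
        }
    }
}

/// Stand-in for the core crate: it sees only the trait, never `Audio`.
mod core_engine {
    use super::api::Subsystem;

    #[derive(Default)]
    pub struct Engine {
        subsystems: Vec<Box<dyn Subsystem>>,
    }

    impl Engine {
        /// Attaches a subsystem at runtime instead of naming its type.
        pub fn attach(&mut self, subsystem: Box<dyn Subsystem>) {
            self.subsystems.push(subsystem);
        }

        /// Forwards an event to every subsystem and collects the replies.
        pub fn dispatch(&mut self, event: &str) -> Vec<String> {
            self.subsystems
                .iter_mut()
                .filter_map(|s| s.handle(event))
                .collect()
        }
    }
}
```

A top-level binary crate would then wire things up with something like `engine.attach(Box::new(audio::Audio::default()))`. The borrowing pain follows from this design: with plain pub fields the borrow checker lets you borrow two different fields at once, whereas every `&mut self` trait method borrows the whole object.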

In the end, I got about 60% of the way done and stopped. Both because that was enough for what I needed, and because I was about to go completely barmy.