The way that crates work as the smallest unit of compilation is very concerning to me. My experience is colored by working on large software at large software companies.
A key question in my mind is - does the incremental compilation feature do anything for cold builds?
It's worth noting that cold builds are very important. At $work I usually end up doing one or two cold builds per day, even though I painstakingly try to avoid them. That's not counting my CI system, which cold builds my branch every time it kicks off a regression test. A lot of my days end with me refreshing the CI GUI, waiting for a regression test that I kicked off four hours prior to finally come home so that I can submit. Developer productivity at my company tracks cold build time closely, so much so that there are many teams primarily dedicated to reducing the build times of our worst-offending binaries.
Say I have this crate, `myserver`, with an extern crate `lib_A`.
  lib_A <+
         |
         |
+----------------- myserver ---------------------+
|        |                                       |
|        |   +----------> mydep1                 |
|        |   |                                   |
|        |   |                                   |
|        |   |                                   |
|        +   +                                   |
|      myserver                                  |
|                                                |
+------------------------------------------------+
And I want to add `myclient`, which needs code in `mydep1`.
  lib_A <+
         |
         |
+----------------- myserver ---------------------+
|        |                                       |
|        |   +----------> mydep1 <----------+    |
|        |   |                              |    |
|        |   |                              |    |
|        |   |                              |    |
|        +   +                              +    |
|      myserver                         myclient |
|                                                |
+------------------------------------------------+
I happen to know in advance that `myclient` should go into its own crate. (That's not always obvious ahead of time.)
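To make the layout in the diagram concrete, here is a minimal sketch of the single-crate version, with each box modeled as an inline module. The function names (`listen`, `encode`, `start`, `request`) are made up for illustration, and `lib_a` is a stand-in module for the external crate `lib_A` so that the snippet compiles on its own.

```rust
// Hypothetical single-crate layout mirroring the diagram.
// `lib_a` is a stand-in for the external crate lib_A.
mod lib_a {
    pub fn listen(port: u16) -> String {
        format!("listening on {}", port)
    }
}

mod mydep1 {
    /// Shared helper needed by both the server and the client.
    pub fn encode(msg: &str) -> String {
        format!("[{}]{}", msg.len(), msg)
    }
}

mod myserver {
    use crate::{lib_a, mydep1};

    /// The server needs both lib_a and mydep1.
    pub fn start() -> String {
        format!("{}: {}", lib_a::listen(8080), mydep1::encode("ready"))
    }
}

mod myclient {
    use crate::mydep1;

    /// The client only needs mydep1.
    pub fn request() -> String {
        mydep1::encode("ping")
    }
}

fn main() {
    println!("{}", myserver::start());
    println!("{}", myclient::request());
}
```

Nothing in `myclient` names `myserver` or `lib_a`, yet all four modules live in one crate and are handed to the compiler together.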
My understanding of Rust's compilation model, and admittedly I've only spent a few hours scratching the surface, is that even with incremental compilation turned on, if I add `myclient` to the existing crate, the first time that I try to compile `myclient`, it's also going to compile `myserver` and `lib_A`, even though `myclient` doesn't use them at all. (I mean, there's no way to even tell cargo just to compile `myclient`, is there? With cargo I can only address crates.) Please correct me if I'm wrong.
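The concern above can be put as a toy calculation: given a dependency graph, what has to be compiled on a cold build of one target? The sketch below is only a model of the argument, not of what rustc or cargo actually do internally. The two graphs and the name `myserver_crate` are assumptions taken from the diagrams.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Returns every unit that a cold build of `target` must compile:
/// the target itself plus everything reachable through `deps`.
fn units_to_compile<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    target: &'a str,
) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![target];
    while let Some(unit) = stack.pop() {
        // Only expand a unit the first time we see it.
        if seen.insert(unit) {
            if let Some(children) = deps.get(unit) {
                stack.extend(children.iter().copied());
            }
        }
    }
    seen
}

/// Fine-grained view: every box in the diagram is its own unit.
fn module_graph() -> Graph {
    BTreeMap::from([
        ("myserver", vec!["lib_A", "mydep1"]),
        ("myclient", vec!["mydep1"]),
        ("mydep1", vec![]),
        ("lib_A", vec![]),
    ])
}

/// Crate-grained view: myserver, mydep1 and myclient are one unit,
/// which as a whole depends on lib_A.
fn crate_graph() -> Graph {
    BTreeMap::from([
        ("myserver_crate", vec!["lib_A"]),
        ("lib_A", vec![]),
    ])
}

fn main() {
    let fine = units_to_compile(&module_graph(), "myclient");
    let coarse = units_to_compile(&crate_graph(), "myserver_crate");
    println!("fine-grained units:  {:?}", fine);
    println!("crate-grained units: {:?}", coarse);
}
```

In the fine-grained graph, building `myclient` touches only `myclient` and `mydep1`. In the crate-grained graph, the smallest thing that contains `myclient` is the whole crate, which drags in `lib_A` as well.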
My goal is to spend 0 seconds compiling or thinking about compiling `myserver` and `lib_A` whenever I just need to compile `myclient`. The obvious thing to do is to `cargo new myclient` and pull in `mydep1`. But wait - how? `mydep1` is stuck in this other crate, which is the smallest addressable unit that I can depend on. Now if I want to actually reap the benefits of smaller dependencies, I'll have to make `mydep1` its own crate.
But `myserver` is a crate that is already in production and subject to a release process. It'll take me at least two weeks to split that crate into two new crates. In the meantime, I'll have an obnoxious integration task keeping all of the changes that are being pushed into the 1-crate system synced with the new 2-crate file layout. Also, by the way, I don't want to block my check-in for two extra weeks.
So, what am I going to do? If the code I need from `mydep1` is `pub`, then I'll likely create a new crate, and `extern crate myserver`. If it isn't, then, time permitting, I'll modify it to be `pub`, and then create a new crate and depend on `myserver`. If time isn't permitting, I'll just add `myclient` to `myserver`. Either way, cold build times will be worse than necessary. This cycle will continue until things get so bad that someone is given resources to actually break up the giant clump. That lucky fellow might decide to factor `mydep1` out of `myserver` and update the dependency on `myserver` to point to `mydep1` instead. That'll be easier said than done, though; lacking the original context, it won't be easy to spot that `extern crate myserver` is really just a dep on `mydep1`.
How could this have been addressed? The best way is pre-emptively. If we had a "pit of success" instead of a pit of failure, our build system would default us into declaring compilation units at the most granular level that is technically possible. `mydep1` would have already been its own build target for me to depend on, and I would have defaulted into making `myclient` its own target without needing to think about it, in turn exposing smaller targets to downstream developers and compounding a virtuous cycle.
I'm pretty sure that if Rust were ever to gain currency at my company, we would have to have a strange policy like "one crate per source file" just to have a hope of feasible build times.
The key principle is that the idiomatic best practice should be to declare compilation units at the smallest granularity that is technically possible.
In this thread, I see two sides talking past each other. One side is talking about the appropriate size of what to ship to crates.io, and the other side is talking about compilation times. It is a sin of Cargo that these two separate concerns are conflated! The best thing to do would be to declare minimal compilation units and then, for shipping to crates.io, declare a few roll-up modules that you expose and document.
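As a rough illustration of the roll-up idea, the sketch below keeps two small units separate and exposes them through one documented facade using `pub use` re-exports. The module and function names (`codec`, `transport`, `api`, `encode`, `send`) are hypothetical. In a real project the small units would presumably be separate crates and the roll-up a thin crate that re-exports them; inline modules are used here only so the example fits in one file.

```rust
// Hypothetical fine-grained units, modeled as private modules.
mod codec {
    /// Prefixes a message with its length.
    pub fn encode(msg: &str) -> String {
        format!("[{}]{}", msg.len(), msg)
    }
}

mod transport {
    /// Pretends to send a frame and reports how many bytes went out.
    pub fn send(frame: &str) -> usize {
        frame.len()
    }
}

/// Roll-up module: the only surface that gets documented and shipped.
/// It contains no logic of its own, only re-exports.
pub mod api {
    pub use crate::codec::encode;
    pub use crate::transport::send;
}

fn main() {
    let frame = api::encode("ping");
    println!("sent {} bytes", api::send(&frame));
}
```

Users of the roll-up see a single `api` surface, while anyone who cares about build time can still depend on `codec` or `transport` alone.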