Okay, a few months ago I had a problem with excessive cross-compilation time. A lot less complicated than your compiler macros, but it still ended up at 2h+ per binary, and the only real problem was that, well, I'd be writing a lot more code than just one binary over the next few months...
Also, I cross-compile to 2 different platforms, so in total each binary is built 3 times: once for the CI host for testing, and once for each platform. The previous setup, relying on Cargo and GitHub, was a non-starter.
After some back and forth, here is what I did:
- Migrated my repo from Cargo to Bazel
- Separated large crates into smaller crates since Bazel hashes per crate
- Added a remote cache, that is, compiled artifacts are stored in a remote cache and simply loaded from it when nothing has changed.
- Added remote build execution, which compiles the entire project on a cluster with 80 CPU cores. Thanks to BuildBuddy.com, this was surprisingly easy.
- Re-designed the dependency graph so that each binary follows a path through the graph that is disjoint from all the other binaries' paths
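For anyone curious, the BuildBuddy side of this is mostly a handful of `.bazelrc` lines. A rough sketch; the endpoints below match what their setup docs showed last I looked, and the API key is a placeholder, so check their current docs:

```
# Remote cache + remote build execution via BuildBuddy (key is a placeholder)
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_executor=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=<YOUR_API_KEY>
```

Drop `--remote_executor` if you only want the cache without remote execution.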
I cannot stress enough how much of a lifesaver BuildBuddy has become over time, because while Bazel is already quite fast on its own, with proper remote build and remote cache it just flies at the speed of light.
My mono-repo vendors all dependencies, so in total we're talking about 400 MB of source code, with my own code spread across close to a hundred crates, so it's a lot. Not the biggest in the world, but already enough that you don't want to do full rebuilds with Cargo anymore.
These days, with Bazel and BuildBuddy, a full rebuild (e.g. when a new Rust version comes out) takes about 8 minutes, and an incremental build is usually done within two minutes. In both cases the build pipeline is compile, unit test, integration test, container build, and container publish. And all binaries are built with opt-level 3 for two different platforms...
The speedup comes largely from parallel execution, which in turn comes from the disjoint dependency pathways mentioned earlier. And Bazel / BuildBuddy caches everything reliably, so that's a big one too.
In my experience, when you chunk the large crates that take the longest to compile into a larger number of small crates, and design your dependency graph so that each output artifact runs through a disjoint path, you may see a substantial speedup even with Cargo. The build tool matters a lot less to total compile time than a poorly designed dependency graph does.
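As a sketch of what that split can look like under plain Cargo (all crate names here are made up), a workspace along these lines lets Cargo compile the two feature crates, and then the two binaries, in parallel, because neither path blocks the other:

```toml
[workspace]
resolver = "2"
members = [
    "core",       # small shared leaf crate, compiles once
    "feature_a",  # depends only on core
    "feature_b",  # depends only on core
    "bin_a",      # depends only on feature_a
    "bin_b",      # depends only on feature_b
]
```

The point is that `bin_a` and `bin_b` never wait on each other, only on the small shared leaf.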
This idea goes both ways: a slow dependency graph is basically a linearly ordered sequence of gigantic crates where the first one blocks all the others. Obviously, that build takes hours.
A fast dependency graph is one where each output target has its own path through the graph with little to no intersection with the other pathways. This is critical, because when you change something in pathway A, no other pathway and no other output target gets build and tested, and less compile is always faster compile. The other big idea is that a multi-path build graph is just trivial to parallelize since order between pathways doesn't matter.
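You can sketch that argument with a toy model: with enough parallel workers, total build time is the longest dependency chain through the graph, not the sum of all crate compile times. All crate names and times below are made up for illustration:

```python
# Toy model of the "disjoint pathways" idea: with unlimited parallel workers,
# total build time equals the longest chain, not the sum of all compile times.
# deps[crate] = crates it depends on; time[crate] = hypothetical compile minutes
deps = {
    "core": [], "util_a": ["core"], "util_b": ["core"],
    "bin_a": ["util_a"], "bin_b": ["util_b"],
}
time = {"core": 2, "util_a": 3, "util_b": 4, "bin_a": 1, "bin_b": 1}

def finish(crate, memo={}):
    # A crate finishes after its slowest dependency, plus its own compile time.
    if crate not in memo:
        memo[crate] = time[crate] + max((finish(d) for d in deps[crate]), default=0)
    return memo[crate]

serial = sum(time.values())              # one big linear chain: 11 "minutes"
parallel = max(finish(c) for c in deps)  # disjoint paths in parallel: 7
print(serial, parallel)
```

The fully serialized graph pays for every crate in sequence; the disjoint-path graph only pays for the longest single path.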
Also, I read a while ago that the Cargo / compiler team is actually working on parallel compilation, but so far it's Linux-only and you have to enable it with a flag. Do some searching online, because that may help you.
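If memory serves, that's the nightly-only parallel rustc front end, enabled through an unstable flag. Something along these lines, though the flag name and status may have changed since, so verify against current rustc docs:

```
RUSTFLAGS="-Z threads=8" cargo +nightly build --release
```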
I understand that a lot of people don't like Bazel for any number of reasons, and that's okay, but the observation stands: a lot of small crates build much faster in parallel than a small number of big crates that usually can't be compiled in parallel. With some proper re-design of your dependency graph, you may get a meaningful speedup with Cargo without doing much beyond some targeted code refactoring.