Slow compile times on Windows

Hey everyone,

I'm using Rust for my project Vibe, which has a lot of dependencies, and I'm finding the compilation times on my Windows machine (Ryzen 5 4500U, Windows 11) to be quite slow. Locally, it takes about 5-10 minutes to compile, and on GitHub CI, it can take 10-20 minutes.

Since it's a desktop app, every hot reload involves compiling a single crate, which still takes around 5-10 seconds to compile and link.

On macOS, the compilation is also slow but noticeably faster and more manageable than on Windows. On Linux, using the same laptop, it's also faster, though still slow compared to Go.

Is there any way to drastically speed up the compilation process?

Additionally, why is cross-compiling with Rust to different platforms so much slower and more complex? In Go, I can cross-compile to any platform with a single command, and it only takes a few seconds even for large codebases. Why is Rust slower and more complicated in this regard?

I see that your workspace has two packages (hence at least two crates), vibe_core and vibe. Which one are you editing in this scenario?


It's likely that you could achieve some gains by splitting one or both of these crates into multiple crates which, if possible, do not depend on each other (to minimize the number of recompiled dependents for any given change). rustc's incremental compilation can save a lot of work when small changes are made, but separate crates can be skipped entirely when they are unchanged.

(But splitting crates also means more work for the compiler (and you) overall due to dealing with the boundaries between crates.)
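For instance, a sketch of what a more finely split workspace manifest could look like (the extra member names here are hypothetical, just to show the shape):

# workspace Cargo.toml (sketch; "vibe_download" and "vibe_transcribe" are made-up names)
[workspace]
members = [
    "vibe_core",       # existing
    "vibe",            # existing desktop app
    "vibe_download",   # hypothetical: download logic split out
    "vibe_transcribe", # hypothetical: transcription logic split out
]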


It can also help to make code non-generic when possible — compiling generic code has to be deferred to the usage site where the generic parameters become concrete types, so it may end up being (re)compiled in a crate further downstream than the one where it is defined. An example of this I found in your repo is in Downloader::download():

pub async fn download<F>(&mut self, url: &str, path: PathBuf, on_progress: F) -> Result<()>
where
    F: Fn(u64, u64) -> bool,

on_progress probably isn't called often enough to benefit significantly from monomorphization and inlining, and the function already borrows several things, so you can replace it with a dyn Fn:

pub async fn download(
    &mut self,
    url: &str,
    path: PathBuf,
    on_progress: &dyn Fn(u64, u64) -> bool
) -> Result<()> {

This non-generic function will be compiled to machine code once[1] as part of vibe_core, rather than once for each place you call it. This means that it won't need recompiling when you change vibe, and it won't be (partially) compiled again for each call site. All that is less work for the compiler and linker.

Besides speculatively de-genericing your code, you might want to look for functions that generate a lot of machine code, because those are functions that might be slow to optimize, and they definitely make more work for the linker. cargo-bloat can tell you what the biggest functions in your program are, and cargo-show-asm can dump the assembly (interleaved with Rust code if you wish) to see why specific functions are surprisingly big.


Also, if you haven't already, try cargo build --timings to get more information on compilation time. It's most interesting if you have multiple library crates, but there's some useful information no matter what.


  1. unless it was automatically inlined, but that looks unlikely ↩︎

3 Likes

If this is for dev purposes, are you doing optimized builds? Dev builds definitely shouldn't be taking 10 minutes!

You can often get away with opt-level=1 to do only very simple but very effective optimizations:

[profile.dev]
opt-level=1

Or even to only optimize dependencies:

[profile.dev.package."*"]
opt-level = 2

From previous discussions on this, my understanding of the main reasons are:

  • Rust is simply a much more complicated language to analyze on the front end, with trait solving, lifetime analysis, lots of global analysis, etc
  • LLVM as a backend is very heavily weighted to generate faster code, at the cost of much slower code generation
  • Rust hasn't settled down for long enough to deeply optimize the existing behavior, which is often done at the cost of making it harder to modify behavior.

There's also the fact that on Windows I think the default linker is still Microsoft's link.exe, which is pretty garbage performance-wise. You might want to try enabling the bundled lld-link, which is LLVM's clone of it, though I always forget the right flags to do that.
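From memory, one way to do it is via .cargo/config.toml; this is an untested sketch that assumes LLVM's lld-link.exe is installed and on your PATH (rather than the copy bundled with rustc):

# .cargo/config.toml (sketch, not verified recently)
[target.x86_64-pc-windows-msvc]
linker = "lld-link"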

3 Likes

Both

It's already pretty split (core and desktop). I don't think it would be comfortable to split it further...

Thanks. I'll try to change it.

I'll check that too, though I'm pretty sure it's simply slow in general; it's not specific to this project. It's slow for me when I work on other Rust projects on Windows too.
On Linux, with the same laptop, it's much faster.

Yes, dev builds take 4-10 minutes for sure (for the first build).

I'll try that, though I think it's enabled by default for dev.

I see. I haven't worked much with LLVM outside of Rust, so I don't know how slow it is compared to, say, Go.

I need to get a better picture of why the compile is so slow first, and where the slowness comes from. Maybe cargo build --timings will help.

Nah, the default for dev is opt-level=0, which is slightly faster to build but generates far slower (though perhaps easier to debug) code. opt-level=1 is an option to avoid building fully optimized code when the plain debug build is unusably slow to run.

There's not too much you can do to improve an unoptimized full build time without simply building less code (or buying a faster computer, I guess :yum:)

There's a few tricks you can look into for that, but they start becoming much more specific to the libraries you're using. Check if they have documentation specifically about build time - often you can disable or enable features to significantly reduce the amount of built code, or look for suggested alternatives that have fewer dependencies, or enable using system-installed libraries instead of building them from included source. There's normally at least a little bit of low hanging fruit there.
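For example, trimming a dependency's features in Cargo.toml looks like this (the crate and feature names are made up; check each dependency's docs for what it actually offers):

# Cargo.toml (illustrative sketch, hypothetical names)
[dependencies]
some_heavy_dep = { version = "1", default-features = false, features = ["minimal"] }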

On my 24C/48T Zen 2 Linux machine, with Rust nightly - which uses lld as the linker - I get the following results:

$ time cargo build --release
[...]
    Finished `release` profile [optimized] target(s) in 1m 28s

real	1m28.797s
user	20m49.287s
sys	1m41.183s

$ cargo clean; time cargo build 
[...]
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 58.76s

real	0m58.847s
user	11m12.623s
sys	1m44.818s


$ echo "/* noop */" >> ./core/src/lib.rs ; time cargo build
[...]
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.83s

real	0m4.911s
user	0m5.240s
sys	0m7.411s

If you're getting 10-minute builds for small modifications in dev builds, I think something is invalidating the cached build artifacts in your target dir. Perhaps your IDE invokes cargo with different environment variables? Try closing it and doing a modification with a text editor instead.

Sidenote: Your build instructions should also mention libasound2-dev as a dependency.

and on GitHub CI, it can take 10-20 minutes

Given the CPU cycles above, that may not be unreasonable for an uncached build. You should look into caching the target dir.

Could it be that some anti virus software is interfering? Try whitelisting the project directory.

1 Like

On Linux it also compiles faster for me. The slowness happens only on Windows.

Most likely not related to AV/disk issues.
It's compiled on an NVMe drive and the folder is excluded in Windows Defender.
In addition, it's placed on a Dev Drive (a new Windows feature for development), and the AV is completely disabled.

This would have been my next suggestion!

One thing to check is that you also put the cargo home on the dev drive, otherwise all your dependency source code is still coming from NTFS. IIRC you can mostly get away with moving the ~/.cargo dir and setting $CARGO_HOME (unlike some other package managers), but it's been a while.

3 Likes

Tried that now; it still takes 10 minutes to build the repo from scratch on Windows.
I guess the only options left are to buy a new PC or to optimize the Rust compiler for Windows.

In my completely unscientific comparisons, I've found that Linux under WSL consistently compiles Rust projects 25%-50% faster than native Windows. Like this, for example. Same hardware, both running on the same Windows session, just with Linux running virtualized.

Another data point for comparison is compiling the project on a 16-core M3 Max (I had to change some paths in tauri.macos.conf.json to fix build errors):

$  cargo clean ; time cargo build --release

real    1m5.540s
user    10m28.260s

$  cargo clean ; time cargo build

real    0m40.113s
user    4m20.985s

From what I can tell with --timings, approximately 43 seconds of the --release build time is spent on linking (23 seconds) and running the whisper-rs-sys build script (mostly compiling C++? 20 seconds). In my case, 66% of the build time is completely outside of Rust!

Debug builds are better: only 10 seconds are spent in linking, and the whisper-rs-sys build script basically runs instantaneously (0.3 seconds) for some reason. So, only 25% of the build time is outside of Rust in this case. In absolute terms, 30 seconds (discounting the linker time) for a clean build is rather decent for a project of this size. Incremental builds are entirely linking time, as expected: ~6 or so seconds, depending on which crate is modified.

Also of importance: I'm not able to build the project with rust-lld (my default to avoid slow linking times) because some build script decides to add an unsupported linker flag.


While "buy a new PC" is certainly a very appealing option to fix your build time issue, a few others to consider are:

  1. Try to get rust-lld to link the executable properly. This will save many seconds on every build, even if you have ridiculously high-performance hardware.
  2. Identify what whisper-rs-sys is doing in debug builds to be so much faster than release builds and consider "just using the faster build method" for release builds.
  3. Make an effort to remove overhead.
    • Are the benefits of a web interface really worth 10 minutes of build time?
    • What if you could cut it down by a factor of 10 by switching to egui or iced for the UI?
    • Far be it from me to tell you how to write code! I'm just commenting on the apparent irony of using heavy dependencies and questioning why it's so slow to build.

And so on...

8 Likes

Any chance someone with a good CPU who's also running Windows could compare build times with me? (I have an AMD Ryzen 5 4500U.)
It has become really hard for me to develop on Windows because of the build times... so I'm thinking about getting a better CPU.

10 minutes for a clean dev build is a bit too much, but expected for complex apps. How long do your incremental builds take? They should normally be measured in seconds. If incremental builds take more than a minute, there is some issue.

I don't think you can cross-compile Go to macOS easily.

Go tries to boil the ocean and performs the entire compilation end-to-end with its own custom toolchain. It also obsessively avoids dynamically linking anything, including libc (they have walked back that position on the most stubborn OSes, including macOS). This makes the process easier for the end user when it works, but very non-scalable (how many platforms exist?) and impossible to fix when it doesn't. Rust integrates with existing toolchains, including the linkers and dynamic libraries. This means that whatever you can build with C, you can likely build with Rust, but the process has more moving parts. Have you tried cross?

EDIT: Oh, you're using C++ dependencies. I don't know why you would expect easy cross-compilation or good compilation performance in that case.

Imho you should seriously consider adding a feature which allows using a pre-built Whisper. You really shouldn't be recompiling C++ if you want good build performance.

EDIT2: Build scripts are also a drag on compilation performance. I think your scripts are misconfigured. You don't emit rerun-if-env-changed for all of your used environment variables (e.g. BLAS and CUDA paths), and you also don't emit rerun-if-changed. I believe this means basically any change causes the build scripts to be rerun. Besides wasting time on building and running build scripts, this may also have downstream effects of invalidating your compilation caches. See rerun-if-changed and FAQ.
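For reference, this is roughly the shape those directives take in a build script (the env var and path names below are placeholders, not the ones whisper-rs-sys actually uses):

// build.rs (illustrative sketch; env vars and paths are placeholders)
fn main() {
    // only re-run when the build script itself or the vendored sources change
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=vendor/whisper.cpp");
    // re-run when the environment variables the script actually reads change
    println!("cargo:rerun-if-env-changed=BLAS_PATH");
    println!("cargo:rerun-if-env-changed=CUDA_PATH");
}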

EDIT3: You are also building the same packages multiple times. Your Cargo.lock contains many duplicate packages which differ only in version. Windows builds seem particularly affected; there are many duplicated crates. E.g. windows-core is built 4 (!) times with different versions. Windows API crates are notoriously huge: this can mean many millions of lines of code, with obvious detrimental effects. You really should understand why you have duplicate dependencies, and prune them aggressively, particularly the windows ones. Manually pin minor versions of dependencies if need be.

Hell, you depend on 2 different versions of itertools! I get how someone can pull in multiple winapi crates, those are updated often and almost always incur an incompatible version bump. But itertools? Your dependencies are just a mess.

A simple deduplication of name = ".." strings in your Cargo.lock shows me you have 73 (!) dependencies with multiple versions. 17 of those are winapi-specific, with about half of those probably irrelevant (different targets). That's a lot!

3 Likes

Incremental build takes 10s

I reinstalled the latest Windows and tried building the app again with the Ryzen 5 4500U:

Build with cargo-clif: 3m 10s
Build with cargo: 4m 24s

Total 660 crates.

It seems to be much better now.

The C++ dependencies I have build in a few seconds in general, e.g. whisper.cpp.

What options do I have to improve that? It happens because I have different dependencies that depend on different versions.

As for incremental builds:
currently it takes 10s just to rebuild main.rs (once it has changed). Can I speed that up?
I use this:

cargo-clif build --jobs 8
`dev` profile [unoptimized + debuginfo] target(s) in 8.97s

Incremental build times are generally dominated by either the crate build time (the unit of compilation is the crate), which is easiest to fix by breaking up your crates more; or link time, for which you can only really try faster linkers (Did you manage to get rust-lld working?) or simply have less code to link. It can sometimes be due to bad build scripts as mentioned, though - ideally those should only run on an actual environment change.

cargo build --timings is your friend here.

You can easily reference forks, or local versions, including of nested dependencies using cargo: Overriding Dependencies - The Cargo Book

The juice may not be worth the squeeze, but it's generally much easier than you might expect. (Though I would like an option a little more like pnpm patch that only maintains an actual patch file)
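As a sketch, the [patch] table looks something like this (the package name and locations are hypothetical):

# Cargo.toml (sketch; "some_dependency" is a placeholder)
[patch.crates-io]
some_dependency = { path = "../some_dependency" }
# or point it at a fork:
# some_dependency = { git = "https://github.com/yourname/some_dependency" }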

1 Like

Lowering debuginfo levels can also result in fewer things to link, even if it's not code. I don't know if this applies to Windows; if it uses separate debug files by default, that might not apply.
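A minimal sketch of what that looks like in Cargo.toml, if you want to experiment (debug = 0 drops debuginfo entirely, debug = 1 keeps only limited info such as line tables):

[profile.dev]
debug = 1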

Yeah, by default Windows splits debug info into .pdb files, but those also need to be linked...

Update:
I discovered that the most significant improvement to build times comes from using the nightly Rust toolchain with Cranelift. Here's how to set it up:

rustup toolchain install nightly
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
$env:CARGO_PROFILE_DEV_CODEGEN_BACKEND="cranelift" ; cargo +nightly build -Zcodegen-backend --no-default-features -j10

With this configuration, the app builds in just 3 minutes and 23 seconds. However, rebuilding the main file still takes about 7.53 seconds.

I believe that once the sold linker supports Windows, recompiles will finally be fast enough: Support Windows · Issue #8 · bluewhalesystems/sold · GitHub

Also, I created a PowerShell function to easily access it in every project:

code $profile

Paste

# run cargo +nightly with the Cranelift codegen backend enabled
function ccargo { 
    param (
        [Parameter(Position = 0, Mandatory = $true)]
        [string]$Args,

        [Parameter(ValueFromRemainingArguments = $true)]
        [string[]]$AdditionalArgs
    )
    $env:CARGO_PROFILE_DEV_CODEGEN_BACKEND = "cranelift"
    cargo +nightly -Zcodegen-backend $Args $AdditionalArgs
}

Reload

. $profile

Then you can build with

ccargo build # clean, etc...