Which compile flags speed up compile time in debug builds?

Hello, I'm looking for compile flags that speed up compilation during development. So far these are the ones I've collected from the documentation I could find. I've put them into a small app that I run as a custom Cargo subcommand, e.g. cargo custom cranelift run.

I don't know the details exactly, but running Cranelift with these custom flags compiles faster than running Cranelift without any custom flags. A fresh debug build of a medium-sized backend project went from 1 minute to 19 seconds, which is about 3x faster.

Which other compile flags speed up compilation that I don't know about, i.e. that aren't in my code yet?

E.g. flags that speed up compiling generics or macros, etc.

use std::env;
use std::process::{Command, exit};

fn print_usage() {
    eprintln!("Usage:");
    eprintln!("  cargo custom [check|run|build] [options]");
    eprintln!("  cargo custom miri [check|run] [options]");
    eprintln!("  cargo custom cranelift [check|run|build] [options]");
    eprintln!("  cargo custom -h | --help");
}

fn run_clear() {
    let clear_status = Command::new("clear").status();
    if clear_status.is_err() {
        exit(1);
    }
}

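// Builds the RUSTFLAGS value shared by all sub-commands.
// The -Z flags below are unstable and need a nightly toolchain.
fn get_base_rust_flags(use_cranelift: bool) -> String {
    let mut flags = String::from(
        "
        -Zthreads=0
        -Zshare-generics=y
        -C debuginfo=0
        -C prefer-dynamic
        -C metadata=dev
        -Zinline-mir=off
        -Zproc-macro-backtrace=off
        -Zvalidate-mir=off
        -C embed-bitcode=no
        ",
    );
    if use_cranelift {
        flags.push_str(" -Zcodegen-backend=cranelift");
    } else {
        // LLVM-only options; they are not passed when Cranelift is the backend.
        flags.push_str(" -C llvm-args=--inline-threshold=0 -C no-prepopulate-passes");
    }
    // Collapse the newlines and indentation into single spaces.
    let mut clean_flags = flags.split_whitespace().collect::<Vec<_>>().join(" ");

    // Linker-specific arguments are added only after a linker has been detected,
    // so the default linker never receives an option it may not understand.
    if tool_available("mold") {
        clean_flags.push_str(" -C link-arg=-fuse-ld=mold -C link-arg=-Wl,--threads=0");
    } else if tool_available("ld.lld") {
        // `-fuse-ld=lld` resolves to the `ld.lld` binary, so that is the one probed here
        // (the bare `lld` driver refuses to run). lld uses all hardware threads by
        // default, so no thread option is passed.
        clean_flags.push_str(" -C link-arg=-fuse-ld=lld");
    }
    clean_flags
}

// Returns true if `<tool> --version` runs and exits successfully.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}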

fn set_sccache_if_available(cmd: &mut Command) {
    if let Ok(status) = Command::new("sccache").arg("--version").status() {
        if status.success() {
            cmd.env("RUSTC_WRAPPER", "sccache");
        }
    }
}

fn handle_standard_action(action: &str, remaining_args: &[&str]) {
    run_clear();
    let mut cmd = Command::new("cargo");
    cmd.arg(action);
    let rust_flags = get_base_rust_flags(false);
    cmd.env("RUSTFLAGS", rust_flags);
    set_sccache_if_available(&mut cmd);
    cmd.args(remaining_args);
    let next_status = cmd.status();
    match next_status {
        Ok(status) => {
            if status.success() {
                exit(0);
            } else {
                exit(status.code().unwrap_or(1));
            }
        }
        Err(_) => exit(1),
    }
}

fn handle_miri_action(miri_action: &str, remaining_args: &[&str]) {
    run_clear();
    let mut cmd = Command::new("cargo");
    cmd.arg("miri").arg(miri_action);
    let miri_flags = "
-Zmiri-disable-validation 
                      -Zmiri-disable-alignment-check 
                      -Zmiri-disable-data-race-detector 
                      -Zmiri-ignore-leaks 
                      -Zmiri-disable-isolation 
                      -Zmiri-preemption-rate=0 
                      -Zmiri-provenance-gc=0 
                      -Zmiri-no-extra-rounding-error
".replace("\n", " ");
    let rust_flags = get_base_rust_flags(false);
    cmd.env("MIRIFLAGS", miri_flags);
    cmd.env("RUSTFLAGS", rust_flags);
    cmd.args(remaining_args);
    let next_status = cmd.status();
    match next_status {
        Ok(status) => {
            if status.success() {
                exit(0);
            } else {
                exit(status.code().unwrap_or(1));
            }
        }
        Err(_) => exit(1),
    }
}

fn handle_cranelift_action(cranelift_action: &str, remaining_args: &[&str]) {
    run_clear();
    let mut cmd = Command::new("cargo");
    cmd.arg(cranelift_action);
    let rust_flags = get_base_rust_flags(true);
    cmd.env("RUSTFLAGS", rust_flags);
    set_sccache_if_available(&mut cmd);
    cmd.args(remaining_args);
    let next_status = cmd.status();
    match next_status {
        Ok(status) => {
            if status.success() {
                exit(0);
            } else {
                exit(status.code().unwrap_or(1));
            }
        }
        Err(_) => exit(1),
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 3 {
        print_usage();
        exit(1);
    }
    let arg1 = args[2].as_str();
    if arg1 == "-h" || arg1 == "--help" {
        print_usage();
        exit(0);
    }
    if arg1 == "miri" {
        if args.len() < 4 {
            eprintln!("Missing sub-command for 'miri'. Expected 'check' or 'run'.");
            eprintln!();
            print_usage();
            exit(1);
        }
        let miri_action = args[3].as_str();
        match miri_action {
            "check" | "run" => {
                let remaining_args: Vec<&str> = args.iter().skip(4).map(|s| s.as_str()).collect();
                handle_miri_action(miri_action, &remaining_args);
            }
            _ => {
                eprintln!("Unknown sub-command for 'miri': {}", miri_action);
                eprintln!();
                print_usage();
                exit(1);
            }
        }
    } else if arg1 == "cranelift" {
        if args.len() < 4 {
            eprintln!("Missing sub-command for 'cranelift'. Expected 'check', 'run', or 'build'.");
            eprintln!();
            print_usage();
            exit(1);
        }
        let cranelift_action = args[3].as_str();
        match cranelift_action {
            "check" | "run" | "build" => {
                let remaining_args: Vec<&str> = args.iter().skip(4).map(|s| s.as_str()).collect();
                handle_cranelift_action(cranelift_action, &remaining_args);
            }
            _ => {
                eprintln!("Unknown sub-command for 'cranelift': {}", cranelift_action);
                eprintln!();
                print_usage();
                exit(1);
            }
        }
    } else {
        match arg1 {
            "check" | "run" | "build" => {
                let remaining_args: Vec<&str> = args.iter().skip(3).map(|s| s.as_str()).collect();
                handle_standard_action(arg1, &remaining_args);
            }
            _ => {
                eprintln!("Unknown command: {}", arg1);
                eprintln!();
                print_usage();
                exit(1);
            }
        }
    }
}

Have you used cargo build --timings to see what's actually taking time?

Sometimes causes are outside Rust, like a slow linker or slow disk access (Cargo's fingerprint cache and incremental builds create LOTS of tiny files which makes them sensitive to filesystem latency and hurt perf in VMs with FUSE or network filesystems).

Sometimes there are massively unexpected slowdowns from dependencies that do wacky stuff with macros, huge autogenerated code, or some pathologically complex types that run into quadratic costs in the compiler.

It also depends whether you optimize for cold builds or incremental rebuilds of only workspace crates. If you're caching deps, you can build them optimized for size - they'll be faster (especially important for proc macros) and smaller, so there will be less to link.

To speed up cold full builds, like in a wasteful CI that doesn't have good caching, add good caching :) sccache has support for multi-tier caches now, so you can have a smaller disk cache and a larger S3 cache. When using sccache, keep env vars clean and build dir paths stable and consistent, because sccache is sensitive to random inputs on the rustc command line.

Check cargo tree -d to ensure you're not building deps twice. You may need to unify major versions, crate features, and optimization flags.

Check if you have sys dependencies that build from vendored sources. You may be able to make them use prebuilt system packages. Also check if sys deps needlessly run bindgen (prebuilt bindings would be faster).

If you're looking to speed up the situation on your local machine while developing, I find these things make a big difference:

  1. setting the environment variable CARGO_INCREMENTAL=1
  2. aligning the environment variables for rust_analyzer (run vs test vs check)
  3. reusing the rust_analyzer build target for command-line builds if your main environment variables match the ones rust_analyzer uses, or deliberately not reusing it if they don't (see note below).

Note: points 2 & 3 involve a compromise which you may not wish to make
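As a rough illustration of what "aligned" environment variables could look like, here is a minimal sketch that builds every cargo invocation through one helper so that the flags and target directory never drift between runs. The helper name dev_cargo and the example values are hypothetical; CARGO_INCREMENTAL, RUSTFLAGS and CARGO_TARGET_DIR are the standard Cargo environment variables. Whether pointing command-line builds at the same target directory as rust_analyzer is a good idea is exactly the compromise mentioned in the note.

```rust
use std::process::Command;

/// Build a `cargo <action>` command with one fixed set of environment variables.
/// (Hypothetical helper; the values passed in are up to the caller.)
fn dev_cargo(action: &str, rustflags: &str, target_dir: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg(action)
        .env("CARGO_INCREMENTAL", "1")
        .env("RUSTFLAGS", rustflags)
        .env("CARGO_TARGET_DIR", target_dir);
    cmd
}

fn main() {
    // Only inspects the command; nothing is executed here.
    let cmd = dev_cargo("check", "-C debuginfo=0", "target/shared");
    for (key, value) in cmd.get_envs() {
        println!("{:?}={:?}", key, value);
    }
}
```

A change in RUSTFLAGS between two invocations makes Cargo rebuild, so keeping the values in one place is the point of the helper.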

Generally, I don't find a need to do anything in build.rs to speed things up as the issues are different between "my machine" and "CI" as @kornel points out above.

You can see my baseline setup for my local machine here: rust/.devcontainer/devcontainer.json at main · MusicalNinjaDad/rust

Here is the timing result

I already ran the timings on an older version. I just tried it again on the newest version and the result is mostly the same: the time goes to macros, generics, the linker, and the compiler backend.

The code already uses the mold linker, falling back to lld if mold is not available, and then to the default linker if lld is not available either. It also already uses sccache.

I am aware of another linker named wild. In my earlier test its speed was not stable yet, and I would have to install it again first. The code is a prototype; I will add wild as the next fallback.

My goal is a debug compile setup that is fast in any circumstance and easy to reuse. I'm not looking for advice like "if macros are slow, avoid macros" or "if a dependency is slow, avoid that dependency"; I'm looking for ways to make macros compile fast. The same goes for every other aspect that slows compilation down: how can it be made faster via compile flags? For example, a flag that caches macros so they compile faster, or a flag that shares generics so heavy generic code compiles faster (both are already in the code), and other flags that I don't know about. Do you know flags or tips that make debug builds faster and that aren't in the code yet?
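The fallback chain described here (mold, then lld, then the default linker) could be sketched roughly as follows. This is only an illustration under assumptions: pick_linker and responds_to_version are hypothetical names, and the candidate list is passed in so that a further entry can be appended later without touching the logic.

```rust
use std::process::{Command, Stdio};

/// Return the first candidate for which `is_available` reports true.
fn pick_linker<'a>(
    candidates: &[&'a str],
    is_available: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    candidates.iter().copied().find(|name| is_available(*name))
}

/// Probe a tool by running `<tool> --version` with its output suppressed.
fn responds_to_version(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    // Preference order; `None` means "leave the default linker alone".
    let order = ["mold", "ld.lld"];
    match pick_linker(&order, responds_to_version) {
        Some(name) => println!("using {name}"),
        None => println!("falling back to the default linker"),
    }
}
```

Passing the probe as a closure keeps the selection logic testable without any linker installed.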

The most notable thing about that timing chart is the fact that it is running only one job at a time — Cargo thinks you have only one CPU core to use.

Max concurrency: 2 (jobs=1 ncpu=1)

Assuming you don’t actually have a single-core machine, then fixing that will get you the most significant speedup.

But when the build is not from-scratch, it should only take 3 seconds, either way.
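To check how many CPUs a process can actually see on a given machine (for example inside a VM or container), a minimal sketch using only the standard library might look like this. The function name visible_cpus is made up for the example, and the value Cargo itself reports can differ if the job count is overridden in its configuration.

```rust
use std::thread;

/// Number of CPUs the current process may use, or 1 if that cannot be determined.
fn visible_cpus() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

fn main() {
    println!("CPUs visible to this process: {}", visible_cpus());
}
```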

Reusing the rust-analyzer work is new to me. I will look for more information about it and try it.

Incremental compilation is enabled by default in debug builds, and I never recompile from scratch after the initial full compile unless I clear the cache with cargo clean. But even with that, it is still not enough for GUI development. For example, I did some GUI development with GPUI a few days ago, and every time I made a change, even a one-line change, I had to wait a long time before I could see the new look on screen.

The test I sent above ran on my VPS, which has 1 core and 1 GB of RAM. I didn't use rust-analyzer there because it always crashes with OOM as soon as I open the project and rust-analyzer starts its initial scan, if the project uses enough dependencies, e.g. axum with a DB library.

When developing with GPUI I use my local laptop, which has 8 cores and 20 GB of RAM. It is usable for developing apps with a medium number of dependencies. But for apps with many dependencies, like a GUI using GPUI, Tauri, or Bevy, incremental compilation is slow.

That is because I ran the test on my VPS, which has a 1-core CPU and 1 GB of RAM. I am looking for compile flags that speed up debug builds, not for more hardware to throw at the problem, because I'm creating a compile setup that is portable rather than one that is only fast on a high-end device. If I can make it fast on my VPS, the same setup will make development on my local laptop even faster, especially when building GUI apps, because even with 8 cores, 20 GB of RAM, and an NVMe SSD, incremental compilation of a native GUI app, e.g. one using GPUI, is still slow.

Configure debug_assertions to NO while you're not debugging; that will cut out a sizeable chunk of code.

Hmm ... if you have incremental enabled, it sounds like it's rebuilding a hell of a lot (is your crate huge?). Do you get similar timings to the ones in your gist for a second build after a simple change? (If you do, then I'd look at why it's building all your deps each time.) Did you try stopping rust_analyzer, running a command-line build, changing one line and re-compiling at the command line? That's the easiest way to see whether it's the interplay or whether it's something else.
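To illustrate what that setting removes: code guarded by debug_assert! or cfg!(debug_assertions) is only compiled in when debug assertions are on. The sketch below uses a made-up function, mean, purely as an example. The setting itself is debug-assertions = false under [profile.dev] in Cargo.toml, or -C debug-assertions=no as a rustc flag.

```rust
/// Average of a slice. The emptiness check exists only when debug assertions are on.
fn mean(values: &[f64]) -> f64 {
    debug_assert!(!values.is_empty(), "mean of an empty slice");
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    // Reports which way the current build was configured.
    println!("debug assertions enabled: {}", cfg!(debug_assertions));
    println!("mean = {}", mean(&[1.0, 2.0, 3.0]));
}
```

How much code this removes depends on how heavily the crate and its dependencies use such checks.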

Yeah, I missed that one. I will look up the compile flag that disables debug assertions and add it, thank you.

Ahh, my bad English caused a misunderstanding :[

What I mean is:

  • that was the initial compile
  • after the initial compile it doesn't recompile from zero again, i.e. incremental compilation works
  • incremental compilation is usable for apps with a medium number of dependencies, but still slow for apps with many dependencies, like when using the GPUI, Tauri, or Bevy libraries
  • I'm looking for compile flags or tips that aren't in the code yet and that speed it up further

I just read that proc macros do many heap allocations, and that replacing the allocator with a fast one can speed up proc macros.

But how do I change the allocator used by the compiler without rebuilding it?

I found this command

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so cargo run

Does it really change the allocator used by the compiler to mimalloc? I don't know how to verify it :[

Don't worry, your English is fine.
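One partial check on Linux is to look at the memory map of a running process and see whether the preloaded library shows up. The sketch below is only that: it inspects /proc/self/maps, i.e. its own process, and the helper name maps_mention is made up. To check the compiler, the same test would have to be applied to /proc/<pid>/maps of a running rustc process. A library being mapped shows that LD_PRELOAD took effect, not that every allocation goes through it.

```rust
use std::fs;

/// True if any line of a `/proc/<pid>/maps` listing mentions `needle`.
fn maps_mention(maps: &str, needle: &str) -> bool {
    maps.lines().any(|line| line.contains(needle))
}

fn main() {
    // /proc/self/maps exists only on Linux; elsewhere the read error is reported.
    match fs::read_to_string("/proc/self/maps") {
        Ok(maps) => println!(
            "libmimalloc mapped: {}",
            maps_mention(&maps, "libmimalloc")
        ),
        Err(err) => eprintln!("could not read /proc/self/maps: {err}"),
    }
}
```

Running this binary with and without the LD_PRELOAD variable set should show the difference for this process.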

I am still confused by the fact that having more dependencies slows down your incremental build, after the first build. Generally, the rebuild time should only depend on your own code.
Maybe you could post a timing from a slow (incremental) rebuild after changing a single line which would help see where you are losing time.

The reason I keep focussing on this is: it's usually most effective to avoid doing work at all than to try to make the work faster. And then, knowing where your bottleneck is lets you target specific optimisations to fix it.

On Linux rustc already uses jemalloc as its allocator. We have tried using mimalloc multiple times, but the last couple of times it was tried we got segfaults: "[TEST] Try out mimalloc v3.3.1" by Zoxc, Pull Request #155575, rust-lang/rust on GitHub. And the times it did work, there were significant memory usage regressions: rustc performance data.

Yeah, I benchmarked the compile speed with jemalloc and mimalloc multiple times. Jemalloc seems to perform better with many small allocations. I haven't run the comparison on a multi-core machine yet.

I read somewhere on the internet that mimalloc uses more memory because it reserves a bigger memory pool to make subsequent allocations fast, so it has to ask the OS for memory less often.

Now I just figured out how to speed up compilation significantly, far more than sccache does.

I tried it on tokio + serde: the initial compilation finished in 1.7 seconds, while also saving SSD storage.