The Rust compiler isn't slow; we are

I wanted to vent some frustrations and used it as an opportunity to make improvements to a random open source project. The results speak for themselves.

I'm hoping this can be useful to others. I tried to keep things in a positive light, and not let the criticism feel unfair or harsh. Interested in feedback, though.

26 Likes

What I've tried to show here is that a lot of what some might consider a "slow compiler" can be traced back to a developer not being diligent with dependency management, or preferring runtime performance over compile time, or even "developer velocity" over compile time (whatever that means!).

When I use Rust at work I make a conscious effort to minimise the number of dependencies I pull in. Not only does this help keep compile times down, but it also reduces the amount of risk we're exposed to (e.g. a library could be discontinued and we're stuck with something that'll never get bug fixes or improvements, or it could have mission-critical issues like a security bug).
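One cheap lever here, as a sketch (the crate and features shown are just illustrative): many crates let you opt out of their default features, so you only compile what you actually use.

```toml
# Hypothetical Cargo.toml entry: disable default features and enable
# only what this project needs, trimming the transitive dependency graph.
[dependencies]
serde = { version = "1", default-features = false, features = ["derive"] }
```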

Of course, this all goes out the window the moment you pull in any sort of web server or asynchronous runtime... It's just not possible to write your own tokio or tonic, so there's no option but to take the compile-time hit (in my case, ~30 dependencies grew to ~250, with a full build going from about 20 seconds to 3 or 4 minutes).

6 Likes

The binary size overhead looks too big. Is this a comparison after running strip on both binaries, or a comparison of unstripped binaries?

Very neat article, good job :slightly_smiling_face:

So if the compiler is actually quite good already and has a team dedicated to making it faster, why does it seem so slow to so many Rust developers? My suspicion is that this perception is mostly self-imposed by developers who are quick to take advantage of just how effortless it is to bring in dependencies.

This is so true: I've seen big C or C++ projects that also take astonishingly long to compile. The difference between a standard C / C++ project and a Rust one is that pulling in a dependency in the former is so painful that people will often prefer to:

  1. either quickly reimplement the logic for their reduced use-case (e.g., your mention of using iterator logic instead of a regex; see the sketch below);

  2. or delegate to other command-line invocations / utilities (e.g., the .render() method in Python's graphviz).

This comes with its own set of problems, but it does avoid having to pull in and compile a full-featured dependency. And of course the issue is that this can very easily scale out of proportion transitively: if the pulled dependency itself decided to pull in a convenience dependency, we suddenly end up with a cascade of dependencies.
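To make 1. concrete, here is a hypothetical sketch (not the article's actual code) of replacing a match against a pattern like ^[a-z]+-[0-9]+$ with plain iterator logic, avoiding the regex crate entirely:

```rust
// Hypothetical stand-in for a regex match on ^[a-z]+-[0-9]+$,
// written with std-only iterator logic.
fn is_valid_id(s: &str) -> bool {
    let mut parts = s.splitn(2, '-');
    match (parts.next(), parts.next()) {
        (Some(name), Some(num)) => {
            !name.is_empty()
                && name.bytes().all(|b| b.is_ascii_lowercase())
                && !num.is_empty()
                && num.bytes().all(|b| b.is_ascii_digit())
        }
        _ => false,
    }
}
```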

So, indeed, it's not so much that Rust has a slow compiler, but rather that Rust projects end up compiling a lot of code: far more code than the equivalent counterparts written in other languages.


I personally think that this is an acceptable cost (just one to be aware of): Rust ends up winning in the long run, since at the cost of these supernumerary dependencies, Rust projects:

  • get to keep up to date with minor improvements that happen within them (compared to an overly pervasive standard library like Go's),

  • have better-engineered and thus (hopefully) more resilient code / logic that handles corner cases better (compared to 1.),

  • avoid shelling out to external commands for some tasks (compared to 2.), which is fragile at best and a security vulnerability at worst.

That being said, compile-time cost vs. runtime cost is definitely a trade-off too, and the former is not always the one worth favoring, as showcased by your command-line argument parser case.

8 Likes

It is worth emphasising what you lose. Clap is a UI library, albeit a CLI one. User friendly argument parsing alone is far from simple (it's definitely not a wheel you'd want to continually reinvent) and Clap provides a lot of extra niceties on top of that (both for developers and end users). Would a more minimal library be a better fit for some projects? Sure! But I do bristle a bit at "just" argument parsing as if that were a simple thing even by itself.

Of course faster build times may matter enough to trade off some features or runtime performance. That's a good point, well made. And if you think std::env::args is all you need for a project, then anything more than the most basic parser is going to feel like a luxury too far. :slight_smile:
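For reference, the "most basic parser" might look something like this std-only sketch (entirely hypothetical, not from the article): no help text, no validation, no error reporting, which is exactly the kind of thing clap layers on top.

```rust
// A deliberately bare-bones argument parser using only std.
// It recognises one flag and treats everything else as an input.
fn main() {
    let mut verbose = false;
    let mut inputs: Vec<String> = Vec::new();
    for arg in std::env::args().skip(1) {
        match arg.as_str() {
            "-v" | "--verbose" => verbose = true,
            other => inputs.push(other.to_string()),
        }
    }
    println!("verbose: {}, inputs: {:?}", verbose, inputs);
}
```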

6 Likes

Thanks for the feedback so far! I've corrected the typo, and I will try to address some comments below.

The binary sizes shown in the article are straight out of the compiler without stripping debug symbols or otherwise attempting to optimize for space.

The trouble, in my opinion, is that though command line parsing may have a very broad set of use cases and demanding features, it's also very shallow. I don't have any data to back up this claim, but it feels like it would be difficult to find any example of an application that truly makes use of a majority of what clap offers, let alone a majority of dependents making use of a majority of the features. See Joel Spolsky's 80/20 argument against bloat.

So yes, clap does try to fit several (sometimes very disparate) needs. It is just something to be aware of, especially when building your code feels sluggish. In that case, it may be worth questioning whether you really make use of the code you are compiling.

But don't take this response as dismissing your argument; I do agree that sometimes it's the right tool for the job.

3 Likes

Maybe there is some kind of misunderstanding, but stripping debug symbols is not an optimization. Debug symbols are not loaded into memory during a normal run (no debugger, no panics), and obviously no CPU cache is used during a normal run to hold debug symbol tables.

So when comparing numbers without stripping, you are really comparing the real size plus a random number; in other words, you are comparing useless numbers.
And in reality, who knows, maybe after the dependency change described in the article, memory/cache usage increased and the program became slower; the random numbers hide the real picture.

2 Likes

I did not say stripping symbols is an optimization, and I'm not sure what you're getting at. The article is not about making small binaries. That is a completely different topic with different motivations.

I believe the objection is that in that case you shouldn't be comparing the binary sizes of the output at all.

You note that switching from clap to another argument parsing library saved you 150K. The objection is that that measurement is meaningless because you're using the unstripped size.

Either compare stripped sizes or ignore the size of the compiled artifact. In the article as written, it reads as if the unstripped size is a meaningful measure because you mention it.
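For example (illustrative commands, hypothetical binary name), comparing like with like would look something like:

```
$ cargo build --release
$ cp target/release/myapp myapp-unstripped
$ strip target/release/myapp
$ ls -l myapp-unstripped target/release/myapp  # report the stripped size
```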

13 Likes

The size is meaningful because it is a side effect of the amount of Rust code compiled, regardless of what is actually produced. AFAIK, the compiler does not just output random pieces of data that are not directly related to the input.

But I will be happy to make the change; it won't matter either way. It seems like a strange thing to focus on, though, and it completely misses the point.

1 Like

This isn't really true in general. Rustc/LLVM do a decent amount of dead code elimination, so a good deal of the code thrown at LLVM doesn't make it to the final binary.

"Look at the size of the binary" is such a loaded statement at this point in time, especially if you aren't stripping the binary, so if you're using binary sizes as a proxy for another measure, you need to call that out directly.

1 Like

I think it's important to contextualize the article. I'd say it applies to the "small utilities" case, where indeed the cost of convenience dependencies usually dominates the cost of the application itself.

For bigger applications though, most of the cost will be within the application itself. rust-analyzer is a good example here. It clocks in at around 100k lines of code, excluding dependencies. It has a fair share of deps, most of which are entirely justified (some outliers are insta, notify and globset, which seem to pull in stuff that isn't strictly required for end-user visible functionality). However, the biggest cost is compiling rust-analyzer's own crates.

One of my current workflow bottlenecks is "switch branches, compile in incremental release mode" (this is important for perf optimization and for exploratory programming, where I want to quickly check how this or that change affects the actual user experience).

This routinely takes on the order of five minutes:

13:34:23|~/projects/rust-analyzer|vfs✓
λ git switch master
Switched to branch 'master'
Your branch is up to date with 'upstream/master'.

13:34:24|~/projects/rust-analyzer|master✓
λ cargo xtask install --server
    Finished dev [unoptimized] target(s) in 0.02s
     Running `target/debug/xtask install --server`
> cargo --version
cargo 1.44.0 (05d080faa 2020-05-06)
> cargo install --path crates/rust-analyzer --locked --force 
  Installing rust-analyzer v0.1.0 (/home/matklad/projects/rust-analyzer/crates/rust-analyzer)
    Updating crates.io index
   Compiling stdx v0.1.0 (/home/matklad/projects/rust-analyzer/crates/stdx) # note that this is not the abandoned stdx from crates.io
   Compiling rust-analyzer v0.1.0 (/home/matklad/projects/rust-analyzer/crates/rust-analyzer)
   Compiling ra_syntax v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_syntax)
   Compiling ra_cfg v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_cfg)
   Compiling ra_mbe v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_mbe)
   Compiling ra_fmt v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_fmt)
   Compiling test_utils v0.1.0 (/home/matklad/projects/rust-analyzer/crates/test_utils)
   Compiling ra_db v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_db)
   Compiling ra_proc_macro_srv v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_proc_macro_srv)
   Compiling ra_hir_expand v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_hir_expand)
   Compiling ra_project_model v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_project_model)
   Compiling ra_hir_def v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_hir_def)
   Compiling ra_hir_ty v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_hir_ty)
   Compiling ra_hir v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_hir)
   Compiling ra_ide_db v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_ide_db)
   Compiling ra_assists v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_assists)
   Compiling ra_ide v0.1.0 (/home/matklad/projects/rust-analyzer/crates/ra_ide)
    Finished release [optimized] target(s) in 4m 46s
   Replacing /home/matklad/.cargo/bin/rust-analyzer
    Replaced package `rust-analyzer v0.1.0 (/home/matklad/projects/rust-analyzer/crates/rust-analyzer)` with `rust-analyzer v0.1.0 (/home/matklad/projects/rust-analyzer/crates/rust-analyzer)` (executable `rust-analyzer`)

I believe this cost is to a significant degree inherent to Rust's compilation model: static linking and a reliance on generics and monomorphisation do not lend themselves to fast compile times. I wish we had more language-level control over compile times. In particular, I wish it were possible to clearly define the set of changes you can make to a crate which do not require recompilation of reverse dependencies.

15 Likes

But isn't it obvious that different readers find different points in the article? For me, compilation time from scratch is not important at all. But if clap adds 400 KB to the .text section of the ELF file, that is really bad.

Would using dynamic dispatch instead of generics help rust-analyzer here? I'm guessing it could help if you know rust-analyzer is instantiating loads of generic functions/types.

That said, the language/culture pushes programmers to use compile-time generics over trait objects where possible, so this glut of monomorphization is going to be present throughout the ecosystem and outside of your direct control.

(I'm not bashing the use of compile-time generics here; they just have the annoying side effect of hurting compile times.)
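As a toy illustration of that trade-off (a sketch, not rust-analyzer's actual code):

```rust
use std::fmt::Display;

// Monomorphized: the compiler generates and optimizes a separate copy
// of this function for every concrete T it is called with.
fn show_generic<T: Display>(value: T) {
    println!("{}", value);
}

// Dynamic dispatch: compiled exactly once; callers pay a vtable lookup
// at runtime instead of extra code size and compile time.
fn show_dyn(value: &dyn Display) {
    println!("{}", value);
}

fn main() {
    show_generic(42);      // instantiates show_generic::<i32>
    show_generic("hello"); // instantiates show_generic::<&str>
    show_dyn(&42);         // the same single function serves both calls
    show_dyn(&"hello");
}
```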

2 Likes

We already make heavy use of dynamic dispatch. Though, so far we haven't been able to pinpoint why exactly stuff is slow to compile, so if anyone wants to explore Rust compile-time optimization for medium-sized projects, rust-analyzer might be a good benchmark :wink:

1 Like

It's not just that C uses fewer dependencies. When C uses external dependencies (not counting "header-only libraries"), they're dynamic system libraries, which means they're in a pre-compiled form.

Rust's lack of a stable ABI and lack of prebuilt binaries on crates.io make using Rust dependencies slower than it needs to be.

It's a bit unfair to point fingers at how long reqwest+tokio takes to compile when using libcurl.so needs no compilation at all.

4 Likes

I recall there were plans to ship prebuilt libraries from crates.io; do you know if there has been any progress on this?

Note that it's not just the stable ABI: many language features would be incompatible with pre-compiled binary libraries even if we had a stable ABI. https://internals.rust-lang.org/t/a-stable-modular-abi-for-rust/12347/6 is a very instructive post about this issue.

2 Likes

While a hypothetical stable ABI makes precompiled libraries an obvious next step, some precompilation could be done even without a dynamic library ABI.

I know that monomorphised generics and inlinable functions are not possible to truly precompile, but Cargo doesn't have to rebuild all of the dependencies when the top-level crate changes, so "pre-built dependencies" in Rust's case could be roughly whatever it takes to pre-populate Cargo's build cache (though not literally in the current format, which isn't efficient for this).
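For illustration (hypothetical crate name and timing), this is essentially the state a warm local cache already reaches; pre-built dependencies would just ship that state:

```
# After editing only the top-level crate, cargo reuses every cached
# dependency artifact and recompiles just the crate that changed.
$ touch src/main.rs
$ cargo build
   Compiling myapp v0.1.0 (/path/to/myapp)
    Finished dev [unoptimized + debuginfo] target(s) in 1.8s
```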

2 Likes

Doing this is complex, and we don't have the people or infrastructure resources needed to implement and maintain such a service currently. Discussion

3 Likes