Are there any compilation time benchmarks of Rust vs. G++ vs. Clang++?

Here you are:

% time make puzzlegen-rs                                                                                                                  
rustc -C debuginfo=2 -C opt-level=3 -C target-cpu=corei7 -Z no-trans -o puzzlegen-rs
make puzzlegen-rs  0.22s user 0.02s system 95% cpu 0.253 total

However, I’m interested in full compilation from scratch times.

I think you can’t meaningfully compare Rust and C++ compile iteration speed until Rust has finer-grained incremental compilation :slight_smile:

But I do think that it is already possible to compare general compilation speeds of Rust and C++, I just haven’t seen any benchmarks yet (but I have seen claims that they are about the same).
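A minimal sketch of how such a from-scratch comparison could be timed, assuming you want the median of several wall-clock runs rather than a single `time` invocation. The helper names (`median`, `time_once`), the run count and the `buildtime` binary name are hypothetical. The sketch does not clean build artifacts between runs, so the command you pass has to do a full rebuild by itself (for Cargo that would mean cleaning first).

```rust
use std::env;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Median of a set of timings (the upper median for even-length input).
fn median(samples: &[Duration]) -> Option<Duration> {
    let mut sorted = samples.to_vec();
    sorted.sort();
    sorted.get(sorted.len() / 2).copied()
}

/// Run `program args...` once and return the wall-clock time it took.
fn time_once(program: &str, args: &[String]) -> std::io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .status()?;
    let elapsed = start.elapsed();
    if !status.success() {
        eprintln!("warning: command exited with {}", status);
    }
    Ok(elapsed)
}

fn main() -> std::io::Result<()> {
    // Hypothetical usage: buildtime clang++ -O0 uber.cpp -o ray
    let argv: Vec<String> = env::args().skip(1).collect();
    let (program, args) = match argv.split_first() {
        Some(pair) => pair,
        None => {
            eprintln!("usage: buildtime <command> [args...]");
            return Ok(());
        }
    };

    let runs = 5; // arbitrary choice for this sketch
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        samples.push(time_once(program, args)?);
    }
    if let Some(m) = median(&samples) {
        println!("median of {} runs: {:.3}s", runs, m.as_secs_f64());
    }
    Ok(())
}
```

The median is used here only because it is less sensitive to a single slow outlier than the mean; any other summary would work as well.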

You probably want to file an issue/PR to the repository with a benchmark.

I think it’s pretty clear from experience that compile times in Rust are much slower, especially on bigger codebases.
Compiling something like racer takes way longer than a big project we have at work written in C++/Qt.

From what I understood, most of the time is spent on the compiler’s analysis of the code…

Yes, compilation times are slower with Rust. That said, Clang and especially GCC have been around for quite a while and have had time to improve over the years. I’m really not that worried about this and think it will improve over time.

The compilers have been around for a long time and have had time to improve, but the issue is not in the tools; it is in the C++ standard and the limitations that come with legacy.
When modules arrive in C++17, C++ will compile a lot faster.

But of course I would expect Rust to also get faster compilation with time…

Now that I think about it, C++ has worse compilation times in practice.
What we were comparing is actually Rust’s alternative: rustc -Z no-trans

In reality, because it’s C++, most companies also run static code checkers…

Even if you do run static code checkers, this is usually part of CI or something like that and doesn’t directly affect every programmer, while the speed of the compiler affects everyone.

Yes, true. Although where I work it’s the developer’s job to run the static code checker and unit tests plus coverage (coverage actually takes the most time), and they should not fail on CI. I don’t agree with that, because the build machine is considerably more capable and CI is intended for this, but, meh, everyone has their own weird rules…

Yeah so we have static code checking + auto-tests running on CI. The whole idea of CI is to catch errors and report back… but that is derailing from the topic a bit :slight_smile:

Looks like if I want a benchmark, I have to do it myself :wink: Luckily I have a peculiar hobby of writing a ray tracer in any language I learn.



I believe that at this point both programs are pretty similar: 1200 lines of code, same features, same run time, no external dependencies.

-> % rustc --version
rustc 1.8.0-nightly (fae516277 2016-02-13)

-> % clang++ --version                                            
Ubuntu clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
Target: x86_64-pc-linux-gnu
Thread model: posix

Full dev build

-> % time cargo build
   Compiling rustraytracer v0.1.0 (file:///home/user/trash/rustraytracer)
cargo build
2.91s user 0.15s system 99% cpu 3.065 total

-> % time clang++ -std=c++14 -O0 -g -Wall -Wextra **/*.cpp -I ./src -o ray
clang++ -std=c++14 -O0 -g -Wall -Wextra **/*.cpp -I ./src -o ray
8.48s user 0.48s system 97% cpu 9.187 total

Full release build

-> % time cargo build --release
   Compiling rustraytracer v0.1.0 (file:///home/user/trash/rustraytracer)
cargo build --release
5.97s user 0.15s system 99% cpu 6.128 total

-> % time clang++ -std=c++14 -O3 -flto -Wall -Wextra **/*.cpp -I ./src -o ray -B /usr/lib/gold-ld
clang++ -std=c++14 -O3 -flto -Wall -Wextra **/*.cpp -I ./src -o ray -B
9.79s user 0.28s system 97% cpu 10.283 total

UPDATE: see the later post about the impact of -flto on compile time.

Runtime :slight_smile:

-> % time ./target/release/rustraytracer
10.50s user 0.01s system 100% cpu 10.514 total

-> % time ./ray > out.ppm
time: 13
./ray > out.ppm
12.36s user 0.01s system 99% cpu 12.377 total

Teapots :slight_smile:


When I tried a similar thing, Rustc came out about twice as fast as Clang, and the win over GCC and MSVC was even greater. That was almost two years ago though, perhaps I should measure again with more recent compilers.

I’ve tried to compile your Rust raytracer (with Rustc 1.8.0-nightly) and I’ve hit this problem:

…cargo\registry\src\\regex-0.1.41\src\ 46:24 error: private type in public interface [E0446]
…cargo\registry\src\\regex-0.1.41\src\ Single(Sin

I have solved it using the latest regex version (regex = "0.1.56"), but then I've found another problem:

...\.cargo\registry\src\\aho-corasick-0.5.0\src\ 4:29
 error: unresolved import `memchr::memchr2`. There is no `memchr2` in `memchr`. Did you mean to use `memchr`?

This problem can probably be solved if regex uses aho-corasick = "0.5.1", because it uses memchr = "0.1", but memchr = "0.1.9" probably has already fixed the problem.

I am not a Cargo expert. How do I tell Cargo to compile your raytracer using the latest version of regex together with the new version of aho-corasick? :-)


**Edit**: I've tried compiling the Rust raytracer again; someone has since fixed something, and now it compiles. I see only a minor warning:

libs\geom\src\shape\ 55:27 warning: unnecessary parentheses around `for` head expression, #[warn(unused_parens)] on by default
libs\geom\src\shape\         for axis in (0..3) {

Compiling with --release I get:

    Start rendering...

    Preprocess:  0.41s
    Rendering:   1.04s
    Filtering:   0.40s

    Total: 2.01 seconds

If I compile the Rust code a little more aggressively, with O3, native CPU and lto, I get:

    Start rendering...

    Preprocess:  0.42s
    Rendering:   0.96s
    Filtering:   0.32s

    Total: 1.83 seconds

So perhaps the performance difference you saw between the Rust and C++ code can be removed with better compilation switches.

Can you post your Cargo.toml?

It is:

name = "rustraytracer"
version = "0.1.0"
authors = ["Aleksey Kladov <>"]
license = "MIT"

rand = "0.3"
regex = "0.1.56"
rustc-serialize = "0.3"
time = "0.1"
simple_parallel = "0.3"

path = "libs/geom"

path = "libs/utils"

debug = true

(I have only updated the specified regex version, from regex = “0.1”).
Do you understand the cause of this problem?

Instead of just using “cargo build --release”, you can try somewhat more aggressive compilation switches (compiling for your CPU, using O3, enabling LTO) and benchmark the Rust code again. I can’t do that yet because I haven’t managed to compile your program, due to the problems above.

This is fairer for the run time (because the C++ code is compiled with LTO), but it’s also fairer for the compile time, because link-time optimization in LLVM takes a lot of time.


You should try this branch for benchmarking compile times against C++ version:

It does not have any dependencies and compiles just fine.

The version in the master branch is much more feature-rich than the C++ code, so it would be an unfair comparison.

Unfortunately I haven’t touched this project since the summer, which is why the dependencies are old =(

I believe that --release uses -O3 and -lto. How do I enable CPU specific optimizations with cargo?

That said, the code in both ray tracers is not deeply profiled and optimized and uses horribly inefficient data structures (O(n) instead of O(log n)), so fiddling with compiler flags is not going to make run time speed comparison more meaningful.

I believe that --release uses -O2 and it doesn’t use lto.

I think compiling the two codebases with similar switches (both with lto or both without, both for your CPU or both for a simpler one) gives fairer compile-time and run-time comparisons.
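On the question of CPU-specific optimizations with Cargo: one option I'm fairly sure of is the `RUSTFLAGS` environment variable, which Cargo passes on to rustc, combined with rustc's `-C target-cpu=...` codegen flag (the same flag used in the very first command of this thread). LTO is better requested through `lto = true` under `[profile.release]` in Cargo.toml than through `RUSTFLAGS`. The sketch below only builds and prints such an invocation; the helper name `release_build_for_cpu` is hypothetical.

```rust
use std::process::Command;

/// Build (but do not run) a `cargo build --release` invocation that asks
/// rustc to tune for the given CPU via the RUSTFLAGS environment variable.
fn release_build_for_cpu(cpu: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["build", "--release"])
        .env("RUSTFLAGS", format!("-C target-cpu={}", cpu));
    cmd
}

fn main() {
    // "native" asks rustc to target the CPU of the machine doing the build.
    let cmd = release_build_for_cpu("native");
    // Print the command instead of running it, so the sketch has no side effects.
    println!("{:?}", cmd);
}
```

Calling `.status()` on the returned `Command` would actually start the build; on a shell the equivalent is setting `RUSTFLAGS` in front of `cargo build --release`.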

Ok, --release is -O3 but is not -lto.

The story with -lto is interesting though. In Rust, Cargo compiles one crate, so the lto should not make a difference.

In C++, usage of lto is mandatory, because without it the code runs three times slower. I remember spending a lot of time with the C++ version to make inlining across modules work, because it really determined performance, and I was really pleased when the Rust version was fast out of the box.

I think a more appropriate approach would be to concatenate all cpp files and compile that without lto. This makes C++ compilation faster, while preserving run time speed:

-> % cat **/*.cpp > uber.cpp && time clang++ -std=c++14 -O3 -Wall -Wextra uber.cpp -I ./src -o ray
clang++ -std=c++14 -O3 -Wall -Wextra uber.cpp -I ./src -o ray
6.07s user 0.09s system 99% cpu 6.160 total

-> % time ./ray > out.ppm
time: 16
./ray > out.ppm  16.12s user 0.01s system 100% cpu 16.125 total

So yes, you were super right about the impact of lto on compile times, many thanks!
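Taking the wall-clock totals posted in this thread at face value (single runs on one machine, so they are noisy), the trade-off between the `-flto` build and the concatenated build can be put in relative terms. This is only a sketch of the arithmetic; `percent_change` is a hypothetical helper and the constants are copied from the transcripts above.

```rust
/// Relative change from `before` to `after`, as a percentage of `before`.
/// A negative result means `after` is smaller (that is, faster).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Wall-clock totals in seconds, as posted in the thread.
    let compile_lto = 10.283; // clang++ -O3 -flto over the separate .cpp files
    let compile_unity = 6.160; // clang++ -O3 over the concatenated uber.cpp
    let run_lto = 12.377; // ./ray built with -flto
    let run_unity = 16.125; // ./ray built from uber.cpp

    println!(
        "compile time: {:+.1}%",
        percent_change(compile_lto, compile_unity)
    );
    println!("run time:     {:+.1}%", percent_change(run_lto, run_unity));
}
```

With only one measurement per configuration, these percentages are a rough indication rather than a benchmark result.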

Actually LTO can make a difference even if you only compile one crate, because you’ll invariably use libstd or at least libcore. It’s really super hard to get anything meaningful done without them.

On the other hand, LTO has been a mixed blessing to Rust performance in the past. I have seen barely measurable speedups and even pessimizations, so I’d certainly benchmark before blindly adding -lto to my command line.