Noob: Why is the performance of release build so much better?

Sorry for the Noob question. I was executing my application after building with dev profile as well as the release profile. The performance of the release binary is significantly better than the debug binary. I understand that the compiler applies a bunch of optimizations to speed things up - what are some examples of such optimizations?

One of the most important optimizations, I think, is inlining. Rust code relies on lots of small functions that would block other optimizations if they weren't inlined.
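As a sketch of what that means (the types here are just illustrative): a tiny accessor is a real function call in a debug build, but in a release build the optimizer will typically inline it away and then fold the whole computation into a constant.

```rust
// A small "getter"-style method like this is everywhere in Rust code.
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // In a release build the optimizer will almost certainly inline this,
    // so the call disappears entirely; in a debug build it stays a real call.
    fn magnitude_squared(&self) -> f64 {
        self.x * self.x + self.y * self.y
    }
}

fn main() {
    let p = Point { x: 3.0, y: 4.0 };
    // After inlining and constant folding, a release build can reduce this
    // whole expression to the constant 25.0.
    println!("{}", p.magnitude_squared());
}
```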

3 Likes

Rust uses LLVM to optimize the code, and there are lots of big and small optimizations it applies. It's basically the same for Rust and C, so documentation for C applies:

13 Likes
[profile.release]
lto = true

This setting is also your friend.
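For reference, `lto` accepts a few values in Cargo profiles; `"thin"` is a cheaper variant that usually recovers most of the benefit of full LTO. The related `codegen-units` setting is often tuned alongside it:

```toml
[profile.release]
# `true` (or "fat") LTO optimizes across all crates at link time;
# "thin" compiles faster and usually gets most of the benefit.
lto = "thin"
# Fewer codegen units give LLVM larger chunks to optimize at once,
# at the cost of compile time and build parallelism.
codegen-units = 1
```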

2 Likes

I also know that Rust iterators get heavily optimized. What is a struct with methods in dev might become a vectorized SIMD loop in release.
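For instance (a sketch, not a guaranteed codegen result), a chain like this goes through the full iterator machinery in debug builds, but in release builds LLVM typically flattens it into one tight loop that the auto-vectorizer can turn into SIMD instructions:

```rust
fn main() {
    let data: Vec<i32> = (1..=8).collect();

    // Debug: Map and Filter adapter structs with repeated next() calls.
    // Release: usually a single flat loop, often auto-vectorized.
    let sum: i32 = data.iter().map(|x| x * x).filter(|x| x % 2 == 0).sum();

    println!("{sum}"); // 4 + 16 + 36 + 64 = 120
}
```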

2 Likes

Here are some optimisations off the top of my head:

  • Inlining - copies the body of a function into its caller
    • This lets you skip function call overhead (every time you call a function you need to update the stack pointer, copy inputs into registers or the stack, and set the return location)
    • Enables loads of other optimisations because now the optimiser can "see" more context when it is optimising a function
  • Vectorising - sometimes you can do the same operation to multiple pieces of data in one instruction
    • Often involves SIMD
  • Constant folding - evaluate expressions at compile time
    • this means something like let x = 5 + 2 would be replaced with let x = 7
    • can also become more advanced, to the point where the compiler interprets a bunch of code at compile time
  • Detect common patterns
    • Some expressions (e.g. let mut sum = 0; for i in 0..10 { sum += i; }) have a closed form which lets you avoid the loop and maths, in this case, n*(n-1)/2
  • Dead code elimination - throw away code that you know can never be run
    • if previous optimisation passes result in something like if false { ... }, I can throw away the branch and generate straight line code (this is part of why you'll often skip bounds checks on iterators)
    • If you write something to a variable which never gets read, I can skip the write (you often see std::ptr::write_volatile() used to prevent this optimisation)
  • Copy elision - if I can see you are making unnecessary copies, I might be able to skip the copies and write to the destination directly
    • Often seen in Return Value Optimisation
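A sketch tying a few of the optimisations above together (the `DEBUG_LOGGING` flag is just an illustrative name):

```rust
fn main() {
    // Loop idiom recognition / constant folding: the optimizer can replace
    // this whole loop with the closed-form result 0 + 1 + ... + 9 = 45.
    let mut sum = 0u32;
    for i in 0..10 {
        sum += i;
    }

    // Dead code elimination: the condition is a compile-time constant,
    // so the branch and its body are thrown away entirely.
    const DEBUG_LOGGING: bool = false;
    if DEBUG_LOGGING {
        println!("sum so far: {sum}");
    }

    println!("{sum}"); // 45
}
```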

C++ has the "as-if rule" which governs the optimisations a compiler is allowed to make.

This provision is sometimes called the “as-if” rule , because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program.

For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
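In Rust terms, a sketch of that provision: the compiler may skip evaluating an expression whose result is never used and whose evaluation has no observable side effects.

```rust
fn main() {
    let data = [1u64, 2, 3, 4];

    // No observable effect: under the "as-if" rule the optimizer is free
    // to delete this entire computation (1 + 4 + 9 + 16 = 30, unused).
    let _unused: u64 = data.iter().map(|x| x * x).sum();

    // This *is* observable behavior, so it must remain.
    println!("{}", data.len());
}
```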

10 Likes

Really fascinating. Thanks everyone for all the great answers and reading materials!

If you're curious about how optimizations work, I recommend this talk as an introduction:


It's from a C++ conference, but it's about LLVM, which is the same optimizer rustc uses, so it's still highly applicable.
5 Likes

Another relevant talk, also by Chandler Carruth, from a C++ conference.

TL;DR is “There Are No Zero-cost Abstractions”. There are:

  • Human costs: Good abstractions can lower this.
  • Run-time costs: speed of the code itself
  • Compile-time costs: speed of compiling said code.

When you try to lower the human cost by using certain abstractions, you need to pay that cost elsewhere.
Rust's abstractions are not really "zero cost" then. Rust is a language of zero runtime-cost abstractions (but that doesn't have quite the same ring to it).

When you compile with optimizations, you pay the cost for all of Rust's abstractions at compile time. When you compile in debug mode, you pay the cost for Rust's abstractions at run time.
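A common illustration of an abstraction whose cost moves between compile time and run time (a sketch; exact codegen depends on the compiler): a newtype wrapper. Release builds erase the wrapper completely, while debug builds still emit the constructor and accessor as real calls.

```rust
// Newtype wrapper: same memory layout as the wrapped u64.
struct Meters(u64);

impl Meters {
    fn value(&self) -> u64 {
        self.0
    }
}

fn add(a: Meters, b: Meters) -> Meters {
    Meters(a.value() + b.value())
}

fn main() {
    // Release: indistinguishable from adding two plain u64s; the wrapper,
    // the constructor, and value() all vanish after inlining.
    // Debug: each of those is a real function call.
    let total = add(Meters(2), Meters(3));
    println!("{}", total.value());
}
```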

7 Likes

Let me add a couple more links here :slight_smile:

I love the Catalogue of optimizing transformations paper by Frances Allen. She documents the most important optimizations (with the notable exception of scalar replacement of aggregates), without getting lost in the minutiae of compiler IRs.

For "why Rust is slow in debug, while other languages aren't", I recommend this post: https://robert.ocallahan.org/2020/08/what-is-minimal-set-of-optimizations.html

The TL;DR is that Rust leans more heavily on compiler optimizations than other languages. It's not only that optimizations make code faster (this is as true of C as it is of Rust), it's that Rust code is particularly inefficient without optimization applied, due to the fact that it leans heavily on traits & generics.

4 Likes

Chandler makes excellent points but IIRC his definition of zero cost abstraction is not what other people use. Zero cost doesn't mean something is free.

The original meaning is that you don't pay for what you don't use and that if you use it, you wouldn't be able to write it better than what the language/compiler provides. withoutboats has a blog post about that too.

Even though the second part of zero cost abstraction is about generating code, it doesn't necessarily relate to code optimisations. Somewhat contrived example:

  • you can use Vec and iterate over it using iterators and call lambda to perform an operation on each element
  • you can hand-write a Vec-like array and iterate over it using indexing, passing each element's pointer to a function

Neither would be optimised in debug mode, so from this perspective zero cost and optimisations are orthogonal. But the Vec approach abstracted away the low-level bits: managing the array, accessing elements manually, creating a one-off function.
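A sketch of the two versions described above (function names hypothetical):

```rust
// Version 1: idiomatic Vec + iterator + closure.
fn sum_of_squares_iter(data: &[i32]) -> i32 {
    data.iter().map(|&x| x * x).sum()
}

// Version 2: hand-rolled indexing, passing each element to a free function.
fn square(x: i32) -> i32 {
    x * x
}

fn sum_of_squares_manual(data: &[i32]) -> i32 {
    let mut total = 0;
    let mut i = 0;
    while i < data.len() {
        total += square(data[i]);
        i += 1;
    }
    total
}

fn main() {
    let data = vec![1, 2, 3, 4];
    // Both do the same work, and neither is optimized in debug mode, but
    // the iterator version abstracts away the indexing and loop bookkeeping.
    println!("{}", sum_of_squares_iter(&data));
    println!("{}", sum_of_squares_manual(&data));
}
```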

Using Chandler's talk:

  • human cost is lower in Vec
  • run-time cost is similar in both (doesn't matter if release or debug mode)
  • compile time cost is similar in both

I think that was exactly the point of his talk: that calling it "zero cost" is misleading.

Bjarne Stroustrup: "...What you do use, you couldn’t hand code any better."
Certain things probably couldn't be helped, like inlining or something, but I feel like your 2nd version could be made a lot faster. It'd definitely involve some unsafe to get around those pesky bounds checks. The end result might be unwieldy, but it'd actually be the best/fastest code (for the given optimizations).
In this case, compiling without optimizations, using the Vec and iterator API would be much slower than the unabstracted version.

I don't think they're orthogonal, because you don't get the "you couldn't hand code any better" behavior without the optimizations.

1 Like

The people steering the C++ spec prefer "zero-overhead abstractions" these days, to address that ambiguity.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.