Possible Rust-specific optimizations

blonk · August 15, 2022, 9:42am

I've read that a benefit of HIR and MIR is that they could facilitate Rust-specific optimizations. I also read that this is not currently done -- that all optimizations are currently left to LLVM.

Are there any known Rust-specific optimizations that could be done, but no one has gotten around to implementing?

Also, when browsing through the LLVM code generation reference, it's apparent that there's quite a bit of focus on C/C++. I realize that this doesn't necessarily mean that the LLVM inputs and outputs are specifically tailored to only make C/C++ faster; however, I am curious whether there are any known optimizations that LLVM could do (but currently doesn't) that would benefit Rust specifically.

(Just asking out of curiosity. These are unsolicited questions my brain decided to bring into existence while playing around on godbolt).

Heliozoa · August 15, 2022, 10:39am

This at least used to be the case, sort of. LLVM has a noalias attribute that is widely applicable to Rust thanks to &mut being guaranteed by the compiler to not alias, and it helps LLVM optimise Rust programs by using this fact. I'm not an expert so I couldn't tell you how exactly it helps though.

The attribute is not used that much in C and C++ (due to being hard to enforce I suppose) and as a result it had several bugs in it that the Rust compiler kept hitting. It became a bit of a running theme that Rust would hit a noalias bug, disable it until the bug was fixed, re-enable it, and then hit another bug not long after. I think it's been on for over a year now so this period is hopefully over.

As for optimisations on the Rust side and whether there are other things like noalias that LLVM could implement in the future, I'm curious myself so hopefully someone more knowledgeable will be able to answer those.

H2CO3 · August 15, 2022, 11:01am

It allows the compiler to eliminate loads when an immutable place is known not to alias with any mutable ones:

fn print_and_increment(x: &i32, y: &mut i32) {
    println!("x = {}", x);
    *y += 1;
    println!("x = {}", x);
}

Since x and y are known not to alias, the compiler can assume *x doesn't change when something writes through y, so it can cache *x in a register instead of reloading it from main memory upon printing it for the second time.
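
To make that reasoning checkable, here is a minimal sketch of the same pattern that returns the two reads instead of printing them. The names `read_around_write` and `example` are hypothetical, not from the post above. In safe Rust the two reads are always equal, which is the guarantee that lets the optimizer reuse the first load; whether it actually does so in a given build is up to LLVM.

```rust
/// Hypothetical variant of `print_and_increment` that returns both reads of `*x`.
pub fn read_around_write(x: &i32, y: &mut i32) -> (i32, i32) {
    let before = *x; // first load of *x
    *y += 1;         // write through y; in safe Rust this cannot modify *x
    let after = *x;  // the compiler may reuse the earlier load here
    (before, after)
}

/// Usage sketch: returns (before, after, final value of b).
pub fn example() -> (i32, i32, i32) {
    let a = 10;
    let mut b = 20;
    let (before, after) = read_around_write(&a, &mut b);
    // `read_around_write(&b, &mut b)` would be rejected by the borrow checker,
    // because `b` cannot be borrowed as shared and mutable at the same time.
    (before, after, b)
}
```

Calling `example()` gives `(10, 10, 21)`: `*x` is unchanged across the write, and only `b` was incremented.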

scottmcm · August 15, 2022, 2:52pm

This statement is out of date. See https://github.com/rust-lang/rust/tree/master/compiler/rustc_mir_transform/src for a bunch of optimizations that rustc can do. (Not all of them are turned on, and some are non-optimization transformations, but many of them are optimizations that are on by default.)

As a particularly exciting and timely one, 1.64 will enable MIR inlining (https://github.com/rust-lang/rust/pull/91743). That allows doing inlining on the generic version of a function, which can be done once instead of needing LLVM to do it on every monomorphization of the function. And since that's in the middle of the compiler, it can help all backends -- notably the cranelift backend, which doesn't itself do inlining, got way faster thanks to it.
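
As a rough illustration of the kind of code this affects, here is a sketch with a small generic helper called from a generic function. The names `double` and `quadruple` are made up for this example, and whether rustc's MIR inliner actually inlines `double` here depends on its heuristics and the optimization level, so treat it as an assumption rather than a guarantee.

```rust
use std::ops::Add;

/// Small generic helper: a plausible candidate for inlining.
#[inline]
pub fn double<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

/// Generic caller. If `double` is inlined into the generic MIR of
/// `quadruple`, that work happens once, rather than once per
/// monomorphized copy (i32, f64, u8, ...) in the backend.
pub fn quadruple<T: Add<Output = T> + Copy>(x: T) -> T {
    double(double(x))
}

/// Usage sketch: forces three separate monomorphizations of `quadruple`.
pub fn example() -> (i32, f64, u8) {
    (quadruple(3_i32), quadruple(1.5_f64), quadruple(7_u8))
}
```

Each of the three calls in `example` produces its own monomorphized copy of `quadruple`; without inlining on the generic version, the backend would have to make the inlining decision separately for each copy.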

scottmcm · August 15, 2022, 3:09pm

For another example of this, see the pull request "MIRI says `reverse` is UB, so replace it with something LLVM can vectorize" (rust-lang/rust #90821).

If you're editing two mutable slices, without this aliasing information then LLVM has to assume they might overlap (like they could if you're using (T*, size_t) in C++), and then it needs to very carefully preserve the exact order of writes and reads, since one of the reads might actually be reading something that was written by an earlier write.

You can see the impact of this by asking nightly Rust to compile with and without this information: https://rust.godbolt.org/z/orzWWW4Wr. Ignoring the panicking paths and constants at the end, without the aliasing information there's about 50% more assembly. Why? If I'm reading it right, it's because LLVM is being clever and actually emitting two different implementations, with a runtime check for overlap to decide which to use: the slow do-things-in-exactly-the-code-order-one-by-one version for overlapping slices, and a nice fast vectorized version for when they don't overlap. But thanks to Rust's rules about &mut, when compiled normally it can just go straight to the fast one, and not waste extra space in your binary on an unneeded slow path.
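
For a concrete picture of "editing two mutable slices", here is a small sketch. It is not the code behind the godbolt link; `swap_elementwise` and `example` are hypothetical names chosen for illustration. The point is only that two `&mut [u32]` arguments can never overlap in safe Rust, so no ordering between writes to `a` and reads from `b` needs to be preserved.

```rust
/// Hypothetical example: swaps corresponding elements of two slices.
/// Because both arguments are `&mut`, they are guaranteed not to overlap.
pub fn swap_elementwise(a: &mut [u32], b: &mut [u32]) {
    // `zip` stops at the shorter slice, so there is no out-of-bounds path.
    for (x, y) in a.iter_mut().zip(b.iter_mut()) {
        std::mem::swap(x, y);
    }
}

/// Usage sketch: the only safe way to get two mutable slices into the same
/// buffer is to split it into disjoint halves first.
pub fn example() -> Vec<u32> {
    let mut data = vec![1, 2, 3, 4, 5, 6];
    let (left, right) = data.split_at_mut(3);
    swap_elementwise(left, right);
    data
}
```

Here `example()` returns `[4, 5, 6, 1, 2, 3]`. The equivalent C++ function taking two `(T*, size_t)` pairs could legally be handed overlapping ranges, which is why a compiler without aliasing information has to keep a conservative path around.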

