Possible Rust-specific optimizations

I've read that a benefit of HIR and MIR is that they could facilitate Rust-specific optimizations. I also read that this is not currently done, and that all optimizations are currently left to LLVM.

Are there any known Rust-specific optimizations that could be done, but no one has gotten around to implementing?

Also, when browsing through the LLVM code generation reference, it's apparent that there's quite a bit of focus on C/C++. I realize that this doesn't necessarily mean that the LLVM inputs and outputs are specifically tailored to only make C/C++ faster. However, I am curious if there are any known optimizations that LLVM could do (but currently doesn't) that would benefit Rust specifically?

(Just asking out of curiosity. These are unsolicited questions my brain decided to bring into existence while playing around on godbolt).

This at least used to be the case, sort of. LLVM has a noalias attribute that is widely applicable to Rust thanks to &mut being guaranteed by the compiler to not alias, and it helps LLVM optimise Rust programs by using this fact. I'm not an expert so I couldn't tell you how exactly it helps though.

The attribute is not used that much in C and C++ (due to being hard to enforce I suppose) and as a result it had several bugs in it that the Rust compiler kept hitting. It became a bit of a running theme that Rust would hit a noalias bug, disable it until the bug was fixed, re-enable it, and then hit another bug not long after. I think it's been on for over a year now so this period is hopefully over.

As for optimisations on the Rust side and whether there are other things like noalias that LLVM could implement in the future, I'm curious myself so hopefully someone more knowledgeable will be able to answer those. :sweat_smile:

It allows the compiler to eliminate loads when an immutable place is known not to alias with any mutable ones:

fn print_and_increment(x: &i32, y: &mut i32) {
    println!("x = {}", x);
    *y += 1;
    println!("x = {}", x);
}

Since x and y are known not to alias, the compiler can assume *x doesn't change when something writes through y, so it can keep *x in a register instead of reloading it from memory when printing it the second time.
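To make the reasoning above concrete, here is a minimal sketch with the printing stripped out. The names `read_write_read` and `aliasing_demo` are made up for illustration. It only shows the guarantee the optimizer relies on; whether the second load is actually removed depends on the compiler version and optimization level.

```rust
/// Reads `*x`, writes through `y`, then reads `*x` again.
/// Because `x: &i32` and `y: &mut i32` cannot alias, both reads
/// must observe the same value, so the compiler is allowed to
/// reuse the first load instead of performing a second one.
pub fn read_write_read(x: &i32, y: &mut i32) -> (i32, i32) {
    let before = *x;
    *y += 1;
    let after = *x; // may be served from a register
    (before, after)
}

/// Hypothetical driver: calls the function with two distinct variables.
pub fn aliasing_demo() -> (i32, i32, i32) {
    let a = 10;
    let mut b = 20;
    let (before, after) = read_write_read(&a, &mut b);
    // Passing the same variable for both arguments, as in
    // `read_write_read(&b, &mut b)`, is rejected by the borrow
    // checker, which is what makes the no-alias assumption sound.
    (before, after, b)
}
```

In safe code the two returned reads are always equal, which is exactly the fact that noalias communicates to LLVM.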

This statement is out of date. See rust/compiler/rustc_mir_transform/src at master · rust-lang/rust · GitHub for a bunch of optimizations that rustc can do. (Not all of them are turned on, and some are non-optimization transformations, but many of them are optimizations that are on by default.)

As a particularly exciting and timely one, 1.64 will enable MIR inlining: see Enable MIR inlining by cjgillot · Pull Request #91743 · rust-lang/rust · GitHub. That allows doing inlining on the generic version of a function, which can be done once instead of needing LLVM to do it on every monomorphization of the function. And since that's in the middle of the compiler, it can help all backends. Notably, the Cranelift backend, which doesn't itself do inlining, got way faster thanks to it.
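As a rough illustration of the kind of code this is aimed at, here is a small sketch. The function names are hypothetical, and whether rustc actually inlines `double` at the MIR level depends on its heuristics and the optimization level. One way to check is to inspect the MIR, for example with `rustc --emit=mir`.

```rust
use std::ops::Add;

/// Small generic helper: a plausible candidate for inlining.
#[inline]
fn double<T: Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

/// Generic caller. If `double` is inlined into the generic body of
/// `quadruple`, that work happens once, before monomorphization,
/// rather than once per concrete `T` in the backend.
pub fn quadruple<T: Add<Output = T> + Copy>(x: T) -> T {
    double(double(x))
}

/// Hypothetical driver: three instantiations, so a backend-only
/// inliner would have three separate copies of `quadruple` to process.
pub fn inlining_demo() -> (i32, u64, f64) {
    (quadruple(3_i32), quadruple(5_u64), quadruple(1.5_f64))
}
```

The results are the same either way; the difference is only in how much work the backend has to repeat for each instantiation.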

For another example of this, see MIRI says `reverse` is UB, so replace it with something LLVM can vectorize by scottmcm · Pull Request #90821 · rust-lang/rust · GitHub

If you're editing two mutable slices and LLVM doesn't have this aliasing information, it has to assume they might overlap (like they could if you're using (T*, size_t) in C++). It then needs to very carefully preserve the exact order of writes and reads, since one of the reads might actually be reading something that was written by an earlier write.

You can see the impact of this by asking nightly Rust to compile with and without this information: https://rust.godbolt.org/z/orzWWW4Wr. Ignoring the panicking paths and constants at the end, without the aliasing information there's about 50% more assembly. Why? If I'm reading it right, it's because LLVM is being clever and actually emitting two different implementations, with a runtime check for overlap to decide which to use: the slow do-things-in-exactly-the-code-order-one-by-one version for overlapping slices, and a nice fast vectorized version for when things don't overlap. But thanks to Rust's rules about &mut, when compiled normally it can just go straight to the fast one, and not waste extra space in your binary on an unneeded slow path.
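Here is a small sketch of why overlap forces that ordering. It is not the code behind the godbolt link; `add_slices`, `add_raw`, and `overlapping_demo` are names invented for this example. The raw-pointer version plays the role of the (T*, size_t) interface, where the two ranges are allowed to overlap.

```rust
/// Slice version: a `&mut [u32]` cannot overlap any other live
/// reference, so the element updates are independent of each other
/// and can be reordered or vectorized.
pub fn add_slices(dst: &mut [u32], src: &[u32]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = d.wrapping_add(*s);
    }
}

/// Raw-pointer version: the two ranges may overlap, so each read
/// has to observe every earlier write.
///
/// # Safety
/// `dst` and `src` must each be valid for `len` elements.
pub unsafe fn add_raw(dst: *mut u32, src: *const u32, len: usize) {
    for i in 0..len {
        unsafe {
            let s = src.add(i).read();
            let d = dst.add(i).read();
            dst.add(i).write(d.wrapping_add(s));
        }
    }
}

/// `dst` starts one element after `src` in the same buffer, so
/// every write feeds the next iteration's read.
pub fn overlapping_demo() -> [u32; 4] {
    let mut buf = [1u32, 1, 1, 1];
    let p = buf.as_mut_ptr();
    unsafe { add_raw(p.add(1), p, 3) };
    buf
}
```

With the overlap, the in-order loop produces a running sum, `[1, 2, 3, 4]`. A version that loaded all of `src` up front, as a vectorized loop would, would produce `[1, 2, 2, 2]` instead, which is why the compiler needs either the aliasing guarantee or a runtime overlap check before taking the fast path.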
