Intuition for optimizations

When optimizations are discussed it's not uncommon that someone will bring up that optimizations are not guaranteed.

Is it possible to develop some useful intuition for when certain optimizations are going to become less likely to be applied?

To my ignorant mind, things like heavy inlining, deeply nested function calls, and lots of for loops are probably going to exhaust some kind of "optimization context stack" at some point. But I have zero clue what I'm talking about.

The way "it's not guaranteed" is invoked sometimes, I get the impression that it's almost determined by the roll of a dice.

Could one construct a list of hand-wavy "How to be kind to your compiler" bullet points that are generally useful, or is it actually far more black magic than I think it is?

Measure it

Make changes

Measure it again

Make changes

Measure it more

Check you're measuring the right thing

Measure it even more

Keep measuring

If measuring is slow then make measuring faster

Keep measuring

2 Likes

Optimization is in theory a search through a high-dimensional space of possibilities — starting with “should I inline foo()?” × “should I inline bar()?” × “should I inline baz()?” and moving on to “should I rewrite the code in this way or that way?”. An optimizer must use heuristics to choose what is worth trying (and may or may not perform an actual search), because doing a more complete search would take unreasonable amounts of time. A “randomly” missed optimization is where the heuristics estimated that it wasn’t worth looking far enough in that direction.

(The idea of “don’t stop, keep looking until you have exhausted all the possibilities” may be found in the superoptimizer, but that is only feasible for small pieces of code.)

3 Likes

Hm.. That's interesting -- so the "roll of a dice" thing, while not literally true, isn't completely untrue either; it's a question of where the compiler happens to cut off the search space.

And I assume the intuition would basically come from understanding/knowing what kind of optimizations it is more likely to pursue first.

(this is not specific to Rust)

Modern compilers could do almost any optimization; the bigger issue is figuring out, in some reasonable amount of time, whether the optimization is valid and (almost always) beneficial.

Propagation of constants and dead code elimination are always beneficial, so you can rely on them.
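As a toy sketch (the function and constant names here are invented for illustration, not real compiler output), here is what those two passes accomplish, expressed at the source level:

```rust
// After constant propagation, `scale` is known to be 3, so the
// `scale == 0` branch is provably dead and dead-code elimination
// removes it; the function effectively compiles down to `x * 3`.
const SCALE: u32 = 3;

fn scaled(x: u32) -> u32 {
    let scale = SCALE;          // propagated: scale = 3
    if scale == 0 {             // 3 == 0 is known false at compile time...
        panic!("unreachable");  // ...so this branch is eliminated
    }
    x * scale
}

fn main() {
    assert_eq!(scaled(7), 21);
}
```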

Inlining of functions is often also very simple. It is very difficult to figure out what would be optimal to inline, so inliners use some simple heuristic such as "if we inline foo into bar, how much larger will bar become?", taking the best candidates within some budget.
In this heuristic, if inlining foo into bar will make bar smaller, it will always be inlined - this is generally true for simple functions that access a field or wrap a call to a single function.
If your function is otherwise small, it is still very likely to be inlined, but it is no longer guaranteed - and without inlining, compilers are much worse at reasoning across function boundaries.
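That budget heuristic can be sketched in a few lines. This is a deliberate simplification with made-up names, size deltas, and a made-up budget, not any real compiler's implementation:

```rust
// `delta` is the estimated growth of the caller if we inline a callee:
// negative deltas (call overhead outweighs the body) are always taken;
// positive ones are taken greedily, smallest first, until the budget runs out.
fn choose_inlines(mut candidates: Vec<(&'static str, i32)>, mut budget: i32) -> Vec<&'static str> {
    candidates.sort_by_key(|&(_, delta)| delta);
    let mut inlined = Vec::new();
    for (name, delta) in candidates {
        if delta <= 0 {
            inlined.push(name); // shrinks the caller: always worth it
        } else if delta <= budget {
            budget -= delta;    // grows the caller: spend some budget
            inlined.push(name);
        }
    }
    inlined
}

fn main() {
    // `get_x` is a field accessor (shrinks when inlined); the others grow.
    let picks = choose_inlines(
        vec![("big_helper", 40), ("get_x", -2), ("small_helper", 8)],
        10,
    );
    assert_eq!(picks, vec!["get_x", "small_helper"]);
}
```

Note how the big candidate is skipped not because inlining it is invalid, but because the heuristic ran out of budget — exactly the kind of "randomly" missed optimization discussed above.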

Loops are a lot more complicated.

For a simple example, take two slices with the same length:

for i in 0..a.len() {
  b[i] = a[i];
}

Naively, each iteration will check both i < a.len() and i < b.len(). The check against a.len() is trivially known from the loop limits, but what about b? The compiler will either have to leave the check there, or split the loop into something like this:

if a.len() <= b.len() {
  for i in 0..a.len() {
    *b.get_unchecked_mut(i) = *a.get_unchecked(i);
  }
} else {
  for i in 0..a.len() {
    assert!(i < b.len());
    *b.get_unchecked_mut(i) = *a.get_unchecked(i);
  }
}

In this simple case the compiler does split it, but with more complicated loops it may just keep the bounds checks, with a heavy performance penalty.
In either case it will generate much better code if you add assert_eq!(b.len(), a.len()) before the loop.
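For example, a sketch of the same copy loop with the invariant asserted up front (the helper name is made up for illustration):

```rust
// After the assert, the compiler knows `i < a.len()` implies
// `i < b.len()`, so both per-iteration bounds checks can be elided.
fn copy_into(a: &[u32], b: &mut [u32]) {
    assert_eq!(b.len(), a.len()); // one check up front, instead of two per iteration
    for i in 0..a.len() {
        b[i] = a[i];
    }
}

fn main() {
    let a = [1, 2, 3];
    let mut b = [0; 3];
    copy_into(&a, &mut b);
    assert_eq!(b, [1, 2, 3]);
}
```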

So:

Try to think about what invariants you would need to optimize the code, and add them as asserts if they are not obvious from something else.
Especially at the start of functions, this is not only good for the compiler but also beneficial as documentation.

If you have some hot loop, try to use fewer abstractions; they may sometimes confuse the compiler (resulting in a seemingly "roll of a dice" outcome).
But for any non-trivial change, measure the difference, and ideally try to figure out when it breaks.

In general, create code that is easy to reason about (for strict compilers, and humans).

3 Likes

It's worse, actually. As in: no optimization can be guaranteed. As usual, we are cursed: even the simplest, most basic questions, like “may this piece of code ever be executed?”, cannot be answered in general.

That means that we only have two possibilities: allow false negatives or allow false positives.

Allowing false positives and removing code that may actually be executed… is very bad; people become really angry when that happens — even if the language specification clearly says it's not possible in the absence of UB, they are still angry when the program doesn't work.

That means we are stuck with false negatives: sometimes code cannot actually be executed, but the compiler “couldn't prove that”, and so the dead code remains.

And that means we can't have our cake and eat it too: if we want a compiler that doesn't destroy valid programs, then we have to accept that sometimes optimizations won't happen… it's as simple as that. Sad, yet unavoidable.

1 Like

cargo remark is a cool addon that tells LLVM to emit "remarks": reasons why it couldn't, or chose not to, perform certain optimizations. This includes any non-inlined call sites. The addon then generates a nice HTML report from the remarks.

3 Likes

I suggest learning about how they work internally, which can help develop that intuition. Some good talks:


At the more abstract level, the ones you can rely on the most are the ones that are the easiest for the compiler to prove correct.

For example, you never need to think about x + x vs x << 1 or x / 2 vs x >> 1 because the compiler will do that 100% reliably all the time: there's nothing to prove about it, so it needs no context. If it's not happening -- like signed x / 2 -- it's because it's not actually the same.
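The signed case is easy to check directly in plain Rust:

```rust
fn main() {
    // For unsigned values, dividing by two and shifting right agree,
    // so the compiler can always rewrite one as the other:
    let x: u32 = 7;
    assert_eq!(x / 2, x >> 1);
    // For negative signed values they differ: division rounds toward
    // zero, while an arithmetic shift rounds toward negative infinity.
    // That is why signed x / 2 is not simply turned into x >> 1.
    let y: i32 = -7;
    assert_eq!(y / 2, -3);
    assert_eq!(y >> 1, -4);
}
```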

Similarly, why does reslicing work so well? Because in a loop of for i in 0..n, it ends up being while i < n, and when you earlier did let a = &a[..n];, a[i]'s bounds check is also i < n. It's exactly the same condition with the same variables, so it's super-easy to know that it's fine. LLVM is smart so often more complex conditions can work too, but chains of inequalities are of course harder to follow than "look, these slices have exactly the same length as the iteration bounds".
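A minimal sketch of that reslicing pattern (the function name is made up for illustration):

```rust
// Shadowing `a` with `&a[..n]` makes the slice length and the loop
// bound the same `n`, so the `a[i]` bounds check inside the loop is
// the same `i < n` the loop condition already tests.
fn sum_first(a: &[u32], n: usize) -> u32 {
    let a = &a[..n]; // panics here if n > a.len(); inside the loop, a.len() == n
    let mut total = 0;
    for i in 0..n {
        total += a[i]; // bounds check matches the loop condition exactly
    }
    total
}

fn main() {
    assert_eq!(sum_first(&[1, 2, 3, 4], 3), 6);
}
```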

So the best way to make optimizations more reliable is to make them more obvious to the compiler.

4 Likes