Hi All,
I recently read this post about matching a bool vs using if else. My question is:
is there a performance benefit to either method, or is the choice simply stylistic?
Thanks.
Rest assured that if there is, the optimizer will rewrite the less performant form into the more performant one. Compilers are very good at these things.
You should assume by default that everything that you can easily rearrange from one form to an equivalent one, the compiler can and will do as well. And much, much more. It’s literally the optimizer’s job to rearrange code to make it as fast as possible. Which starts by rearranging code into a form as easy to optimize as possible. The compiler really could not care less about trivial differences of surface syntax like ifs and matches. It transforms what you wrote into a much more abstract form and then begins the real work.
I can confirm they've been generating the same code for at least the last few years; I keep an eye on them for the slightly more specific Stack Overflow question of converting a boolean to 0 or 1.
Don't blindly trust the compiler to be smart at very local code generation, though. `let x = some_boolean as usize` performs 10-20% worse in a microbenchmark than `let x: usize = if some_boolean { 1 } else { 0 }`, or the equivalent `match` expression. Although, for the first time in years, since nightly-2022-01-21, the `if`/`match` variants are now 11-22% slower than before, so the conversions are now fastest.
Don't blindly trust microbenchmarks either. Sigh…
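For reference, the three forms being compared are semantically equivalent; here's a minimal sketch (the function names are mine, not from the benchmark):

```rust
// Three ways to turn a bool into 0 or 1. All are guaranteed to produce
// the same value; the discussion is only about their generated code.
pub fn via_cast(b: bool) -> usize {
    b as usize
}

pub fn via_if(b: bool) -> usize {
    if b { 1 } else { 0 }
}

pub fn via_match(b: bool) -> usize {
    match b {
        true => 1,
        false => 0,
    }
}

fn main() {
    for b in [false, true] {
        assert_eq!(via_cast(b), via_if(b));
        assert_eq!(via_if(b), via_match(b));
    }
    println!("all three forms agree");
}
```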
I lied: I didn't check it for any boolean, but for the expression `option.is_some()`. And in the context of (not) using the result in `test::black_box`. Not sure if godbolt can still show the difference before nightly-2022-01-21, but there still is a difference now (whichever nightly that is right now). The benchmark code is on github. I have not heard anyone confirm the benchmark result on their machine, just checked on another (quite different) Intel CPU.
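The shape of that measurement, sketched below (this is not the actual benchmark code from the repo; `std::hint::black_box` is the stable counterpart of the `test::black_box` mentioned above):

```rust
use std::hint::black_box;

fn main() {
    let option: Option<&u8> = Some(&42);
    // The two expressions under test: branchy vs. cast-based bool-to-int.
    let via_if: usize = if black_box(option).is_some() { 1 } else { 0 };
    let via_cast: usize = black_box(option).is_some() as usize;
    assert_eq!(via_if, via_cast);
    println!("{via_if}");
}
```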
Hmm... Interesting, the problem seems to be in `black_box()`: with it my example doesn't produce the same code anymore. `black_box()` is implemented with inline assembly as an optimization barrier for LLVM: I'm not sure if this is an LLVM bug and it should've optimized that despite the inline assembly, or LLVM is doing fine and we just pessimize things too much with this implementation. I don't have time to dig into that now, however I found the following comment:
There's a slight difference in your `black_box` example: the `if_` code goes `i32`, which somehow makes the needlessly generated code look slightly weirder than if it needlessly converts to `usize`. In the other function that does get optimized, the type doesn't make a difference (obviously).
To get back to the original question, my point was that even if all branches can be optimized away, and the whole expression is just a static cast, the `if` or `match` expressions still aren't the same as other ways to write the same static cast. And they change performance over releases. So can we really be sure `if` and `match` are always going to be the same?
When I add `match` to chrefr's example, LLVM collapses it with the `if`-based function (choosing to keep the shortest name, apparently), again confirming it's the same generated code. The Rust MIR output (which I'm not familiar with) for `if` and `match` is different from the MIR output for the explicit static cast, as expected, but they are slightly different between themselves too, and I don't know if they're supposed to be.
Hmm, so at this point, it seems it would be better to use a cast in a hot section, although previously `if`/`match` would have been faster?
Yes, my bad, but it doesn't make a real difference (by the way, the only reason I added the underscore at the end is that `if` is a reserved word, but `m` is not).
```diff
 example::if_:
         push rax
         test edi, edi
         je .LBB1_1
-        mov dword ptr [rsp + 4], 1
+        mov qword ptr [rsp], 1
-        lea rax, [rsp + 4]
+        mov rax, rsp
         pop rax
         ret
 .LBB1_1:
-        mov dword ptr [rsp], 0
+        mov qword ptr [rsp], 0
         mov rax, rsp
         pop rax
         ret
```
We can never truly rely on the optimizer. But we can be almost certain it will work. As already said, compilers are very good at these things. And so it's better to account for readability than for a possible performance gain (not saying that in this case specifically the `if` is more readable, I'm not sure about that).
When was `if`/`match` faster?
Anyway, no, it's better to write the more readable form. If it proves to be a perf bottleneck, and the cast is better, you may use it - though I would also probably file a bug report, because it really should produce the same assembly.
Please do! LLVM is on github now, so it's much easier to file bugs than it used to be.
Here's one (`switch` not recognized as equivalent to `sext` · Issue #54561 · llvm/llvm-project · GitHub) I filed a few days ago after looking into std::cmp::Ordering as {integer} - #5 by jbe -- the problem turned out to repro in C++ too (https://clang.godbolt.org/z/57jqWxbxe), so it's something missing in LLVM, not specifically a Rust problem.
Well, yes, using nightly-2022-01-21 or beyond, but now the difference is extremely small (might well be different either way on a different CPU), for that specific case (getting the number of elements in an `Option<&X>`, maybe any null-pointer test), and most of all, if it's not fake news spread by `std::hint::black_box` or micro-benchmarking.
Indeed, like issue #74615 is still open (or maybe that's not the optimizer but rustc - in any case, sometimes `unwrap` seemed faster than `unwrap_unchecked`). So I want to have a look at what the optimizer sees, and that appears to be LLVM IR. It's hardly readable, Godbolt won't show it, but the Playground does, or you can brew your own.
When comparing the if-based function to the match-based function (switch "Build" to "LLVM IR" to see), the LLVM IR looks the same but is labelled differently. The Rust MIR output of Godbolt suggests why: the if-based expression has some boolean copy of the condition, almost as if the if-condition in Rust were not a boolean but some kind of truthiness. If you're paranoid, you could prefer `match` over `if` to avoid any mention of that boolean copy. But, to be clear, the boolean copy is just local to Rust's internals; it's not passed on to LLVM apart from changed register numbers (or whatever they're called).
Anyway, this is a labour-intensive way to assess that the `if` and `match` expressions evaluate to the same hard-to-read LLVM input, so not recommended.
Still, while we're at it, back to the bool-to-int topic I sneaked into here: let's compare the LLVM IR generated by nightly-2022-01-20 and nightly-2022-01-21. There's no difference, barring ids, for all functions featured here, also when applied to an `Option<&'static char>` argument instead of a `bool`. Yet `cargo bench` still testifies that's when `if`/`match` got slapped in the face. Riddle 1, but there's still hope: I didn't manage to compare the LLVM IR for the actual benchmark code.
Let's compare with the LLVM IR of the explicit bool-to-usize conversion: here, what LLVM gets is a separate function (`core::convert::num::<impl core::convert::From<bool> for usize>::from`) with `inlinehint` that apparently the optimizer inlines to a simple "stretch to 64 bits". Whereas the optimizer does not understand that the equivalent `if b { 1 } else { 0 }` is the same "stretch to 64 bits". It never does that, regardless of `black_box`: it clearly generates branches with explicit 1 and 0 constants, just the same code as when you convert to two other constants. Also if we go back to stable Rust 1.56, while even 1.58 predates nightly-2022-01-21. Riddle 2: both `if` and `match` expressions for bool-to-int conversion are never optimized, and you'd think they should have been slower.
Pass `--emit=llvm-ir` as an argument (you may want to disable optimizations to see the IR rustc emits and not the optimized IR).
LLVM-IR can actually be far more readable than ASM in some cases, and it has pretty good docs at LLVM Language Reference Manual — LLVM 18.0.0git documentation if there's anything you don't recognize.
Godbolt will absolutely show it: https://rust.godbolt.org/z/a19xnb475
I've also included the MIR in that demo, where you can see that even inside rustc the `if` vs the `match` end up being the same CFG, even before we ask LLVM to optimize stuff.
And you'll notice how it looks like there's only one function in the output there? That's because they're so identical that LLVM noticed, and deduped them.
(Aside: there's no need for `black_box` tricks when you're looking at the code for a pure method. You don't need to keep the compiler from optimizing stuff out when it's being returned.)
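For example, functions like these can be pasted into Godbolt as-is (a sketch; the names are mine):

```rust
// Because the result is returned, nothing here can be optimized away,
// so no black_box is needed when inspecting the generated code.
pub fn count_if(o: Option<&u8>) -> usize {
    if o.is_some() { 1 } else { 0 }
}

pub fn count_cast(o: Option<&u8>) -> usize {
    o.is_some() as usize
}

fn main() {
    assert_eq!(count_if(Some(&1)), count_cast(Some(&1)));
    assert_eq!(count_if(None), count_cast(None));
    println!("both forms agree");
}
```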
Something I don't think I've seen mentioned here: doesn't rustc convert `if`/`else` into `match` as part of desugaring?
Thing is, I'm not sure if this is an LLVM bug or a Rust one. From early experimentation I suspect multiple bugs are involved, but I'll need more time to confirm.
This is the basic LLVM IR generated:
```llvm
define i32 @black_box(i32 %dummy) {
  %1 = alloca i32, align 4
  store i32 %dummy, i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* %1)
  %2 = load i32, i32* %1, align 4
  ret i32 %2
}

define void @if(i1 zeroext %cond) {
  %cond_as_i32_ptr = alloca i32, align 4
  br i1 %cond, label %cond_is_true, label %cond_is_false

cond_is_true:
  store i32 1, i32* %cond_as_i32_ptr, align 4
  br label %call_black_box

cond_is_false:
  store i32 0, i32* %cond_as_i32_ptr, align 4
  br label %call_black_box

call_black_box:
  %cond_as_i32 = load i32, i32* %cond_as_i32_ptr, align 4
  call i32 @black_box(i32 %cond_as_i32)
  ret void
}
```
And this is what the optimizer turns it into:

```llvm
define void @if(i1 zeroext %cond) local_unnamed_addr {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  br i1 %cond, label %cond_is_true.split, label %cond_is_false.split

cond_is_true.split:                       ; preds = %0
  %3 = bitcast i32* %2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %3)
  store i32 1, i32* %2, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %2) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %3)
  br label %call_black_box

cond_is_false.split:                      ; preds = %0
  %4 = bitcast i32* %1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %4)
  store i32 0, i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %1) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %4)
  br label %call_black_box

call_black_box:                           ; preds = %cond_is_false.split, %cond_is_true.split
  ret void
}
```
Then for some reason it doesn't continue optimizing the `alloca`. It is the asm that prevents it from doing so, but I'm not sure whether this is a bug or the intended behavior.
If I manually convert it:
```llvm
define void @if(i1 zeroext %cond) {
  br i1 %cond, label %cond_is_true, label %cond_is_false

cond_is_true:
  br label %call_black_box

cond_is_false:
  br label %call_black_box

call_black_box:
  %cond_as_i32 = phi i32 [ 0, %cond_is_false ], [ 1, %cond_is_true ]
  call i32 @black_box(i32 %cond_as_i32)
  ret void
}
```
Then the optimizer inserts lifetime annotations, and they prevent the code from being optimized well:
```llvm
define void @if(i1 zeroext %cond) local_unnamed_addr {
  %0 = alloca i32, align 4
  %. = zext i1 %cond to i32
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1)
  store i32 %., i32* %0, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %0) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1)
  ret void
}
```
If I remove them:
```llvm
define void @if(i1 zeroext %cond) local_unnamed_addr {
  %1 = alloca i32, align 4
  %. = zext i1 %cond to i32
  store i32 %., i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %1) nounwind
  ret void
}
```
Then it does get optimized:
```asm
if:                                     # @if
        mov dword ptr [rsp - 4], edi
        lea rax, [rsp - 4]
        ret
```
So I think there are multiple bugs here:

- something (presumably the inline asm) prevents the `alloca` from being collapsed into a `phi`;
- the lifetime annotations shouldn't be inserted around `black_box()`, because before that it's always valid to optimize the `alloca`.

Also, I'm not sure rustc really should emit the `~{memory}` constraint, though removing it doesn't seem to change things.
> Doesn't rustc convert `if`/`else` into `match` as part of desugaring?
I think that's semantically true, but not how it's actually implemented.
It looks like there's still an `If` in all of
Right, but `SwitchInt` isn't really `match` either, since it doesn't do patterns. That's why I called MIR a CFG earlier.
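For illustration, the MIR that `if b { 1 } else { 0 }` (and the equivalent `match`) lowers to looks roughly like this (a schematic sketch, not exact compiler output; real MIR has more detail):

```
bb0: switchInt(_1) -> [0: bb2, otherwise: bb1]
bb1: _0 = const 1_usize; goto -> bb3
bb2: _0 = const 0_usize; goto -> bb3
bb3: return
```

`switchInt` just branches on an integer value (here the `bool`), with no pattern matching involved.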
`match` is also lowered to `SwitchInt`(s). I wouldn't just call it "desugaring" when the code to do it says

```rust
//! Code related to match expressions. These are sufficiently complex to
//! warrant their own module and submodules. :) This main module includes the
//! high-level algorithm, the submodules contain the details.
```