Matching vs. If Else

Hi All,

I recently read this post about matching a bool vs using if else. My question is:
is there a performance benefit to either method, or is the choice simply stylistic?

Thanks.

1 Like

Rest assured that if there is, the optimizer will rewrite the less performant form into the more performant one. Compilers are very good at these things.

2 Likes

And Godbolt confirms that.

They have the same assembly even without optimizations. So, no, no difference.
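
A minimal pair like the following (my own sketch, not necessarily the exact code from the linked post) reproduces this:

// Two equivalent functions for comparison on Godbolt; both compile
// to the same assembly, even at opt-level 0.
pub fn with_if(b: bool) -> u32 {
    if b { 1 } else { 0 }
}

pub fn with_match(b: bool) -> u32 {
    match b {
        true => 1,
        false => 0,
    }
}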

7 Likes

You should assume by default that anything you can easily rearrange from one form into an equivalent one, the compiler can and will rearrange as well. And much, much more. It's literally the optimizer's job to rearrange code to make it as fast as possible, which starts with rearranging the code into a form that's as easy to optimize as possible. The compiler really couldn't care less about trivial differences of surface syntax like ifs and matches. It transforms what you wrote into a much more abstract form and only then begins the real work.

11 Likes

I can confirm they've been generating the same code for at least the last few years; I keep an eye on them for the slightly more specific Stack Overflow question of converting a boolean to 0 or 1.

Don't blindly trust the compiler to be smart at very local code generation, though. let x = some_boolean as usize performs 10-20% worse in a microbenchmark than let x: usize = if some_boolean { 1 } else { 0 } or the equivalent match expression. Although, for the first time in years, since nightly-2022-01-21 the if/match variants are 11-22% slower than they used to be, so the conversions are now fastest.
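
Concretely, the three forms under comparison look like this (a sketch; the benchmark harness and iteration are omitted):

// The three bool-to-usize forms compared in the microbenchmark.
pub fn via_cast(b: bool) -> usize {
    b as usize
}

pub fn via_if(b: bool) -> usize {
    if b { 1 } else { 0 }
}

pub fn via_match(b: bool) -> usize {
    match b {
        true => 1,
        false => 0,
    }
}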

Don't blindly trust microbenchmarks either. Sigh…

2 Likes

No it doesn't.

1 Like

I lied: I didn't check it for just any boolean, but for the expression option.is_some(), and in the context of (not) using the result in test::black_box. Not sure if Godbolt can still show the difference before nightly-2022-01-21, but there still is a difference now (whichever nightly that is right now). The benchmark code is on GitHub. I haven't heard anyone confirm the benchmark result on their machine; I've only checked it on another (quite different) Intel CPU.
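
The shape of what I benchmarked is roughly this (simplified, and the pointee type is my placeholder; the real code is in the repo mentioned above):

// Simplified shape of the benchmarked code (nightly, for test::black_box):
// convert option.is_some() to usize, keeping the result alive via black_box.
#![feature(test)]
extern crate test;

pub fn via_if(opt: Option<&u8>) -> usize {
    test::black_box(if opt.is_some() { 1 } else { 0 })
}

pub fn via_cast(opt: Option<&u8>) -> usize {
    test::black_box(opt.is_some() as usize)
}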

Hmm... Interesting, the problem seems to be in black_box(): with it, my example doesn't produce the same code anymore. black_box() is implemented with inline assembly as an optimization barrier for LLVM. I'm not sure if this is an LLVM bug (it should perhaps have optimized that despite the inline assembly), or LLVM is doing fine and we just pessimize things too much with this implementation. I don't have time to dig into that now, however I found the following comment:

2 Likes

There's a slight difference in your black_box example: the if_ code converts to i32, which somehow makes the needlessly generated code look slightly weirder than when it needlessly converts to usize. In the other function, the one that does get optimized, the type makes no difference (obviously).

To return to the original question: my point was that even if all branches can be optimized away and the whole expression is just a static cast, the if and match expressions still aren't the same as the other ways of writing that cast, and their performance changes across releases. So can we really be sure if and match are always going to be the same?

When I add a match to chrefr's example, LLVM collapses it with the if-based function (choosing to keep the shortest name, apparently), again confirming it's the same generated code. The Rust MIR output (which I'm not familiar with) for if and match differs from the MIR for the explicit static cast, as expected, but the two are also slightly different from each other, and I don't know whether they're supposed to be.

1 Like

Hmm, so at this point, it seems it would be better to use a cast in a hot section, although previously if/match would have been faster?

Yes, my bad, but it doesn't make a real difference (by the way, the only reason I added the underscore at the end is that if is a reserved word, but m is not 🙂).

example::if_:
        push    rax
        test    edi, edi
        je      .LBB1_1
-       mov     dword ptr [rsp + 4], 1
+       mov     qword ptr [rsp], 1
-       lea     rax, [rsp + 4]
+       mov     rax, rsp
        pop     rax
        ret
.LBB1_1:
-       mov     dword ptr [rsp], 0
+       mov     qword ptr [rsp], 0
        mov     rax, rsp
        pop     rax
        ret

We can never truly rely on the optimizer, but we can be almost certain it will work. As already said, compilers are very good at these things. So it's better to optimize for readability than for a possible performance gain (not that the if is necessarily more readable in this specific case; I'm not sure about that).

When was if/match faster?

Anyway, no, it's better to write the more readable form. If it proves to be a perf bottleneck and the cast is better, you may use it, though I would also probably file a bug report, because they really should produce the same assembly.

Please do! LLVM is on github now, so it's much easier to file bugs than it used to be.

Here's one (`switch` not recognized as equivalent to `sext` · Issue #54561 · llvm/llvm-project · GitHub) I filed a few days ago after looking into std::cmp::Ordering as {integer} - #5 by jbe. The problem reproduces in C++ too (https://clang.godbolt.org/z/57jqWxbxe), so it turned out to be something missing in LLVM, not specifically a Rust problem.
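
The Rust side of that repro is essentially this (my reconstruction of the kind of code involved; the issue and the forum thread have the exact versions):

// Matching out std::cmp::Ordering's values by hand generates an LLVM
// `switch`, which LLVM (at the time) failed to recognize as a plain
// sign extension of the i8 discriminant.
use std::cmp::Ordering;

pub fn ordering_to_int(o: Ordering) -> i32 {
    match o {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}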

Well, yes, using nightly-2022-01-21 or later, but now the difference is extremely small (and might well go either way on a different CPU), it's for that specific case (getting the number of elements in an Option<&X>, maybe any null-pointer test), and, most of all, only if it's not fake news spread by std::hint::black_box or micro-benchmarking.

Indeed; issue #74615, for example, is still open (or maybe that's not the optimizer but rustc; in any case, sometimes unwrap seemed faster than unwrap_unchecked). So I want to have a look at what the optimizer sees, and that appears to be LLVM IR. It's hardly readable, and Godbolt won't show it, but the Playground does, or you can brew your own.

When comparing the if-based function to the match-based function (switch "Build" to "LLVM IR" to see it), the LLVM IR looks the same but is labelled differently. Godbolt's Rust MIR output suggests why: the if-based expression has some boolean copy of the condition, almost as if the if condition in Rust were not a boolean but some kind of truthiness value. If you're paranoid, you could prefer match over if to avoid any mention of that boolean copy. But, to be clear, the boolean copy is local to Rust's internals; it isn't passed on to LLVM apart from changed register numbers (or whatever they're called).

Anyway, this is a labour-intensive way to establish that the if and match expressions evaluate to the same hard-to-read LLVM input, so it's not recommended.

Still, while we're at it, back to the bool-to-int topic I sneaked in here: let's compare the LLVM IR generated by nightly-2022-01-20 and nightly-2022-01-21. There's no difference, barring ids, for any of the functions featured here, even when applied to an Option<&'static char> argument instead of a bool; yet cargo bench still testifies that this is when if/match got slapped in the face. That's riddle 1, but there's still hope: I didn't manage to compare the LLVM IR for the actual benchmark code.

Now let's compare with the LLVM IR of the explicit bool-to-usize conversion: here, what LLVM gets is a separate function (core::convert::num::<impl core::convert::From<bool> for usize>::from) with an inlinehint, which the optimizer apparently inlines to a simple "stretch to 64 bits". The optimizer does not understand that the equivalent if b { 1 } else { 0 } is the same "stretch to 64 bits": it never makes that connection, regardless of black_box, and plainly generates branches with explicit 1 and 0 constants, just like the code you get when converting to two other constants. That also holds back on stable Rust 1.56, and even 1.58 predates nightly-2022-01-21. Riddle 2: the if and match expressions for bool-to-int conversion are never optimized this way, so you'd think they should always have been slower.
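
For reference, that conversion path in source form is just (a sketch):

// The explicit conversion: usize::from(bool) is a separate function with an
// inline hint, which the optimizer inlines down to a plain zero-extension.
pub fn via_from(b: bool) -> usize {
    usize::from(b)
}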

Pass --emit=llvm-ir as an argument (you may want to disable optimizations to see the IR rustc emits and not the optimized IR).
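
For example, for a standalone file:

rustc --emit=llvm-ir main.rs        # unoptimized IR, close to what rustc emits
rustc --emit=llvm-ir -O main.rs     # IR after LLVM's optimization passes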

1 Like

LLVM-IR can actually be far more readable than ASM in some cases, and it has pretty good docs at LLVM Language Reference Manual — LLVM 18.0.0git documentation if there's anything you don't recognize.

Godbolt will absolutely show it: https://rust.godbolt.org/z/a19xnb475

I've also included the MIR in that demo, where you can see that even inside rustc the if vs the match end up being the same CFG, even before we ask LLVM to optimize stuff.

And you'll notice how it looks like there's only one function in the output there? That's because they're so identical that LLVM noticed, and deduped them.

(Aside: there's no need for black_box tricks when you're looking at the code for a pure method. You don't need to keep the compiler from optimizing stuff out when it's being returned.)

3 Likes

Something I don't think I've seen mentioned here. Doesn't rustc convert if/else into match as part of desugaring?

Thing is, I'm not sure whether this is an LLVM bug or a Rust one. From early experimentation I suspect multiple bugs are involved, but I'll need more time to confirm.

If someone wants to do that, here are my main conclusions:

This is the basic LLVM IR generated:

define i32 @black_box(i32 %dummy) {
  %1 = alloca i32, align 4
  store i32 %dummy, i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* %1)
  %2 = load i32, i32* %1, align 4
  ret i32 %2
}

define void @if(i1 zeroext %cond) {
  %cond_as_i32_ptr = alloca i32, align 4
  br i1 %cond, label %cond_is_true, label %cond_is_false

cond_is_true:
  store i32 1, i32* %cond_as_i32_ptr, align 4
  br label %call_black_box

cond_is_false:
  store i32 0, i32* %cond_as_i32_ptr, align 4
  br label %call_black_box

call_black_box:
  %cond_as_i32 = load i32, i32* %cond_as_i32_ptr, align 4
  call i32 @black_box(i32 %cond_as_i32)
  ret void
}

LLVM prefers to duplicate the call_black_box block into both arms rather than convert the alloca to SSA values:

define void @if(i1 zeroext %cond) local_unnamed_addr {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  br i1 %cond, label %cond_is_true.split, label %cond_is_false.split

cond_is_true.split:                               ; preds = %0
  %3 = bitcast i32* %2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %3)
  store i32 1, i32* %2, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %2) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %3)
  br label %call_black_box

cond_is_false.split:                              ; preds = %0
  %4 = bitcast i32* %1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %4)
  store i32 0, i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %1) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %4)
  br label %call_black_box

call_black_box:                                   ; preds = %cond_is_false.split, %cond_is_true.split
  ret void
}

Then for some reason it doesn't continue optimizing the alloca. It is the asm that prevents it from doing so, but I'm not sure whether this is a bug or the intended behavior.

If I manually convert it:

define void @if(i1 zeroext %cond) {
  br i1 %cond, label %cond_is_true, label %cond_is_false

cond_is_true:
  br label %call_black_box

cond_is_false:
  br label %call_black_box

call_black_box:
  %cond_as_i32 = phi i32 [ 0, %cond_is_false ], [ 1, %cond_is_true ]
  call i32 @black_box(i32 %cond_as_i32)
  ret void
}

Then the optimizer inserts lifetime annotations, and they prevent the code from being optimized well:

define void @if(i1 zeroext %cond) local_unnamed_addr {
  %0 = alloca i32, align 4
  %. = zext i1 %cond to i32
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1)
  store i32 %., i32* %0, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %0) #1
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1)
  ret void
}

If I remove them:

define void @if(i1 zeroext %cond) local_unnamed_addr {
  %1 = alloca i32, align 4
  %. = zext i1 %cond to i32
  store i32 %., i32* %1, align 4
  call void asm sideeffect "", "r,~{memory}"(i32* nonnull %1) nounwind
  ret void
}

Then it does get optimized:

if:                                     # @if
        mov     dword ptr [rsp - 4], edi
        lea     rax, [rsp - 4]
        ret

So I think there are multiple bugs here:

  • Lifetime annotations should never lead to worse code, only better (I guess some pattern matching in LLVM doesn't take them into account).
  • Maybe (I'm not sure) the inline asm shouldn't prevent the alloca from being collapsed into a phi.
  • Even if it should, LLVM should have collapsed the alloca first and only then duplicated the exit block and inlined black_box(), because before that it's always valid to optimize the alloca.

Also, I'm not sure rustc really should emit the ~{memory} constraint, though removing it doesn't seem to change things.

1 Like

I think that's semantically true, but not how it's actually implemented.

It looks like there's still an If in all of

It's THIR, not MIR. There's no if or branch in MIR; ifs are translated into SwitchInt.
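
For illustration, the MIR for a simple boolean if looks roughly like this (abbreviated from memory; the exact printed syntax varies by compiler version):

fn via_if(_1: bool) -> usize {
    bb0: {
        // the if condition becomes a SwitchInt terminator on the bool
        switchInt(_1) -> [0: bb2, otherwise: bb1];
    }
    bb1: {
        _0 = const 1_usize;     // then-arm
        goto -> bb3;
    }
    bb2: {
        _0 = const 0_usize;     // else-arm
        goto -> bb3;
    }
    bb3: {
        return;
    }
}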

Right, but SwitchInt isn't really match either, since it doesn't do patterns. That's why I called MIR a CFG earlier.

match is also lowered to SwitchInt(s). I wouldn't just call it "desugaring" when the code to do it says