Differences in an optimization decision between Rustc and Clang

Hi!

Recently I found an interesting difference between Rustc's and Clang's optimization decisions that can be checked on Godbolt (Compiler Explorer).

For some unknown reason, Rustc swaps the "322" and "1337" comparisons in the generated assembly. Clang doesn't do this and emits the comparisons in source order.

What is Rustc's reason for doing this swap? I tried comparing the LLVM IR emitted by Rustc and Clang - it looks pretty much the same to me (though I may be missing some important details).
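
For reference, the original Rust source isn't shown here; a minimal function along these lines (my reconstruction, inferred from the function name and the LLVM IR quoted later in the thread) reproduces the behavior:

pub fn some_top_secret_checker(var: i32) -> bool {
    // Hypothetical reconstruction: accept exactly the three "secret" values
    // and reject everything else.
    matches!(var, 42 | 322 | 1337)
}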

It looks like this changed in Rust 1.72.0. Using rustc 1.71.0, I get the same output as clang 18.1.0.

2 Likes

I can reproduce without Rust involved, using only LLVM IR:

define noundef zeroext i1 @some_top_secret_checker(i32 noundef %var) unnamed_addr {
start:
  switch i32 %var, label %bb6 [
    i32 42, label %bb7
    i32 322, label %bb7
    i32 1337, label %bb7
  ]

bb6:
  br label %bb7

bb7:
  %_0.0 = phi i1 [ false, %bb6 ], [ true, %start ], [ true, %start ], [ true, %start ]
  ret i1 %_0.0
}

causes LLVM to keep the checks in order. If I swap bb6 and bb7, I get the reordered output:

define noundef zeroext i1 @some_top_secret_checker(i32 noundef %var) unnamed_addr {
start:
  switch i32 %var, label %bb6 [
    i32 42, label %bb7
    i32 322, label %bb7
    i32 1337, label %bb7
  ]

bb7:
  %_0.0 = phi i1 [ false, %bb6 ], [ true, %start ], [ true, %start ], [ true, %start ]
  ret i1 %_0.0

bb6:
  br label %bb7
}

causes LLVM to swap 322 and 1337.

1 Like

Thanks a lot for the LLVM IR-only MRE! So we can see that, for some reason, the basic block order influences the code generation.

Next question: should the current Rustc behavior be reported upstream? At least to me, the current behavior is a bit surprising.

1 Like

It doesn't appear to affect execution time or correctness, so I'd say it's not a bug. You could use the LLVM IR-only MRE to report it to the LLVM project, but I suspect they'd say the same - it's just a consequence of codegen behaving slightly differently with different IR.

2 Likes

I tend to agree with your conclusion. However, the comparison order can affect actual performance: if we sorted the numbers in the example above by their frequency, placing the more frequent inputs at the beginning of the function, we would execute fewer comparisons at runtime in the average case. LLVM already performs such an optimization in the PGO case - more frequently taken branches are moved towards the beginning of the function.

Without providing such information (whether via PGO profiles or attributes - it doesn't matter), I as a user might expect the comparisons to keep the same order as in the source code (as Clang does). On the other hand, I don't think that either the LLVM or the Rustc people would be happy to provide such a guarantee.
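
To illustrate the point about frequency-based ordering, here is a sketch (with invented frequencies) of what a user might write by hand, relying on the compiler preserving the source order of the comparisons - which is exactly the property in question:

pub fn some_top_secret_checker(var: i32) -> bool {
    // Assumption for illustration only: 1337 is the most frequent input,
    // so it is checked first; 42 is the rarest, so it is checked last.
    if var == 1337 {
        return true;
    }
    if var == 322 {
        return true;
    }
    var == 42
}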

Given that the reordering comes after rustc outputs LLVM IR (since I started with the LLVM IR that Rust 1.78 outputted, and changed it to look the same as the Clang LLVM IR), this isn't rustc at all - rustc has put the three values in the switch in the order you supplied them, and LLVM has chosen to optimize differently based on BB order.

This is definitely something to take up with LLVM if you think they're doing it wrong.

1 Like

Yep, I agree. I mentioned Rustc only because Rustc could change the way it generates LLVM IR, making the generated IR look exactly the same as Clang's. I'm not saying that this is the right approach in this case - I agree that this should be discussed with the LLVM devs.

It looks like it happens somewhere in a MIR transformation. These optimizations happen before the code is passed on to the compiler backend (LLVM in this case). Compiling with nightly, compare the difference in output between -Zmir-opt-level=0 and -Zmir-opt-level=1: Compiler Explorer

Output (since nightly is always being updated, the results on Godbolt are subject to change)

-C opt-level=3 -Zmir-opt-level=0

example::some_top_secret_checker::h221a3e4ba06d2223:
        mov     al, 1
        cmp     edi, 42
        je      .LBB0_4
        cmp     edi, 322
        je      .LBB0_4
        cmp     edi, 1337
        je      .LBB0_4
        xor     eax, eax
.LBB0_4:
        ret
define noundef zeroext i1 @example::some_top_secret_checker::h221a3e4ba06d2223(i32 noundef %var) unnamed_addr {
start:
  switch i32 %var, label %bb6 [
    i32 42, label %bb7
    i32 322, label %bb7
    i32 1337, label %bb7
  ]

bb6:
  br label %bb7

bb7:
  %_0.sroa.0.0 = phi i1 [ false, %bb6 ], [ true, %start ], [ true, %start ], [ true, %start ]
  ret i1 %_0.sroa.0.0
}


-C opt-level=3 -Zmir-opt-level=1

example::some_top_secret_checker::h221a3e4ba06d2223:
        mov     al, 1
        cmp     edi, 42
        je      .LBB0_4
        cmp     edi, 1337
        je      .LBB0_4
        cmp     edi, 322
        je      .LBB0_4
        xor     eax, eax
.LBB0_4:
        ret
define noundef zeroext i1 @example::some_top_secret_checker::h221a3e4ba06d2223(i32 noundef %var) unnamed_addr {
start:
  switch i32 %var, label %bb6 [
    i32 42, label %bb7
    i32 322, label %bb7
    i32 1337, label %bb7
  ]

bb7:
  %_0.sroa.0.0 = phi i1 [ false, %bb6 ], [ true, %start ], [ true, %start ], [ true, %start ]
  ret i1 %_0.sroa.0.0

bb6:
  br label %bb7
}

Slightly altering the code (for example, adding more branches and making the function take a reference) preserved the correct branching order with -Zmir-opt-level=1, but not with -Zmir-opt-level=2.
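
For concreteness, a hypothetical variant of the kind described above (the exact altered code isn't shown) might look like:

pub fn some_top_secret_checker(var: &i32) -> bool {
    // Hypothetical "slightly altered" version: a few more accepted values
    // and a reference parameter instead of passing the integer by value.
    matches!(*var, 42 | 322 | 1337 | 4242 | 100_500)
}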

I agree that maintaining the order of conditional branches is valuable, because it lets people make their expressions return early the majority of the time when they know which values they'll see. I think this should ultimately be addressed in LLVM, but since the frontend is generating the unusual LLVM IR that triggers this behavior, it shouldn't be too hard to track down which specific MIR transformation is responsible. After that, I have no clue how hard it'd be to fix, but I've looked at a couple of MIR transforms and I can't say I personally understood the logic of anything in there, so...

Edit: I suppose it doesn't necessarily have to be caused by one of the MIR transformations directly; it might also be a bug in the part responsible for lowering MIR into LLVM IR, and it just so happens that the optimized version of the MIR triggers it while the non-optimized version does not, meaning it could happen to any code regardless of whether it's been subjected to MIR transforms.

2 Likes

While this case involves slightly weird LLVM IR, experimentation suggests that LLVM will rearrange the order of checks within a switch instruction whenever the default block comes after all the target blocks for the individual cases.

As that won't be unusual for switch instructions in other cases, I don't think trying to contort rustc's generation of LLVM IR to guarantee that the default block comes first is going to help; there will be other cases where doing that results in a different failure to optimize. Instead, LLVM should maintain the order (or rustc should provide suitable metadata to instruct LLVM to keep the order in place - e.g. via branch weights or similar).

1 Like
