Rust compiler generates slower code

Zso · July 2, 2023, 6:38pm

The value is always between 0 and 7 (because of & 0x07), but the checker seems to look at opcodes[instrptr] itself (a u8) rather than at the masked value opcodes[instrptr] & 0x07 that is actually matched. Could this be a logic mistake in the compiler?

Because of this, the generated code is slower: I can't write 0x07 in place of the _ (default) arm, so the generated code checks for this case with extra instructions. The error message in that case is:

match opcodes[instrptr] & 0x07 {
      ^^^^^^^^^^^^^^^^^^^^^^^^ pattern 8_u8..=u8::MAX not covered

Is there some way to avoid these unnecessary assembly instructions?

H2CO3 · July 2, 2023, 6:49pm

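No. There's no "logic mistake". Not everything that's obvious to you can be rigorously proved by the compiler: match exhaustiveness is checked against the type of the scrutinee (u8, i.e. 0..=255), not against the values that opcodes[instrptr] & 0x07 can actually produce.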

As for the generated code being "too slow": that's really hard to believe, given that adding unreachable_unchecked() literally removes a single instruction.
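For reference, here is roughly where `std::hint::unreachable_unchecked()` would go. H2CO3's code isn't reproduced in this thread, so this is only a sketch that assumes the interpreter shape used elsewhere in the thread; `interpret_unchecked` is a hypothetical name. It is sound only because `x & 0x07` can never exceed 7: if that reasoning were wrong, reaching the arm would be undefined behavior.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical variant of the thread's interpreter that tells the optimizer
/// the `_` arm can never be taken.
pub fn interpret_unchecked(opcodes: &[u8]) -> i32 {
    let mut instrptr: usize = 0;
    let mut reg: i32 = 0;
    loop {
        match opcodes[instrptr] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => instrptr = reg as usize, // JMP
            0x07 => return reg,
            // SAFETY: `x & 0x07` is always in 0..=7 and all eight values are
            // handled above, so this arm is never reached.
            _ => unsafe { unreachable_unchecked() },
        }
        instrptr += 1;
    }
}
```

The safe `_ => unreachable!()` arm expresses the same fact but keeps a panic path; the unsafe version trades that check for an obligation on the programmer.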

quinedot · July 2, 2023, 6:53pm

Slightly different ASM. I believe this is the same as @H2CO3's version, i.e. you don't need unsafe. (I didn't check rigorously.)

pub fn interpret_example(opcodes: &[u8]) -> i32 {
    let mut instrptr = 0;
    let mut reg: i32 = 0;

    loop {
        match opcodes[instrptr] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => instrptr = reg as usize, // JMP
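            0x07 => return reg,
            // Never taken, since & 0x07 limits the value to 0..=7. Without this
            // arm the match fails to compile on Rust 1.70 (E0004), because
            // exhaustiveness is checked against the whole u8 range.
            _ => unreachable!(),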
        }
        instrptr += 1;
    }
}

Zso · July 2, 2023, 7:28pm

Oh, sorry: the extra check is not caused by the match. The check that makes this code slower is the range (bounds) check on opcodes[x].

But here is a version where the index range is closed (it wraps around, like a modulo): Compiler Explorer
Here we can see the problem:

.LBB0_1:
    movzx   ecx, cx
    movzx   esi, byte ptr [rdi + rcx]
    and     esi, 7
    cmp     esi, 6                       <- unnecessary check for 0..=6
    ja      .LBB0_3                      <- unnecessary: the default case here is just case 0x07
    movsxd  rsi, dword ptr [rdx + 4*rsi]
    add     rsi, rdx
    jmp     rsi

Here the code checks whether the value is in 0..=6 (taking the jump table) or falls through to the default. But the jump table could simply cover 0..=7, which would make the check unnecessary.
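The Compiler Explorer source for the closed-range version is not reproduced in this thread. Judging from `movzx ecx, cx` in the listing, the instruction pointer appears to be a 16-bit value indexing a 64 KiB program, so every index is in bounds. A minimal sketch under that assumption (the name `interpret_u16` and the 64 KiB array are assumptions, not the original code), which also folds case 0x07 into the `_` arm so no extra arm is needed:

```rust
/// Hypothetical closed-range interpreter: the program is exactly 64 KiB and
/// the instruction pointer is a u16, so indexing can never go out of bounds.
pub fn interpret_u16(opcodes: &[u8; 0x1_0000]) -> i32 {
    let mut ip: u16 = 0;
    let mut reg: i32 = 0;
    loop {
        // `ip as usize` is at most 0xFFFF, which is always a valid index here.
        match opcodes[ip as usize] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => ip = reg as u16, // JMP, truncated to the 16-bit address space
            // After the mask only 0x07 is left, so it can take the `_` arm directly.
            _ => return reg,
        }
        // Wrap around at 0xFFFF instead of running off the end ("modulo").
        ip = ip.wrapping_add(1);
    }
}
```

Folding the last real case into `_` satisfies the exhaustiveness checker without `unsafe`; whether LLVM then still emits the `cmp`/`ja` pair shown above has not been checked here.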

Here is the C version: Compiler Explorer, and its optimal assembly:

.L2:
    movzx   eax, dx
    lea     rdx, [rax+1]
    movzx   eax, BYTE PTR [rdi+rax]
    and     eax, 7
    jmp     [QWORD PTR .L5[0+rax*8]]
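The C output above dispatches through an eight-entry table, so there is no default target to test for. One way to spell out an eight-entry table in Rust is an array of function pointers indexed by the masked opcode. This is a different technique from a `match` (indirect calls rather than a jump table inside one function), sketched here with hypothetical handler names; whether it is actually faster for this interpreter is untested.

```rust
/// Each handler updates the machine state and returns `Some(result)` to halt.
type Handler = fn(&mut i32, &mut usize) -> Option<i32>;

fn add133(r: &mut i32, _: &mut usize) -> Option<i32> { *r += 133; None }
fn sub133(r: &mut i32, _: &mut usize) -> Option<i32> { *r -= 133; None }
fn add155(r: &mut i32, _: &mut usize) -> Option<i32> { *r += 155; None }
fn sub155(r: &mut i32, _: &mut usize) -> Option<i32> { *r -= 155; None }
fn add177(r: &mut i32, _: &mut usize) -> Option<i32> { *r += 177; None }
fn sub177(r: &mut i32, _: &mut usize) -> Option<i32> { *r -= 177; None }
fn jmp(r: &mut i32, ip: &mut usize) -> Option<i32> { *ip = *r as usize; None }
fn halt(r: &mut i32, _: &mut usize) -> Option<i32> { Some(*r) }

/// Exactly 8 entries, one per possible value of `op & 0x07`.
const HANDLERS: [Handler; 8] = [add133, sub133, add155, sub155, add177, sub177, jmp, halt];

pub fn interpret_table(opcodes: &[u8]) -> i32 {
    let mut ip: usize = 0;
    let mut reg: i32 = 0;
    loop {
        // The mask keeps the index in 0..=7, which always fits the table.
        let handler = HANDLERS[(opcodes[ip] & 0x07) as usize];
        if let Some(result) = handler(&mut reg, &mut ip) {
            return result;
        }
        ip += 1; // as in the thread's loop: a JMP to `reg` continues at `reg + 1`
    }
}
```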

quinedot · July 2, 2023, 8:36pm

I didn't test this at all (for performance or correctness), but you could try separating out a loop that doesn't need bounds checking.
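The linked code isn't reproduced in this thread, so this is only one possible reading of "separating out a loop that doesn't need bounds checking", with a hypothetical function name. The inner `for` loop walks a subslice with an iterator and contains no indexing; only a JMP leaves it, and re-slicing at the jump target is the one bounds check per jump. The iterator still tests for the end of the slice on every step, so whether this actually saves instructions should be checked in Compiler Explorer.

```rust
/// Hypothetical split-loop interpreter: straight-line code runs in an inner
/// iterator loop; jumps restart it at a new position.
pub fn interpret_split(opcodes: &[u8]) -> i32 {
    let mut reg: i32 = 0;
    // Index where the current run of straight-line code starts.
    let mut start: usize = 0;
    'outer: loop {
        // Slicing checks `start` once per run; no indexing inside the loop.
        for &op in &opcodes[start..] {
            match op & 0x07 {
                0x00 => reg += 133,
                0x01 => reg -= 133,
                0x02 => reg += 155,
                0x03 => reg -= 155,
                0x04 => reg += 177,
                0x05 => reg -= 177,
                0x06 => {
                    // JMP: the thread's loop sets instrptr = reg, then adds 1.
                    start = reg as usize + 1;
                    continue 'outer;
                }
                _ => return reg, // only 0x07 remains after the mask
            }
        }
        // Falling off the end; the indexed version would also panic here.
        panic!("instruction pointer ran past the end of `opcodes`");
    }
}
```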

Zso · July 4, 2023, 7:13am

Thank you for the idea. Here is the final code I needed: Compiler Explorer
The generated assembly looks clean and fast.

system · October 2, 2023, 7:14am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
