Rust compiler generates slower code

Here is some Rust code: Compiler Explorer

The matched value is always in the range 0..=7 (because of the & 0x07), but the exhaustiveness check seems to look only at opcodes[instrptr] and not at the masked value opcodes[instrptr] & 0x07. Could this be a logic mistake in the compiler?

Because of this, the generated code is slower than it needs to be: I can't write 0x07 in place of _ (the default arm), so the generated code handles that case with extra instructions. If I try, the error message is:

match opcodes[instrptr] & 0x07 {
| ^^^^^^^^^^^^^^^^^^^^^^^^ pattern 8_u8..=u8::MAX not covered

Can I somehow avoid these unnecessary assembly instructions?

No. There's no "logic mistake". Not everything that's obvious to you can be rigorously proved by the compiler.

That's really hard to believe, given that adding unreachable_unchecked() literally removes a single instruction.
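For reference, the unreachable_unchecked() variant mentioned here might look like the sketch below. The function name interpret_unchecked_default is made up for illustration, the opcode arms are copied from the example in this thread, and the exact code that was measured is not shown here, so treat this as an assumption rather than the code that was benchmarked.

```rust
use std::hint::unreachable_unchecked;

// Hypothetical name; the arms mirror the interpreter example in this thread.
pub fn interpret_unchecked_default(opcodes: &[u8]) -> i32 {
    let mut instrptr = 0;
    let mut reg: i32 = 0;

    loop {
        match opcodes[instrptr] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => instrptr = reg as usize, // JMP
            0x07 => return reg,
            // SAFETY: `x & 0x07` is always in 0..=7, so this arm can never run.
            // Reaching it would be undefined behavior.
            _ => unsafe { unreachable_unchecked() },
        }
        instrptr += 1;
    }
}
```

The match still needs the _ arm to satisfy the exhaustiveness check; the unsafe hint only tells the optimizer that the arm is dead. The indexing opcodes[instrptr] keeps its bounds check either way.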

Slightly different ASM. This is the same as in @H2CO3's I believe, i.e. you don't need unsafe. (I didn't check rigorously.)

pub fn interpret_example(opcodes: &[u8]) -> i32 {
    let mut instrptr = 0;
    let mut reg: i32 = 0;

    loop {
        match opcodes[instrptr] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => instrptr = reg as usize, // JMP
            0x07 => return reg,                 // 0x07 alone is rejected as non-exhaustive (Rust 1.70)
            _ => unreachable!(),                // never taken: the value is masked with 0x07
        }
        instrptr += 1;
    }
}

Oh sorry, the extra check is not caused by the match. The check that makes this code slower is the bounds check on opcodes[x].

But here is a version with a closed index range (using modulo): Compiler Explorer
Here we can see the problem:

.LBB0_1:
    movzx   ecx, cx
    movzx   esi, byte ptr [rdi + rcx]
    and     esi, 7
    cmp     esi, 6                       <- unnecessary 0..6 check
    ja      .LBB0_3                      <- unnecessary (default case), because this is case 0x07
    movsxd  rsi, dword ptr [rdx + 4*rsi]
    add     rsi, rdx
    jmp     rsi

Here the code checks whether the value is in 0..6 (handled through the jump table) or falls into the default case. But the jump table could cover 0..7, which would make the check unnecessary.
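The linked code is not reproduced in this thread, so the following is only a guess at what a "closed range" version could look like. It assumes a 16-bit instruction pointer (suggested by the movzx ecx, cx in the listing) and a program of exactly 65536 bytes; the name interpret_wrapping and both of those choices are assumptions made for illustration.

```rust
// Hypothetical sketch: a u16 index into a 65536-byte array can never be out of
// range, so the compiler has enough information to drop the bounds check.
pub fn interpret_wrapping(opcodes: &[u8; 65536]) -> i32 {
    let mut instrptr: u16 = 0;
    let mut reg: i32 = 0;

    loop {
        match opcodes[instrptr as usize] & 0x07 {
            0x00 => reg += 133,
            0x01 => reg -= 133,
            0x02 => reg += 155,
            0x03 => reg -= 155,
            0x04 => reg += 177,
            0x05 => reg -= 177,
            0x06 => instrptr = reg as u16, // JMP, truncated to 16 bits
            0x07 => return reg,
            _ => unreachable!(),
        }
        // Wraps from 65535 back to 0 instead of overflowing (the "modulo").
        instrptr = instrptr.wrapping_add(1);
    }
}
```

With this shape the bounds check can go away, but the match itself is unchanged, which is why the comparison against 6 discussed above can still appear in the output.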

Here is the C code: Compiler Explorer, and the optimal assembly:

.L2:
    movzx   eax, dx
    lea     rdx, [rax+1]
    movzx   eax, BYTE PTR [rdi+rax]
    and     eax, 7
    jmp     [QWORD PTR .L5[0+rax*8]]

I didn't test this at all (neither for performance nor for correctness), but you could try separating out a loop that doesn't need bounds checking.
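The linked code for this suggestion is not reproduced here. One possible reading of it, sketched under that caveat, is to iterate over the tail of the slice between jumps, so that the only bounds check happens when the slice is re-taken after a JMP. The name interpret_split_loop and the panic message are made up for illustration.

```rust
// Hypothetical sketch: straight-line opcodes are consumed through a slice
// iterator (no per-opcode index check); only a JMP re-slices and checks bounds.
pub fn interpret_split_loop(opcodes: &[u8]) -> i32 {
    let mut instrptr: usize = 0;
    let mut reg: i32 = 0;

    'outer: loop {
        // Single bounds check per jump target; panics if the target is past the end.
        for &op in &opcodes[instrptr..] {
            match op & 0x07 {
                0x00 => reg += 133,
                0x01 => reg -= 133,
                0x02 => reg += 155,
                0x03 => reg -= 155,
                0x04 => reg += 177,
                0x05 => reg -= 177,
                0x06 => {
                    // JMP: same target as `instrptr = reg as usize; instrptr += 1`.
                    instrptr = reg as usize + 1;
                    continue 'outer;
                }
                0x07 => return reg,
                _ => unreachable!(),
            }
        }
        // The indexed version would panic on an out-of-bounds read at this point.
        panic!("instruction pointer ran past the end of the program");
    }
}
```

Whether this actually produces better assembly than the indexed loop would need to be checked in Compiler Explorer; the sketch only shows the shape of the idea.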

Thank you for this idea. Here is the final code, which does what I need: Compiler Explorer
The generated assembly looks clean and fast.
