Different assembly generated for match and if

pub fn is_reserved_char_1(c: char) -> bool {
    match c {
        'a' | 'e' | 'i' | 'o' | 'u' => true,
        _ => false
    }
}

pub fn is_reserved_char_2(c: char) -> bool {
    if c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u' {
        true
    } else {
        false
    }
}

Why is this the case?

2 Likes

Probably because the match gets lowered to a multi-way branch by rustc in the first place, which LLVM has special optimizations for (in this case, it creates a bit mask).

On the other hand, the || operator likely gets lowered to branches, which LLVM fails to remove completely.

1 Like

If you add --emit=llvm-ir to the compiler arguments, you'll see that the emitted LLVM IR differs between the two functions: the first contains a switch, the second a chain of ifs. LLVM just doesn't optimize both of them the same way. It may be able to, but that would increase build time, and it was probably decided it's not worth it.

1 Like

For comparison, there’s always also the alternative to use non-short-circuiting | instead of ||.

    if (c == 'a') | (c == 'e') | (c == 'i') | (c == 'o') | (c == 'u') {
        true
    } else {
        false
    }

seems to compile down to code identical to the match version.


This one compiles to something very similar as well

pub fn is_reserved_char_4(c: char) -> bool {
    if ['a', 'e', 'i', 'o', 'u'].contains(&c) {
        true
    } else {
        false
    }
}

But apparently this very-similar-looking code is not a good idea:

pub fn is_reserved_char_5(c: char) -> bool {
    if "aeiou".contains(c) {
        true
    } else {
        false
    }
}
8 Likes

For the second version it's clearer to just write:

pub fn is_reserved_char_2(c: char) -> bool {
    c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u'
}

Which generated code is better depends somewhat on the expected distribution of inputs.

But I think the following bit-masking version is better than what the compiler produced above, because this code is totally branchless:

pub fn is_reserved_char_6(c: char) -> bool {
    let bitmask: u32 = ['a', 'e', 'i', 'o', 'u']
        .iter()
        .map(|c| 1u32 << (*c as u32 - 'a' as u32))
        .sum();

    let x = (c as u32).wrapping_sub('a' as u32);
    (x < 32) & (bitmask & (1u32 << (x & 31)) != 0)
}
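As a quick sanity check (my own addition, not from the thread), the branchless version agrees with the match version on a few sample inputs, including characters below 'a' and far above 'z', where the wrapping_sub plus range check has to reject the lookup:

```rust
// Copied from above so this check is self-contained.
pub fn is_reserved_char_6(c: char) -> bool {
    let bitmask: u32 = ['a', 'e', 'i', 'o', 'u']
        .iter()
        .map(|c| 1u32 << (*c as u32 - 'a' as u32))
        .sum();

    let x = (c as u32).wrapping_sub('a' as u32);
    (x < 32) & (bitmask & (1u32 << (x & 31)) != 0)
}

fn main() {
    for c in ['a', 'e', 'i', 'o', 'u'] {
        assert!(is_reserved_char_6(c));
    }
    // 'A' and ' ' wrap around in wrapping_sub, so the x < 32 gate
    // rejects them; 'z' is in range but its mask bit is zero.
    for c in ['b', 'z', 'A', ' ', '€'] {
        assert!(!is_reserved_char_6(c));
    }
    println!("ok");
}
```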
4 Likes

If you are willing to do some of LLVM's work manually, Rust's zero-cost abstractions shine right through the optimizer, and you can shave off even that single branch instruction: Compiler Explorer. The following Rust code:

use std::ops::BitOr;

fn bit_mask_for_chars(chars: &[u8]) -> (u64, u32, u32) {
    let min = chars.iter().copied().min().unwrap();
    let max = chars.iter().copied().max().unwrap();
    let mask = chars.iter().map(|&c| 1_u64 << (c - min)).fold(0, BitOr::bitor);
    (mask, min.into(), max.into())
}

fn is_one_of(c: char, chars: &[u8]) -> bool {
    let c: u32 = c.into();
    let (mask, min, max) = bit_mask_for_chars(chars);
    let in_range = (min..=max).contains(&c);
    let matches = mask & 1 << (c - min);
    (matches != 0) & in_range
}

pub fn is_reserved_char_6(c: char) -> bool {
    is_one_of(c, b"aeiou")
}

Compiles down to the following assembly:

example::is_reserved_char_6:
        lea     eax, [rdi - 97]
        cmp     eax, 21
        setb    cl
        add     dil, 31
        movzx   eax, dil
        mov     edx, 1065233
        bt      rdx, rax
        setb    al
        and     al, cl
        ret
3 Likes

I'd add assert!(max - min < 64); for safety. It doesn't change the assembly anyway.

Well, a proper solution shouldn't assert!() anyway, but should fall back to a slow path instead. Or, as an extension, I could imagine breaking the bit masks up into chunks of 64 bits each and iterating over them.
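A minimal sketch of that fallback idea (my own naming, not code from the thread): use the single-word bit mask only when the character set spans fewer than 64 values, and degrade to a plain linear scan otherwise:

```rust
// Hypothetical variant of is_one_of with a slow-path fallback instead
// of an assert.
fn is_one_of_checked(c: char, chars: &[u8]) -> bool {
    let min = *chars.iter().min().unwrap();
    let max = *chars.iter().max().unwrap();
    if max - min < 64 {
        // Fast path: one 64-bit mask, as in the post above.
        let mask = chars.iter().fold(0u64, |m, &b| m | 1u64 << (b - min));
        let x = (c as u32).wrapping_sub(min as u32);
        // && (not &) so the shift never executes with x >= 64.
        x < 64 && (mask & (1u64 << x)) != 0
    } else {
        // Slow path: linear scan over the set.
        chars.iter().any(|&b| u32::from(b) == c as u32)
    }
}

fn main() {
    assert!(is_one_of_checked('e', b"aeiou"));
    assert!(!is_one_of_checked('b', b"aeiou"));
    // '!' (33) and '~' (126) span more than 64 values: slow path.
    assert!(is_one_of_checked('~', b"!~"));
    assert!(!is_one_of_checked('a', b"!~"));
    println!("ok");
}
```

Note that this sketch uses short-circuiting && on the fast path, trading back one potential branch for panic-free shifts.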

However, based on a quick Criterion benchmark, it seems like the branchless version is actually ~50% slower than the branchful code LLVM generates. Oh well, apparently the LLVM guys know what they are doing.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.