Fastest way to compare str to char?

I need function that match &str to char as fast as possible.

At first I create base variant:

pub fn f1(input: &str) -> Option<usize> {
    ["H", "M", "L", "J"].into_iter().position(|s| *s == input)
}

rustc with optimisation mode on generates 24 instructions with 4 jumps.

I was able to optimise it to 15 instructions and with 2 jumps:

pub fn f2(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    if b.len() == 1 {
        match b[0] {
           b'H' => Some(0),
           b'M' => Some(1),
           b'L' => Some(2),
           b'J' => Some(3),
           _ => None, 
        }
    } else {
        None
    }
}

See godbolt link .

But may be it is possible to use less Rust code, but generate better assembly?

Minor code change, but this generated the same for me.

// Also the same: `input: &[u8]`
pub fn f2(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    match *b {
        [b'H'] => Some(0),
        [b'M'] => Some(1),
        [b'L'] => Some(2),
        [b'J'] => Some(3),
        _ => None, 
    }
}

I tried a few bit fiddling approaches, but didn't manage to convince LLVM to omit the switch tables or do anything else beneficial.

Nit: You're not finding chars, which are 32-bits.

3 Likes

A solution with a lookup table:

pub fn f3(input: &str) -> Option<usize> {
    static TABLE: [Option<usize>; 6] = [Some(0), None, Some(3), None, Some(2), Some(1)];

    match input.as_bytes() {
        &[b] => {
            let c = (b as usize).wrapping_sub(b'H' as usize);
            if c < TABLE.len() {
                TABLE[c]
            } else {
                None
            }
        }
        _ => None
    }
}

godbolt

3 Likes

I think the trade off is branch prediction vs cache utilization.

it seems the threshold is 2 by experiments with simple examples. when the match has only 2 arms (plus the wildcard), it generates cascaded conditional branches; as long I add the third match arm, it turns into a table lookup.

PS:

I like use byte string literal patterns, both generate the same code nevertheless, just personal preference.

PPS:

for some reason if I use string literal patterns (instead of byte string literals), it generate different code (supprising to me) using conditional branches.

You should almost certainly just use a match, but you can tune the types a bit if you want.

If you return Option<u8> instead, for example, it no longer needs to generate the external table and just uses a bt: https://rust.godbolt.org/z/G8EnvoYEY

Or if you return a custom enum instead of a normal integer, it gets even tighter: https://rust.godbolt.org/z/fv4eo1x1c

f7:
        mov     al, 4
        cmp     rsi, 1
        jne     .LBB0_3
        movzx   ecx, byte ptr [rdi]
        add     cl, -72
        cmp     cl, 5
        ja      .LBB0_3
        shl     cl, 3
        movabs  rax, 1108168868864
        shr     rax, cl
.LBB0_3:
        ret

That should be better than the things using 96-byte lookup tables that would have way more cache impact.

(Basically, you should never need to write a manual lookup table and such, just figure out how to help LLVM do it instead.)

5 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.