For some reason, .take_while(f).take_while(g) isn't optimized as well as .take_while(|x| f(x) && g(x))

I have 3 versions of the same code:

// A
pub fn digit_offset(input: &str) -> usize {
    input.chars().take_while(char::is_ascii_digit).count()
}

// B
pub fn digit_offset(input: &str) -> usize {
    input.chars().take_while(char::is_ascii).take_while(char::is_ascii_digit).count()
}

// C
pub fn digit_offset(input: &str) -> usize {
    input.chars().take_while(|x| x.is_ascii() && x.is_ascii_digit()).count()
}
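As a quick sanity check, here is a sketch that puts the three versions side by side and checks that they agree. They are renamed `digit_offset_a`/`_b`/`_c` (hypothetical names) so they can live in one module. They have to agree: every ASCII digit is also ASCII, so the extra `take_while(char::is_ascii)` in B can never stop the iteration earlier than `is_ascii_digit` alone would.

```rust
// Hypothetical renames of versions A, B and C so they can coexist.

/// A: count the leading ASCII digits.
pub fn digit_offset_a(input: &str) -> usize {
    input.chars().take_while(char::is_ascii_digit).count()
}

/// B: two chained take_while adapters. The first one is redundant,
/// because every ASCII digit is also ASCII.
pub fn digit_offset_b(input: &str) -> usize {
    input
        .chars()
        .take_while(char::is_ascii)
        .take_while(char::is_ascii_digit)
        .count()
}

/// C: both predicates fused into a single closure.
pub fn digit_offset_c(input: &str) -> usize {
    input
        .chars()
        .take_while(|x| x.is_ascii() && x.is_ascii_digit())
        .count()
}

/// Returns true if all three versions give the same result for `input`.
pub fn all_agree(input: &str) -> bool {
    let a = digit_offset_a(input);
    a == digit_offset_b(input) && a == digit_offset_c(input)
}
```

For example, `digit_offset_b("123abc")` is 3, and a non-ASCII digit such as the Arabic-Indic `'٣'` stops all three versions immediately.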

The Rust compiler optimizes C down to the same assembly as A, but it doesn't manage to do the same for B.

Why is that?

Both versions look similar in assembly. It's not too surprising that they are not exactly identical; optimization looks for a local optimum, not a global one. Still, I would expect both versions to be similarly efficient.

There is only one loop in both versions of the code: both make a single pass over the data. Note that iterators are lazy, and take_while doesn't consume the whole iterator all at once. Also, all the functions get inlined in both versions. So the code is not as different as you say.
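To make the laziness point concrete, here is a small sketch that logs every predicate call in version B. The names `trace_b`, `f` and `g` are hypothetical, used only for illustration. The two predicates run interleaved, one character at a time, and nothing after the first non-digit is ever pulled from `chars()`, so it really is a single pass.

```rust
use std::cell::RefCell;

/// Runs version B's pipeline on `input`, recording every predicate call
/// in the order it happens. Returns the count and the call log.
pub fn trace_b(input: &str) -> (usize, Vec<String>) {
    // RefCell lets both closures append to the same log.
    let log = RefCell::new(Vec::new());
    let n = input
        .chars()
        // "f": the first take_while (is_ascii)
        .take_while(|c| {
            log.borrow_mut().push(format!("f({c})"));
            c.is_ascii()
        })
        // "g": the second take_while (is_ascii_digit)
        .take_while(|c| {
            log.borrow_mut().push(format!("g({c})"));
            c.is_ascii_digit()
        })
        .count();
    (n, log.into_inner())
}
```

For `"12a3"` this returns a count of 2 and the log `f(1) g(1) f(2) g(2) f(a) g(a)`; the trailing `'3'` is never examined.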


Yeah, I have no idea how I overlooked that. However, the "harder to optimize" argument, arising from the increased complexity, probably still stands: the generated code is longer and contains 3 more branches.
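A rough hand-written picture of what B boils down to (a hypothetical `digit_offset_manual`, meant as an illustration rather than the compiler's actual output) is a single loop with one early exit per `take_while`. Reducing it to A requires the optimizer to prove that the first exit is redundant, since any character that fails `is_ascii` also fails `is_ascii_digit`.

```rust
/// Hypothetical hand-written loop with the same behaviour as version B:
/// one pass over the chars, with one early exit per `take_while`.
pub fn digit_offset_manual(input: &str) -> usize {
    let mut count = 0;
    for c in input.chars() {
        // Early exit corresponding to take_while(char::is_ascii).
        if !c.is_ascii() {
            break;
        }
        // Early exit corresponding to take_while(char::is_ascii_digit).
        if !c.is_ascii_digit() {
            break;
        }
        count += 1;
    }
    count
}
```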


Even though the compiler can't optimize (B), it can optimize this code, producing the same assembly as (A):

pub fn digit_offset(input: &str) -> usize {
    input.chars().take_while(char::is_ascii_digit).map(char::len_utf8).sum()
}
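Every ASCII digit is a single byte, so `len_utf8` returns 1 for every character that passes the filter and the sum equals the count. A side effect is that the result is a byte index that always falls on a char boundary, so it can be used directly to slice the `&str`. A small usage sketch, with a hypothetical `split_digits` helper:

```rust
/// The len_utf8 variant from the post: sums the UTF-8 byte lengths
/// of the leading ASCII digits (each of which is exactly 1 byte).
pub fn digit_offset(input: &str) -> usize {
    input
        .chars()
        .take_while(char::is_ascii_digit)
        .map(char::len_utf8)
        .sum()
}

/// Hypothetical helper: splits `input` into its leading digits and the rest.
/// The offset is a sum of whole-character byte lengths, so `split_at`
/// cannot land inside a multi-byte character.
pub fn split_digits(input: &str) -> (&str, &str) {
    input.split_at(digit_offset(input))
}
```

For example, `split_digits("123abc")` gives `("123", "abc")`.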