Count number of 'z' in a String

            let mut n = 0;
            for c in s.chars() {
                if c == 'z' {
                    n += 1;
                }
            }
            return n;

This seems really inelegant. Is there a one liner solution to this? (I briefly skimmed the iter API and the &str API).

s.chars().filter(|c| *c == 'z').count()

That's probably the simplest I could come up with. It might have a little bit of overhead due to iterating over characters instead of bytes, but the compiler may optimize this away.

2 Likes

This was not mentioned as a requirement in the original question.

Does this also use only O(1) space? I am not convinced if:

(1) filter builds an intermedaite list, consisting of all the 'z''s
or
(2) filter is lazy and count is smart, so no intermedaite list is built and everything is O(1)

filter is an iterator adapter, so there is no intermediate list allocated. With regard to the internals, I suggest you check out godbolt.org, where you can view the generated assembly (use -O to view it with optimizations).

2 Likes

I was curious, so I tested it against the equivalent using .bytes() and the compiled code is quite different. It looks like LLVM is doing some fancy stuff with packing bytes into 64 bit registers but I can't totally follow it. This does not necessarily mean that using bytes() is always faster than chars(), but when I tried each version with a constant str, only the bytes() version compiled down to a constant value.

Of course this only works because it relies on properties of UTF-8 encoding that the compiler doesn't understand (the fact that b'z' can't occur in a multibyte sequence, so it always represents 'z').

1 Like

Yeah, the compiler is doing something fancy there. I'm not familiar enough with assembly to be able to definitively say what, though.

It loads two bytes at a time, compares them against two bytes "zz" stored in another register with a single SSE compare instruction, then shuffles bytes to convert the results of the two comparisons into two 64-bit booleans stored in a single 128-bit SSE register, which is then added to two running totals in another 128-bit register. The loop is also unrolled a bit.

So it's using SSE to run two parallel usize counts at the same time for even and odd bytes, which are added together at the end.

4 Likes