Yes, I was curious about the bulk copy approach, but it seems that it does not make up for the extra overhead at this input size after all.
I had already noticed in a previous project of mine that char_indices() can be expensive to use. At the time, I resolved it by building a custom character iterator that only provides indices on demand (and by specializing it for ASCII, which is what I knew I was parsing). Maybe something similar could work here, but again, we're entering the territory of custom abstractions that work directly on the string's bytes.
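To make the idea concrete, here is a minimal sketch of such an iterator. It is not the code from that project, and the names `LazyIndexChars` and `AsciiChars` are made up for illustration. The first type wraps the standard `Chars` iterator and only computes the byte offset when `offset()` is called; the second is an ASCII-only variant that walks the bytes directly. Whether either one actually beats char_indices() here would have to be benchmarked.

```rust
use std::str::Chars;

/// Hypothetical iterator: yields chars like `str::chars()`, but computes
/// the byte offset only when the caller asks for it.
pub struct LazyIndexChars<'a> {
    source: &'a str,
    chars: Chars<'a>,
}

impl<'a> LazyIndexChars<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { source, chars: source.chars() }
    }

    /// Byte offset of the char that the next call to `next()` would yield.
    /// Derived from the length of the unconsumed remainder.
    pub fn offset(&self) -> usize {
        self.source.len() - self.chars.as_str().len()
    }
}

impl<'a> Iterator for LazyIndexChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.chars.next()
    }
}

/// Hypothetical ASCII-only variant: every char is one byte, so the
/// position in the byte slice is also the byte offset.
pub struct AsciiChars<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> AsciiChars<'a> {
    /// Returns `None` if the input contains any non-ASCII byte.
    pub fn new(source: &'a str) -> Option<Self> {
        if source.is_ascii() {
            Some(Self { bytes: source.as_bytes(), pos: 0 })
        } else {
            None
        }
    }

    /// Byte offset of the char that the next call to `next()` would yield.
    pub fn offset(&self) -> usize {
        self.pos
    }
}

impl<'a> Iterator for AsciiChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let byte = *self.bytes.get(self.pos)?;
        self.pos += 1;
        Some(byte as char)
    }
}
```

A caller would iterate as usual and call `offset()` only at the points where a position is actually needed, for example when a break opportunity is found.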
@mgeisler Nice coincidence regarding the ligature UTF-8 representation! I wouldn't have expected the stars to align so well, considering that IIRC some ligature encodings already existed before Unicode was released. It's good to know that capacity tweaking is not needed after all.