How to do Python's str.translate() in Rust?

Yes, I was curious about the bulk-copy approach, but it seems it does not make up for its extra overhead at this input size after all.
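Here is a minimal sketch of what the bulk-copy idea could look like. The thread does not show the actual code, so the names `translate_bulk` and `replacement` and the ligature table are hypothetical. Unchanged runs of the input are copied with one `push_str` each, and only the ligatures themselves are replaced. U+FB05 and U+FB06 are deliberately left out of the table.

```rust
/// Hypothetical ligature table (not from the thread).
/// U+FB05 and U+FB06 are deliberately not translated.
fn replacement(c: char) -> Option<&'static str> {
    match c {
        '\u{FB00}' => Some("ff"),
        '\u{FB01}' => Some("fi"),
        '\u{FB02}' => Some("fl"),
        '\u{FB03}' => Some("ffi"),
        '\u{FB04}' => Some("ffl"),
        _ => None,
    }
}

/// Copies each unchanged run with a single `push_str` instead of
/// pushing the characters one at a time.
fn translate_bulk(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut run_start = 0; // byte offset where the current unchanged run begins
    for (i, c) in s.char_indices() {
        if let Some(rep) = replacement(c) {
            out.push_str(&s[run_start..i]); // flush the unchanged run
            out.push_str(rep);
            run_start = i + c.len_utf8();
        }
    }
    out.push_str(&s[run_start..]); // trailing unchanged run
    out
}

fn main() {
    println!("{}", translate_bulk("\u{FB01}nd the e\u{FB00}ect"));
}
```

For short inputs the slicing and bookkeeping can cost more than the bulk copies save. That is consistent with the observation above.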

I already noticed in a previous project of mine that char_indices() can be expensive. At the time I solved it by building a custom character iterator that only computes indices on demand (and by specializing it for ASCII, which I knew I would be parsing at the time). Something similar might work here, but again, we're entering the territory of custom abstractions that work directly on the string's bytes 🙂
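A rough sketch of such an iterator follows. It is not the author's code: `LazyIndexChars`, `translate_lazy` and the ligature table are hypothetical names. The idea is that the iterator does not track byte offsets while iterating. When an offset is actually needed, it is derived from how much input `Chars::as_str()` still has left. An ASCII fast path returns pure-ASCII input unchanged, because such input cannot contain a ligature. Whether this actually beats char_indices() for this input would need measuring.

```rust
use std::str::Chars;

/// Hypothetical ligature table (not from the thread); U+FB05 and U+FB06 are left out.
fn replacement(c: char) -> Option<&'static str> {
    match c {
        '\u{FB00}' => Some("ff"),
        '\u{FB01}' => Some("fi"),
        '\u{FB02}' => Some("fl"),
        '\u{FB03}' => Some("ffi"),
        '\u{FB04}' => Some("ffl"),
        _ => None,
    }
}

/// Hypothetical char iterator that does not track byte offsets while
/// iterating; an offset is computed only when the caller asks for one.
struct LazyIndexChars<'a> {
    src: &'a str,
    chars: Chars<'a>,
}

impl<'a> LazyIndexChars<'a> {
    fn new(src: &'a str) -> Self {
        LazyIndexChars { src, chars: src.chars() }
    }

    /// Byte offset just past the last yielded char, derived from how much
    /// input `Chars` still has left.
    fn offset(&self) -> usize {
        self.src.len() - self.chars.as_str().len()
    }
}

impl<'a> Iterator for LazyIndexChars<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        self.chars.next()
    }
}

fn translate_lazy(s: &str) -> String {
    // ASCII fast path: a pure-ASCII string cannot contain any ligature.
    if s.is_ascii() {
        return s.to_owned();
    }
    let mut out = String::with_capacity(s.len());
    let mut run_start = 0;
    let mut it = LazyIndexChars::new(s);
    while let Some(c) = it.next() {
        if let Some(rep) = replacement(c) {
            // Only here do we pay for computing a byte offset.
            let end = it.offset();
            out.push_str(&s[run_start..end - c.len_utf8()]);
            out.push_str(rep);
            run_start = end;
        }
    }
    out.push_str(&s[run_start..]);
    out
}

fn main() {
    println!("{}", translate_lazy("o\u{FB03}ce"));
}
```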

@mgeisler Nice coincidence regarding the UTF-8 representation of the ligatures! I wouldn't have expected the stars to align so well, considering that IIRC some ligature encodings already existed before Unicode was released. It's good to know that capacity tweaking is not needed after all.
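The coincidence can be checked mechanically. Each ligature in U+FB00..U+FB04 takes 3 bytes in UTF-8 (U+FB00 is EF AC 80), and each expansion is at most 3 ASCII bytes. So the output is never longer than the input, and `String::with_capacity(input.len())` never has to grow. The table and the `never_grows` helper below are hypothetical illustrations, not code from the thread.

```rust
/// Hypothetical ligature table (U+FB00..=U+FB04); not taken from the thread.
const LIGATURES: [(char, &str); 5] = [
    ('\u{FB00}', "ff"),
    ('\u{FB01}', "fi"),
    ('\u{FB02}', "fl"),
    ('\u{FB03}', "ffi"),
    ('\u{FB04}', "ffl"),
];

/// True if no replacement is longer in UTF-8 than the character it replaces,
/// i.e. an output buffer sized to the input length never needs to grow.
fn never_grows(table: &[(char, &str)]) -> bool {
    table.iter().all(|&(c, rep)| rep.len() <= c.len_utf8())
}

fn main() {
    for &(c, rep) in LIGATURES.iter() {
        // Compare the encoded size of each ligature with its expansion.
        println!("U+{:04X}: {} bytes -> {:?} ({} bytes)", c as u32, c.len_utf8(), rep, rep.len());
    }
    println!("output never longer than input: {}", never_grows(&LIGATURES));
}
```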


Isn’t this essentially the NFKD Unicode normalization form? If so, then crates like unicode_normalization are probably going to be much more complete and correct than any reasonably-sized regex.


However it looks in your font, it is indeed U+FB05, LATIN SMALL LIGATURE LONG S T.

Yes, so I'm not translating that one to st, just like the other one.
Thanks.