How to iterate over emojis / grapheme clusters?


#1

Hi,

I am trying to replace certain emojis with some text.
I thought it might be a good idea to iterate over each char in a string and match the Unicode sequences I want to replace.

But when I try to iterate over a String, I always end up with each individual code point of an emoji. Is there a way to get them as a whole?

For example, 'hello🇸🇹test' should become ['h', 'e', 'l', 'l', 'o', '\u{1F1F8}\u{1F1F9}', 't', 'e', 's', 't']
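To see why iterating over a String splits emoji apart, it helps to print the code points directly. The sketch below uses only std (`code_points` is a hypothetical helper name) and lists each `char` that `str::chars()` yields; the flag 🇸🇹 shows up as two separate regional indicator symbols, U+1F1F8 and U+1F1F9.

```rust
/// Hypothetical debugging helper: lists every `char` (Unicode scalar value)
/// that `str::chars()` yields, formatted as `U+XXXX`.
/// Multi-code-point emoji such as flags appear as several entries.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}
```

For "hello🇸🇹test" this gives 11 entries: five letters, the two regional indicators, and four more letters.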

Thank you,

Ernst


#2

The unicode-segmentation crate (https://github.com/unicode-rs/unicode-segmentation) provides an iterator over grapheme clusters.
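Once the string is split into grapheme clusters, the replacement Ernst is after becomes one lookup per cluster. A minimal sketch, assuming the replacement table is a `HashMap` from the exact emoji sequence to its text; `replace_clusters` is a hypothetical name, and it accepts any iterator of `&str` clusters so it can be fed by the crate's grapheme iterator.

```rust
use std::collections::HashMap;

/// Hypothetical helper: rebuilds a string from its grapheme clusters,
/// swapping any cluster found in `table` for its replacement text.
/// `clusters` is expected to come from a grapheme segmenter such as
/// `unicode_segmentation::UnicodeSegmentation::graphemes(s, true)`.
fn replace_clusters<'a, I>(clusters: I, table: &HashMap<&str, &str>) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out = String::new();
    for cluster in clusters {
        // Whole-cluster match: a flag or ZWJ sequence is compared as one unit
        match table.get(cluster) {
            Some(text) => out.push_str(text),
            None => out.push_str(cluster),
        }
    }
    out
}
```

With the crate added to Cargo.toml and `use unicode_segmentation::UnicodeSegmentation;` in scope, the call would be `replace_clusters(s.graphemes(true), &table)`. Matching whole clusters, rather than running `str::replace` on raw code points, avoids false hits where regional indicators from two adjacent flags happen to form the searched-for pair across the flag boundary.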


#3

I tried that:

    use unicode_segmentation::UnicodeSegmentation;

    // graphemes(true) = extended grapheme clusters, graphemes(false) = legacy ones
    for c in s.graphemes(true) {
        println!("_{}_", c);
    }

but it produces the output on the left from the input on the right:


#4

Ah, it looks like it works with some emojis made of multiple code points,
like flags or :family_woman_woman_girl_girl:,
but this one, for example, fails: :merperson:t5:

*sigh*


#5

Check which Unicode version the crate supports (the changelog mentions 9.0 as the latest; maybe it hasn't been updated to 10 yet) and report a bug if needed.
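Until the crate's data covers Unicode 10, one possible stopgap is to glue a stray skin-tone modifier back onto the cluster before it. :merperson:t5: is the merperson emoji (U+1F9DC, added in Unicode 10) followed by one of the skin-tone modifiers (U+1F3FB..U+1F3FF); segmentation data that predates the base character does not know it can take a modifier, so the pair is split. The sketch below is a hypothetical post-processing step (`merge_skin_tones` is not part of unicode-segmentation) that handles only this one case and is no substitute for up-to-date UAX #29 data.

```rust
/// True for the five skin-tone modifiers U+1F3FB..=U+1F3FF.
fn is_skin_tone_modifier(c: char) -> bool {
    ('\u{1F3FB}'..='\u{1F3FF}').contains(&c)
}

/// Hypothetical stopgap: appends any cluster that starts with a skin-tone
/// modifier to the previous cluster, undoing the split an outdated
/// segmenter makes after an emoji it does not recognise.
fn merge_skin_tones<'a, I>(clusters: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out: Vec<String> = Vec::new();
    for cluster in clusters {
        let starts_with_modifier = cluster.chars().next().map_or(false, is_skin_tone_modifier);
        if starts_with_modifier {
            // Attach the modifier to the preceding cluster, if there is one
            if let Some(prev) = out.last_mut() {
                prev.push_str(cluster);
                continue;
            }
        }
        out.push(cluster.to_string());
    }
    out
}
```

In the thread's setting this would run as `merge_skin_tones(s.graphemes(true))`; it can be dropped once the crate handles these sequences itself.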