How do you iterate over grapheme clusters of a String in Rust?

I read all about string slices and Strings, vectors of u8s, Unicode scalar values / char, and grapheme clusters. I think I understand the theory pretty well now.

I see an easy way to get at the bytes of a string. I see an easy way to get at the char values of a string (unicode scalar values). The documentation for chars() even states "Iteration over grapheme clusters may be what you actually want." -- which is so true. So...

How do you iterate over the graphemes of a string!?!?!?
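
To make the gap concrete, here is a minimal sketch using only the standard library. The helper name `counts` is made up for illustration. It shows that a single user-perceived character can span several bytes and more than one char, so neither bytes() nor chars() gives you graphemes:

```rust
/// Returns (number of UTF-8 bytes, number of Unicode scalar values) in `s`.
/// Hypothetical helper, not part of std.
fn counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // one grapheme cluster, but two chars and three bytes.
    let s = "e\u{301}";

    // Bytes of the UTF-8 encoding.
    for b in s.bytes() {
        println!("byte: {:#04x}", b);
    }

    // Unicode scalar values (chars).
    for c in s.chars() {
        println!("char: {:?}", c);
    }

    let (bytes, chars) = counts(s);
    println!("{} bytes, {} chars, but only 1 grapheme cluster", bytes, chars);
}
```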

Perhaps using this crate: https://crates.io/crates/unicode-segmentation

There's no built-in way? O.o

Thanks for pointing me to that crate! If it's the official, or semi-official solution, we might want to link to it from the book and the std docs.

Yes; Unicode is incredibly complex, and Rust's standard library is small. Anything we put in it must remain as-is until the end of time; that's a huge commitment!

As for whether it's official: it sort of is and isn't at the same time. Generally, we have a policy of not linking to external crates in the docs, as we don't want to play favorites with the ecosystem. That crate isn't maintained by the Rust project, even though it has quite the all-star roster of developers!

Thank you for the clear explanation!

Thanks for the answer. Makes sense.

As someone really new to the language and fairly new to programming, I think it would be helpful to have a note in the Rust book that explains why iterating over grapheme clusters is not in the standard library and that one should find a crate that handles it.

Maybe the existence of external crates for the feature can be acknowledged without choosing any favourites? There should be explicit phrases to make a user stop digging further in the std docs (thinking "It must be in std somewhere!") and start looking elsewhere.

Are the docs allowed to point to the cookbook?

Yes, I have the feeling that it's time for this policy to change; I haven't tried to write up anything yet.

As @derekdreery points out, this has already started to blur slightly.

Where is that policy written? Can it be changed by an RFC?

Nowhere, exactly; it's just an understanding everyone has had for a long time. In general, all policies are subject to change by RFCs. That's what I was referring to with the "write up" above; I'd like to write one.
