How do you iterate over grapheme clusters of a String in Rust?

CleanCut · June 14, 2017, 4:30am

I read all about string slices and Strings, vectors of u8's, unicode scalar values / char, and grapheme clusters. I think I understand the theory pretty well now.

I see an easy way to get at the bytes of a string. I see an easy way to get at the char values of a string (unicode scalar values). The documentation for chars() even states "Iteration over grapheme clusters may be what you actually want." -- which is so true. So...
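For anyone following along, here is a minimal sketch of the two built-in iterations mentioned above, using only the standard library. The helper name `byte_and_char_counts` is made up for illustration. The sample string is an "e" followed by U+0301 (combining acute accent), which renders as a single "é" but is neither one byte nor one `char`.

```rust
/// Hypothetical helper: counts the bytes and the Unicode scalar values
/// (`char`s) in `s`. Neither count is a grapheme cluster count.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

fn main() {
    // "e" + U+0301 COMBINING ACUTE ACCENT: displayed as one user-perceived character.
    let s = "e\u{301}";

    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} chars", bytes, chars);

    // Iterating with chars() yields the base letter and the combining mark separately.
    for c in s.chars() {
        println!("{:?}", c);
    }
}
```

Both iterators split the accented letter apart, which is why neither one answers the question here.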

How do you iterate over the graphemes of a string!?!?!?

cuviper · June 14, 2017, 4:32am

Perhaps using this crate: https://crates.io/crates/unicode-segmentation

CleanCut · June 14, 2017, 5:00am

There's no built-in way? O.o

Thanks for pointing me to that crate! If it's the official, or semi-official solution, we might want to link to it from the book and the std docs.

steveklabnik · June 14, 2017, 3:22pm

Right, there's no built-in way; Unicode is incredibly complex, and Rust's standard library is small. Anything we put in it must remain as-is until the end of time; that's a huge commitment!

As for the crate being official: it sort of is and isn't at the same time. Generally, we have a policy of not linking to external crates in the docs, as we don't want to play favorites with the ecosystem. That crate isn't maintained by the Rust project, even though it has quite the all-star roster of developers!

CleanCut · June 14, 2017, 4:49pm

Thank you for the clear explanation!

idanmel · October 8, 2017, 7:36pm

Thanks for the answer. Makes sense.

As someone really new to the language and fairly new to programming, it might be helpful to have a note in the Rust book that explains why iterating over Grapheme Clusters is not in the standard library and that one should find a crate that handles it.

vi0 · January 31, 2018, 3:57pm

Maybe the existence of external crates for the feature can be acknowledged without choosing any favourites? There should be explicit phrases to make a user stop digging further in std docs (thinking "It must be in std somewhere!") and start looking elsewhere.

derekdreery · January 31, 2018, 4:11pm

Are the docs allowed to point to the cookbook?

steveklabnik · January 31, 2018, 4:22pm

Yes, I have the feeling that it's time for this policy to change; I haven't tried to write up anything yet.

As @derekdreery points out, this has already started to blur slightly.

vi0 · January 31, 2018, 5:27pm

Where is that policy written? Is it subject to be changed by RFCs?

steveklabnik · January 31, 2018, 8:53pm

Nowhere, exactly; it's just an understanding everyone has had for a long time. In general, all policies are subject to change by RFCs. That's what I was referring to with the "writeup" above; I'd like to write one.
