If the range contains no invalid Unicode code points, then I suppose you can just iterate over the u32 values and use char::from_u32? If it does contain invalid code points, then I have no idea.
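Roughly something like this, as a sketch (the bounds in main are arbitrary, just chosen to show the surrogate gap being skipped):

```rust
fn chars_in_range(start: u32, end: u32) -> impl Iterator<Item = char> {
    // char::from_u32 returns None for surrogates and values above U+10FFFF,
    // so filter_map silently drops anything that isn't a valid scalar value.
    (start..=end).filter_map(char::from_u32)
}

fn main() {
    // A range straddling the surrogate block U+D800..=U+DFFF.
    let count = chars_in_range(0xD7F0, 0xE00F).count();
    println!("{count} valid chars"); // 32: 2080 values minus 2048 surrogates
}
```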
No, that's the issue: multi-codepoint grapheme clusters, e.g. ë and 🙂, have to be supported too, and they can differ in length. So simply incrementing a counter won't work.
It's not so much that it's incorrect as that the example was perhaps a poor one, in retrospect.
I think a better example might be useful here. Consider the following ranges, defined using hex notation:
[#x20-#xD7FF]
[#xE000-#xFFFD]
[#x10000-#x10FFFF]
For each of these 3 ranges, I'd like each grapheme cluster ("character") contained within.
Generally the non-ASCII ranges will use this notation, so it's important to support it.
How can I accomplish this?
Those look like ranges of scalar values (a subset of code points). Grapheme clusters can consist of multiple code points.
The encoding of a scalar value is variable width in UTF-8, but can be fixed width in other encodings. Rust chars are Unicode scalar values, represented directly as their value in 32-bit form, which is a fixed-width encoding of scalar values.
Grapheme clusters are variable width in any encoding, as you can combine arbitrarily many (combining) code points.
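For example, a quick sketch of the difference using the unicode-segmentation crate (the string here is just an illustration):

```rust
// Requires `unicode-segmentation = "1"` in Cargo.toml.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // 'e' followed by U+0308 COMBINING DIAERESIS renders as "ë":
    // one grapheme cluster built from two scalar values.
    let s = "e\u{0308}";
    println!("grapheme clusters: {}", s.graphemes(true).count()); // 1
    println!("scalar values (chars): {}", s.chars().count());     // 2
    println!("UTF-8 bytes: {}", s.len());                         // 3
}
```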
It's unclear what practical goal you are trying to accomplish (XY problem), but there's a decent chance that iterating over scalar values is not it. But you can do that; it's what the char..char examples do. The values you get out will tend to be somewhat related to their neighbors, but there's no guarantee they carry any inherent meaning. The values in sequence won't define any meaningful grapheme clusters except by coincidence, for example.
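As a sketch of what that looks like for the three ranges you listed (iterating scalar values, not grapheme clusters):

```rust
fn main() {
    let ranges = [
        '\u{20}'..='\u{D7FF}',
        '\u{E000}'..='\u{FFFD}',
        '\u{10000}'..='\u{10FFFF}',
    ];
    for range in ranges {
        let (start, end) = (*range.start(), *range.end());
        // RangeInclusive<char> is an Iterator over scalar values; it steps
        // one code point at a time (and would skip the surrogate gap if a
        // range crossed it, which these three don't).
        println!("{:#x}-{:#x}: {} scalar values", start as u32, end as u32, range.count());
    }
}
```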
Unicode is complicated, and there may not be a straightforward way to achieve your practical goal in a way that works for all languages/scripts.