I can't `match` into two different iterators

It may not matter here, but note that a `char` is a Unicode scalar value, which is not necessarily what a human would consider a character. When you want the latter, you probably want grapheme clusters.
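
To make the distinction concrete, here is a minimal sketch using only the standard library. The helper name `scalar_value_count` is made up for illustration. An "é" written as `e` plus a combining acute accent looks like one character but is two `char`s:

```rust
/// Counts Unicode scalar values (`char`s), not user-perceived characters.
/// `scalar_value_count` is a hypothetical helper, not a std API.
fn scalar_value_count(s: &str) -> usize {
    s.chars().count()
}

/// "e" followed by U+0301 COMBINING ACUTE ACCENT: renders as a single "é".
const DECOMPOSED_E_ACUTE: &str = "e\u{301}";
```

`scalar_value_count(DECOMPOSED_E_ACUTE)` is 2 and `DECOMPOSED_E_ACUTE.len()` is 3 bytes, even though a reader sees one character. Grapheme cluster segmentation is not in the standard library, so counting that as one unit would need an external crate.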

Huh. I was thinking that it might sometimes be useful to iterate over UTF-8 codepoint slices (i.e. what `.chars()` does, but without the conversion to `char`), and this `split_inclusive` pattern does exactly that, even though it's not very clear at first what's going on :smiley: On the other hand, as others have said, it's likely that most often one would want to iterate over grapheme clusters rather than codepoints.
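
For anyone skimming: the exact code from the earlier post isn't repeated here, so the following is only my guess at the kind of `split_inclusive` trick being discussed. The assumption is a pattern closure that matches every `char`, so each codepoint ends its own piece. The function name is made up.

```rust
/// Splits a string into one `&str` per codepoint by using a pattern
/// that matches every `char` (assumed form of the trick, not a quote).
fn split_into_codepoint_slices(s: &str) -> Vec<&str> {
    // Every char matches, and split_inclusive keeps the match at the end
    // of each piece, so each piece is exactly one codepoint.
    s.split_inclusive(|_: char| true).collect()
}
```

Each item is a borrowed slice of the input, so no `char`-to-`String` conversion happens along the way.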

A semantically clearer, much less roundabout way to express the exact same thing would be to use char_indices().

Yeah, see my earlier post, but it's not trivial to extract the exact codepoint slice because, AFAIK, there's no currently stable "given the start index of a codepoint, give me the end(+1) index" API. The unstable `ceil_char_boundary` does that, but you have to remember to call it with `start_index + 1`, or it just returns `start_index`. Another way would be to use `.windows(2)` on the start indices, but then you have to remember to handle the last codepoint separately.
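
A rough sketch of the `.windows(2)` idea, assuming the start indices are first collected into a `Vec` (since `windows` is a slice method, not an iterator method). The function name is hypothetical.

```rust
/// Builds codepoint slices from consecutive pairs of start indices.
fn codepoint_slices_via_windows(s: &str) -> Vec<&str> {
    // Byte offset at which each codepoint starts.
    let starts: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();

    // Each window [a, b] covers exactly one codepoint: s[a..b].
    let mut out: Vec<&str> = starts.windows(2).map(|w| &s[w[0]..w[1]]).collect();

    // The last codepoint has no following start index, so it needs
    // separate handling: it runs to the end of the string.
    if let Some(&last) = starts.last() {
        out.push(&s[last..]);
    }
    out
}
```

The extra allocation and the special case for the final codepoint are what make this version easy to get wrong.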

Can't you just use char::len_utf8() for that?
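
Something like this, as a minimal sketch: `char_indices()` gives the start offset, and `char::len_utf8()` gives the byte length, so the end index is just their sum. The function name is made up for illustration.

```rust
/// Yields each codepoint of `s` as a `&str` slice borrowed from `s`.
fn codepoint_slices<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices()
        // start + len_utf8() is the index one past the codepoint's last byte.
        .map(move |(start, c)| &s[start..start + c.len_utf8()])
}
```

For example, `codepoint_slices("aé").collect::<Vec<_>>()` gives the slices `"a"` and `"é"`, with no special case needed for the last codepoint.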

Very nice! I was unfamiliar with that function.

I looked into `char_indices`, but the problem is that you can't join its output with a `&str` in the middle of an iterator chain. A `char` is not `&str[0]` by any means.

You can collect chars directly to a String. What specifically didn't work with char_indices()?
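
In case it helps, here is a sketch of one way to mix `&str` replacements with the original codepoints in a single iterator chain. I'm guessing at the use case, so the function and the `&amp;` replacement are purely illustrative. Both `match` arms produce a `&str`, and `String` can be collected from an iterator of `&str` just as it can from an iterator of `char`.

```rust
/// Replaces every '&' with "&amp;" and keeps all other codepoints as-is.
/// Hypothetical example; the replacement rule is an assumption.
fn replace_ampersands(s: &str) -> String {
    s.char_indices()
        .map(|(start, c)| match c {
            // A static string slice in one arm...
            '&' => "&amp;",
            // ...and a slice of the input in the other; both are `&str`.
            _ => &s[start..start + c.len_utf8()],
        })
        .collect()
}
```

Because every arm has the same type, there is no need to return two different iterator types from the `match`.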