I am comparing two Strings char by char. They are UTF-8 strings.
In the documentation for the char_indices method on str, we have this example showing how to access the char at some index.
But in fact, it is not the entire character... it is only part of the encoded character.
let yes = "y̆es";
let mut char_indices = yes.char_indices();
assert_eq!(Some((0, 'y')), char_indices.next()); // not (0, 'y̆')
assert_eq!(Some((1, '\u{0306}')), char_indices.next());
// note the 3 here - the previous character took up two bytes
assert_eq!(Some((3, 'e')), char_indices.next());
assert_eq!(Some((4, 's')), char_indices.next());
assert_eq!(None, char_indices.next());
In my code, I try to find the first char that differs between two strings and print a part of the string to show the difference (a few chars before and a few chars after, if possible).
But how do I get the entire char?
I want to get the entire char 'y̆', for example, knowing just its byte index...
Yeah, the issue you are having is actually about graphemes: one grapheme like y̆ can be more than one codepoint (also often called a char). In this case it is y followed by \u{0306}, which is a combining modifier.
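To make that concrete, here is a minimal std-only sketch of pulling the whole grapheme out of a string given its starting byte index. It is a deliberate simplification: it only treats U+0300..=U+036F (the combining diacritical marks block) as "attached to the previous char". Real code should use a crate like unicode-segmentation, which implements the full grapheme-cluster rules.

```rust
// Simplified: extract the grapheme starting at a byte index, using only std.
// Only the combining diacritical marks U+0300..=U+036F are merged with the
// preceding char; the full Unicode rules are much more involved.
fn grapheme_at(s: &str, byte_index: usize) -> &str {
    let mut chars = s[byte_index..].char_indices();
    let (_, first) = chars.next().expect("index past end of string");
    let mut end = byte_index + first.len_utf8();
    for (i, c) in chars {
        if ('\u{0300}'..='\u{036F}').contains(&c) {
            // This codepoint modifies the previous one: extend the slice.
            end = byte_index + i + c.len_utf8();
        } else {
            break;
        }
    }
    &s[byte_index..end]
}

fn main() {
    let yes = "y̆es";
    assert_eq!(grapheme_at(yes, 0), "y̆"); // 'y' + U+0306: 3 bytes total
    assert_eq!(grapheme_at(yes, 3), "e");
    println!("{}", grapheme_at(yes, 0));
}
```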
Basically, the story there is that many operations you may want, and that feel "simple", become very tricky and convoluted with Unicode.
Every single damn thing you may imagine becomes a pile of tables that you need to consult and apply.
It's also why UTF-8 is a proper representation for Unicode strings: the advantage that other representations have, direct access to codepoints in a string, has zero applications to real-world algorithms if you want to support flags, emoji, and the whole zoo that humanity invented.
Thus arguments against UTF-8 sound more like "give us UTF-32 or UTF-16 so it would be easier to write broken and incorrect code". When phrased like that, the choice Rust made becomes obvious: basic operations on codepoints are in std, but operations that require tables tens of megabytes in size (I'm not joking!) don't belong in std; you need to decide what to do about them on a case-by-case basis.
The Rust standard library includes tools for working with text data byte-by-byte, or codepoint-by-codepoint. Those can be implemented easily and with minimal cultural data requirements because they're "just numbers." Those approaches work great for dealing with data storage, retrieval, and transmission, and for converting between encodings.
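As an illustration of what std does support: given a byte index, you can safely fetch the codepoint starting there with nothing but slicing and is_char_boundary. This is codepoint-level access, not grapheme-level, but it is enough for the "print some context around the first difference" part of the question. A sketch:

```rust
// Pure std: fetch the codepoint starting at a given byte index.
// Slicing a &str panics if the index is not on a char boundary,
// so we check with is_char_boundary first.
fn char_at(s: &str, byte_index: usize) -> Option<char> {
    if !s.is_char_boundary(byte_index) {
        return None;
    }
    s[byte_index..].chars().next()
}

fn main() {
    let yes = "y̆es";
    assert_eq!(char_at(yes, 0), Some('y'));
    assert_eq!(char_at(yes, 1), Some('\u{0306}')); // the combining breve
    assert_eq!(char_at(yes, 2), None); // mid-codepoint: not a boundary
    assert_eq!(char_at(yes, 3), Some('e'));
}
```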
The problem you're trying to solve, of identifying the thing a person reading the text would consider to be a single letter, is not well-supported by the Rust standard library. Frankly, that's probably a reasonable choice: recognizing character divisions requires some extensive and frequently-updated data, which would make it a poor match for Rust's release cadence. Instead, that capability lives in crates like this one.
I would also encourage you to read up on Unicode normalization forms. (A side note: the combining mark in your test string is a breve, U+0306, not a tilde, and y̆ happens to have no precomposed codepoint. But many accented letters do: é, for example, can be either a single codepoint, U+00E9, or an e followed by a combining acute accent, and your comparison results will depend on which representation your input uses.)
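A quick std-only demonstration of why normalization matters for your char-by-char comparison, using é (which, unlike y̆, does have a precomposed form). The two strings below render identically but compare as unequal; to compare them robustly you would normalize both sides first, e.g. with the unicode-normalization crate:

```rust
fn main() {
    // Two codepoint sequences that render identically:
    let precomposed = "\u{00E9}"; // 'é' as a single codepoint
    let decomposed = "e\u{0301}"; // 'e' followed by a combining acute
    // Byte-wise and char-wise they are different strings...
    assert_ne!(precomposed, decomposed);
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);
    // ...so a naive char-by-char diff would flag them as differing
    // even though a reader sees the same letter.
}
```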