`String`s and `str`s in Rust are UTF-8 encoded, but indexing is based on byte position. Indexing by code point would require scanning the entire prefix to count code points, and a count of code points isn't useful for how a human reads a `String` anyway: multiple code points can combine into what a human would consider a single glyph, such as an emoji or some accented letters.
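As a rough illustration of how byte length and code point count diverge from what a reader sees, here is a small sketch. The example string (`e` followed by U+0301 COMBINING ACUTE ACCENT) and the helper name `lengths` are choices made for illustration only.

```rust
/// Returns (byte length, code point count) for a string slice.
fn lengths(s: &str) -> (usize, usize) {
    // len() is O(1): it is the number of UTF-8 bytes stored.
    // chars().count() is O(n): it must decode every byte to find code points.
    (s.len(), s.chars().count())
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one letter, é.
    let s = "e\u{301}";
    let (bytes, code_points) = lengths(s);
    println!("{} bytes, {} code points", bytes, code_points); // 3 bytes, 2 code points

    // Byte offsets at which each code point starts; these are the valid indices.
    let starts: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    println!("{:?}", starts); // [0, 1]
}
```

A reader sees one letter, Rust stores three bytes, and `chars()` reports two code points; none of these counts is "the" length.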
If an index falls in the middle of the encoding of a single code point (for example, if an insert would split the bytes of one character), you'll get a panic. The letter you inserted at index 1 takes two bytes to encode, so afterwards the `String` contains:
    char:     H  \___ş__/    e    l    l    o  ' '    W    o    r    l    d    !
    byte:    72  197  159  101  108  108  111   32   87  111  114  108  100   33
    index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13
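The byte layout above can be reproduced with the standard library alone. This sketch assumes the string started out as `"Hello World!"`, which the diagram implies but the text doesn't state; `build_example` is a hypothetical helper name.

```rust
/// Builds "Hello World!" and inserts 'ş' at byte index 1.
fn build_example() -> String {
    let mut s = String::from("Hello World!");
    // Index 1 is a char boundary (between 'H' and 'e'), so this succeeds.
    s.insert(1, 'ş');
    s
}

fn main() {
    let s = build_example();
    // 'ş' (U+015F) is encoded as the two bytes 197, 159 (0xC5 0x9F).
    println!("{:?}", s.as_bytes());
    // [72, 197, 159, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
    println!("'ş' takes {} bytes", 'ş'.len_utf8()); // 'ş' takes 2 bytes
}
```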
So attempting to insert at index 2 now would panic: byte 2 is the second byte of ş, and splitting it would produce invalid UTF-8.
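A minimal way to observe this, assuming the default `panic = "unwind"` strategy so that `std::panic::catch_unwind` can intercept the panic. `insert_panics` is a hypothetical helper; `str::is_char_boundary` is the non-panicking way to check an index before using it.

```rust
use std::panic;

/// Hypothetical helper: runs `String::insert` at byte index `idx`
/// and reports whether it panicked.
fn insert_panics(text: &str, idx: usize, c: char) -> bool {
    let mut s = String::from(text);
    // catch_unwind turns the panic into an Err so we can observe it.
    // The default panic hook still prints the panic message to stderr.
    // With panic = "abort" the process would terminate instead.
    panic::catch_unwind(move || s.insert(idx, c)).is_err()
}

fn main() {
    let s = String::from("Hşello World!");
    // Byte 2 is the second byte of 'ş', so it is not a char boundary.
    println!("{}", s.is_char_boundary(2)); // false
    println!("{}", s.is_char_boundary(3)); // true
    println!("{}", insert_panics(&s, 2, 'x')); // true
    println!("{}", insert_panics(&s, 3, 'x')); // false
}
```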
Manipulating Unicode text (slicing, dicing, inserting, and so on) is complicated. To do it properly you need support beyond what the language and standard library provide, e.g. this crate.
Respecting code point boundaries can be done without such a dependency (the standard library provides `str::is_char_boundary` and `str::char_indices`), and doing so avoids panics. But you'll still produce garbage in many cases, e.g. when inserting into the middle of a grapheme cluster.
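One possible sketch of the dependency-free approach: translate a code point index into a byte offset with `char_indices`, which only yields offsets on char boundaries. `insert_at_char_index` is a hypothetical helper, not a standard API. The end of `main` shows the remaining problem: the insert no longer panics, yet it still splits a grapheme cluster.

```rust
/// Hypothetical helper: inserts `c` before the `n`-th code point of `s`,
/// or at the end if `n` equals the number of code points.
/// Returns false instead of panicking if `n` is out of range.
fn insert_at_char_index(s: &mut String, n: usize, c: char) -> bool {
    // char_indices only yields byte offsets that are on char boundaries,
    // so the String::insert call below cannot panic.
    let byte_idx = match s.char_indices().nth(n) {
        Some((i, _)) => i,
        None if n == s.chars().count() => s.len(),
        None => return false,
    };
    s.insert(byte_idx, c);
    true
}

fn main() {
    let mut s = String::from("Hello World!");
    insert_at_char_index(&mut s, 1, 'ş');
    // Code point index 2 maps to byte index 3 here, so there is no panic.
    insert_at_char_index(&mut s, 2, 'x');
    println!("{}", s); // Hşxello World!

    // No panic, but still wrong for a human reader: 'e' + U+0301 is one
    // visible letter, and inserting between them moves the accent onto 'x'.
    let mut u = String::from("e\u{301}");
    insert_at_char_index(&mut u, 1, 'x');
    println!("{}", u == "ex\u{301}"); // true
}
```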
Unicode is a large topic but hopefully this gives you a place to start.