In The Rust Programming Language book, section 4.17 ("Strings"), the subsection on Slicing says:
> But note that these are byte offsets, not character offsets. So this will fail at runtime: `let dog = "忠犬ハチ公"; let hachi = &dog[0..2];`
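To check that I'm reading the book's example correctly, here's a small sketch of what I think it means, assuming a current stable Rust toolchain. `try_slice` is a hypothetical helper I made up for illustration. `len`, `chars`, `is_char_boundary`, `str::get`, and `std::panic::catch_unwind` are standard library items.

```rust
use std::panic;

/// Hypothetical helper for this sketch: slice `s` to `start..end` without panicking.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // `str::get` returns None if a bound is out of range or not a char boundary.
    s.get(start..end)
}

fn main() {
    let dog = "忠犬ハチ公";

    // Each of these five characters takes 3 bytes in UTF-8, so
    // `len()` counts 15 bytes while `chars()` yields 5 characters.
    assert_eq!(dog.len(), 15);
    assert_eq!(dog.chars().count(), 5);

    // Byte offset 2 lies inside '忠' (bytes 0..3), so it is not a boundary.
    assert!(dog.is_char_boundary(3));
    assert!(!dog.is_char_boundary(2));

    // The checked form returns None instead of panicking.
    assert_eq!(try_slice(dog, 0, 2), None);
    assert_eq!(try_slice(dog, 0, 3), Some("忠"));

    // The indexing form from the book panics at runtime; catch_unwind is
    // used here only so the program can observe the panic and keep going.
    let result = panic::catch_unwind(move || dog[0..2].to_string());
    assert!(result.is_err());
    println!("&dog[0..2] panicked, as the book says");
}
```

The panic happens while evaluating `dog[0..2]` itself; the default panic hook will also print its message to stderr.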
Just to make sure I'm keeping my facts straight, and so someone can easily point out my specific misunderstandings, I'm falling back on a little formal logic. So here's what I already know about UTF-8 that holds regardless of the programming language involved:
- All valid UTF-8 encoded strings are also sequences of bytes.
- Some UTF-8 encoded strings contain code points that are encoded as multiple bytes.
- Some byte sequences taken from UTF-8 encoded strings are not aligned with the boundaries between code points.
- No byte sequence that begins at a point not aligned with a boundary between code points is a valid UTF-8 encoded string.
- Some byte sequences taken from a valid UTF-8 encoded string are not themselves valid UTF-8 encoded strings.
- Some byte sequences are valid UTF-8 encoded strings.
If I got any of those wrong, then something about my understanding of UTF-8 is wrong…
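Here's how I'd test those UTF-8 facts against actual bytes. It's a sketch assuming stable Rust, where `is_valid_utf8_range` is a hypothetical helper and `str::as_bytes` and `std::str::from_utf8` are standard library functions.

```rust
use std::str;

/// Hypothetical helper: is the byte range `start..end` of `s` valid UTF-8 on its own?
fn is_valid_utf8_range(s: &str, start: usize, end: usize) -> bool {
    // `as_bytes` views the string as its raw UTF-8 bytes;
    // `from_utf8` validates an arbitrary byte slice and returns Err if invalid.
    str::from_utf8(&s.as_bytes()[start..end]).is_ok()
}

fn main() {
    let dog = "忠犬ハチ公";
    let bytes: &[u8] = dog.as_bytes();

    // A string is a sequence of bytes, and '忠' is one code point stored as three bytes.
    assert_eq!("忠".len(), 3);
    assert_eq!(&bytes[0..3], "忠".as_bytes());

    // A range that starts in the middle of a code point is not valid UTF-8.
    assert!(!is_valid_utf8_range(dog, 1, 3));
    // A range that ends in the middle of a code point is not valid either.
    assert!(!is_valid_utf8_range(dog, 0, 2));

    // A range aligned with code point boundaries at both ends is valid ("忠犬").
    assert!(is_valid_utf8_range(dog, 0, 6));
    println!("all UTF-8 checks passed");
}
```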
Now, here's what I've come to understand from this part of the Rust book:
- All slices of Strings are byte sequences.
- Some slices of Strings are not valid UTF-8 encoded strings.
- All Strings are required to be valid UTF-8 encoded strings.
- Some slices of Strings are not Strings.
- Assigning a slice of a String to a variable creates a value of type String.
- Some slices of Strings cannot be assigned to variables.
- Some concatenations of Strings with slices of Strings are invalid Strings.
So it almost sounds like slices are treated as bytes up until you assign one to a variable, as in `let x = &y[n..m];`, at which point it's expected to be valid UTF-8?
The understanding I've come to from reading this part of the book seems rather implausible. It just wouldn't make sense for slices to work that way, so I suspect I'm wrong about something. Please tell me where I'm misunderstanding Rust.
Thanks, and happy $holiday!