If a string slice is a slice, why is &[ char ] not the definition of its type?
Why does this code not compile?
assert_eq!( is_slice( &[ 1, 2, 3 ] ), true );
assert_eq!( is_slice( "abc" ), true ); // error: mismatched types, &str is not &[ T ]
fn is_slice< T >( _ : &[ T ] ) -> bool { true }
What is so special about string slices?
str is essentially a newtype over [u8] -- it's UTF-8, very intentionally not
Is it correct that &str is essentially &[ char ], but for some reason the compiler treats it as a distinct type?
No. They are two different things.
&str is a slice of bytes (u8) that uses the variable-length UTF-8 encoding (i.e. each character may take up 1, 2, 3, or 4 bytes).
&[char] is a slice of chars, where each char is a 4-byte "Unicode code point" (strictly speaking, a Unicode scalar value).
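To make the difference concrete, here is a minimal sketch. The is_slice function mirrors the one in the question with a trivial body added, and byte_len_and_char_count is a hypothetical helper introduced only for this illustration.

```rust
use std::mem::size_of;

// Same signature as the function in the question, with a trivial body.
fn is_slice<T>(_: &[T]) -> bool {
    true
}

// Hypothetical helper: (length in bytes, number of chars) for a string slice.
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Four characters that need 1, 2, 3 and 4 bytes in UTF-8.
    let s: &str = "a\u{e9}\u{20ac}\u{1f600}";

    // &str measures its length in bytes, not in characters.
    assert_eq!(byte_len_and_char_count(s), (10, 4));

    // Every element of a [char] has the same fixed size of 4 bytes.
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(chars.len(), 4);

    // Both of these are ordinary slices, so both match &[T].
    assert!(is_slice(s.as_bytes())); // &[u8]
    assert!(is_slice(chars.as_slice())); // &[char]

    // is_slice(s) would not compile: &str is not &[T] for any T.
}
```

Getting a &[char] out of a &str requires allocating and decoding (the collect above), which is a good hint that the two types do not share a memory layout.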
Despite what most common programming languages would have you believe, a string isn't just an array of characters. To properly explain the distinction would require going down a very deep rabbit hole and I'll probably butcher it completely. So instead, I'll refer you to this excellent explanation from Tom Scott:
You can't even assume that a single code point correlates to a single glyph (very loosely - the "letter" you would draw) because you've got things like accents and skin tone modifiers which can be added after another code point to alter it.
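A small sketch of that last point, using only the standard library. The char_count helper is hypothetical and exists just for this example; the strings are written with escapes so the exact code points are unambiguous.

```rust
// Hypothetical helper: number of chars (Unicode scalar values) in a string.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // The same visible letter, encoded two different ways.
    let precomposed = "\u{e9}"; // LATIN SMALL LETTER E WITH ACUTE
    let combined = "e\u{301}"; // 'e' followed by COMBINING ACUTE ACCENT

    assert_eq!(char_count(precomposed), 1);
    assert_eq!(char_count(combined), 2);

    // They also differ in byte length and do not compare equal as strings.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(combined.len(), 3);
    assert_ne!(precomposed, combined);

    // A thumbs-up emoji followed by a skin tone modifier: one glyph, two chars.
    let thumbs_up = "\u{1f44d}\u{1f3fd}";
    assert_eq!(char_count(thumbs_up), 2);
    assert_eq!(thumbs_up.len(), 8);
}
```

Counting what a reader would see as one "letter" needs grapheme cluster segmentation, which the standard library does not provide, so neither len() nor chars().count() answers that question.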
Thanks for the answer. I was not aware that char is 4 bytes long. That's a discovery for me, thanks.
I am aware of "Unicode code points" and the logic behind them.
At its core, &str is &[ u8 ], and the only reason a conversion between the two equivalent data types is needed is the Unicode logic that str should follow.
Is that correct? Or is there something more?
Even if they have identical binary representations, there are a couple of reasons why &str and &[u8] are two different types:
- A bunch of bytes and a string are two different logical concepts, and making them different types means you can use the type system to ensure they don't get accidentally mixed up
- You can give the str type methods specific to a string (e.g. trim()) and implement the
- The str type uses unsafe code internally, so if you provide direct access to the underlying bytes, users may accidentally break str's assumptions and cause UB (e.g. you modify the last byte to look like the start of a multi-byte character, and now some unsafe code will read past the end of the string)
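The points above can be sketched with the standard conversions between the two types. Going from str to [u8] is always allowed, while going the other way is checked. The is_valid_utf8 helper is hypothetical and only wraps std::str::from_utf8 for this example.

```rust
use std::str;

// Hypothetical helper: true if the bytes form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    let s = "  h\u{e9}llo  ";

    // str -> [u8] is always safe: every str is valid UTF-8 bytes.
    let bytes: &[u8] = s.as_bytes();
    assert!(is_valid_utf8(bytes));

    // String-specific methods live on str, not on [u8].
    assert_eq!(s.trim(), "h\u{e9}llo");

    // [u8] -> str is validated. 0xC3 starts a 2-byte sequence,
    // so a buffer ending in it is a truncated character.
    assert!(!is_valid_utf8(&[b'h', 0xC3]));

    // 0xFF never appears in valid UTF-8.
    assert!(!is_valid_utf8(&[0xFF]));

    // Skipping the check requires std::str::from_utf8_unchecked,
    // which is an unsafe fn precisely because str relies on this invariant.
}
```

The truncated buffer is the same situation as the "last byte looks like a multi-byte character" example: the checked conversion rejects it before any code that trusts the str invariant can see it.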
There is similar logic for why std::path::Path is a different type to str, even though most languages are happy to treat strings and file paths as the same thing.
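As a rough illustration of the Path case: building a Path from a str is free, but getting a str back is fallible, because a path is not guaranteed to be valid UTF-8. The extension_of helper and the file names are hypothetical, chosen only for this sketch.

```rust
use std::path::Path;

// Hypothetical helper: the file extension as a &str, if there is one
// and it happens to be valid UTF-8.
fn extension_of(path: &Path) -> Option<&str> {
    path.extension().and_then(|ext| ext.to_str())
}

fn main() {
    // str -> Path always works.
    let path = Path::new("notes/todo.txt");

    // Path has path-specific methods that str does not have.
    assert_eq!(extension_of(path), Some("txt"));
    assert_eq!(extension_of(Path::new("README")), None);

    // Path -> str returns an Option, since the path may not be UTF-8.
    assert_eq!(path.to_str(), Some("notes/todo.txt"));
}
```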
Thanks for the detailed explanation, @Michael-F-Bryan