Noob question: why "&str"?

Rust does have C-strings (std::ffi::CStr and std::ffi::CString), but they are not native Rust strings, and most of Rust's primitives can't work with them without a conversion that may have to validate or recode the bytes.
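To make the conversion concrete, here is a minimal sketch using the standard library's C-string types. The helper names `round_trip` and `lossy` are made up for illustration, and the Latin-1 byte 0xEF is only an assumed example of input that is not valid UTF-8.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: Rust string -> C-string -> Rust string.
fn round_trip(s: &str) -> String {
    // CString::new appends the terminating 0x00 and fails on interior NULs.
    let c_string = CString::new(s).expect("input must not contain 0x00");
    // to_str checks that the bytes are valid UTF-8 before handing back a &str.
    c_string.to_str().expect("bytes must be valid UTF-8").to_owned()
}

/// Hypothetical helper: decode NUL-terminated bytes, replacing invalid
/// UTF-8 sequences with U+FFFD (this is the "recoding" case).
fn lossy(bytes_with_nul: &[u8]) -> String {
    let c_str = CStr::from_bytes_with_nul(bytes_with_nul)
        .expect("exactly one 0x00, at the end");
    c_str.to_string_lossy().into_owned()
}

fn main() {
    println!("{}", round_trip("na\u{ef}ve"));
    // 0xEF is "ï" in Latin-1, but on its own it is not valid UTF-8.
    println!("{}", lossy(b"na\xEFve\0"));
}
```

Neither direction is free: going to a C-string has to scan for interior zero bytes, and coming back has to validate UTF-8.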

Unlike C-strings, Rust strings are not self-delimiting: there is no terminating 0x00. A Rust string is therefore defined by both a starting address and a byte count, which is why raw pointers and safe references to Rust strings are "fat pointers" that carry the length alongside the address.
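A quick way to see the fat pointer is to compare reference sizes. This is only a sketch: the helper names are invented here, and the "twice a thin pointer" layout is what current compilers do in practice rather than something this thread spells out.

```rust
use std::mem::size_of;

/// Hypothetical helper: size in bytes of a reference to a string slice.
fn str_ref_size() -> usize {
    size_of::<&str>()
}

/// Hypothetical helper: size in bytes of an ordinary (thin) reference.
fn thin_ref_size() -> usize {
    size_of::<&u8>()
}

fn main() {
    // A &str carries an address plus a byte count.
    println!("&str: {} bytes, &u8: {} bytes", str_ref_size(), thin_ref_size());

    // With no terminator, 0x00 is just another byte inside the string.
    let with_nul = "a\0b";
    println!("ptr = {:p}, len = {}", with_nul.as_ptr(), with_nul.len());
}
```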

C-strings were designed to encode the very limited character set of common US English. As such they cannot encode most of the languages of the world, nor even full US English which uses diaeresis (e.g., the accented i in naïve). Only 36% of the world uses a Roman character set, and much of that use requires accented vowels and consonants that are not ASCII and thus can't be represented by C-string's char.

Rust strings are UTF-8, which can encode virtually all the written languages of the world. That means that most characters in world languages encode in more than one byte, and that a pointer to a specific byte of a Rust string might be pointing into the middle of a multi-byte character's encoding.
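The word naïve from above makes a handy illustration. In this sketch (the helper names are hypothetical), the byte length and the character count differ, and a byte offset can land inside the two-byte encoding of the ï.

```rust
/// Hypothetical helper: (length in bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Hypothetical helper: the first `n` bytes as a &str, or None if byte
/// `n` is out of range or falls inside a multi-byte character.
fn prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    // "\u{ef}" is the precomposed ï, which UTF-8 encodes as two bytes.
    let word = "na\u{ef}ve";
    println!("{:?}", byte_and_char_counts(word));

    // Byte 3 is the second byte of ï, so it is not a char boundary.
    println!("boundary at 3? {}", word.is_char_boundary(3));
    println!("{:?} {:?}", prefix(word, 3), prefix(word, 4));
}
```

Indexing with `&word[..3]` instead of `get` would panic at run time for the same reason.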

Storing the length alongside the address also means that constructing substrings and querying the length are constant-time operations, whereas in C they take linear time.

And on top of that (or underneath that?) creating substrings is zero-copy.
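Here is a small sketch of what zero-copy means in practice. The helper `offset_within` is made up for this example; it just compares addresses to show that a substring points into the original buffer.

```rust
/// Hypothetical helper: how many bytes into `parent` the slice `sub`
/// starts. Only meaningful when `sub` was sliced out of `parent`.
fn offset_within(parent: &str, sub: &str) -> usize {
    sub.as_ptr() as usize - parent.as_ptr() as usize
}

fn main() {
    let text = String::from("hello world");

    // A substring from the middle: just a new (address, length) pair.
    // A C-string can't do this without copying, because the terminator
    // would have to sit in the middle of the original.
    let middle: &str = &text[3..8];
    let tail: &str = &text[6..];

    println!("{:?} starts at byte {}", middle, offset_within(&text, middle));
    println!("{:?} starts at byte {}", tail, offset_within(&text, tail));

    // len() reads the stored byte count; it does not scan the string.
    println!("lengths: {} and {}", middle.len(), tail.len());
}
```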
