Rust does have C-strings (CStr and CString in std::ffi), but they are not native Rust strings, and most of Rust's string primitives can't work with them without a conversion that may have to validate or re-encode the bytes.
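As a minimal sketch of that conversion step, the example below moves text in both directions using the standard library's CStr and CString. The helper names c_to_rust and rust_to_c are hypothetical wrappers added for illustration only.

```rust
use std::ffi::{CStr, CString};

/// Borrow a C string as a Rust &str. Fails if the bytes are not valid UTF-8.
/// (Hypothetical helper name; it only wraps CStr::to_str.)
fn c_to_rust(c: &CStr) -> Result<&str, std::str::Utf8Error> {
    c.to_str()
}

/// Copy a Rust string into a new NUL-terminated buffer.
/// Fails if the Rust string contains an interior 0x00 byte.
/// (Hypothetical helper name; it only wraps CString::new.)
fn rust_to_c(s: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(s)
}

fn main() {
    // Plain ASCII is also valid UTF-8, so this converts without copying.
    let ascii = CStr::from_bytes_with_nul(b"hello\0").expect("exactly one trailing NUL");
    println!("{:?}", c_to_rust(ascii));

    // "naive" with a Latin-1 encoded i-diaeresis (0xEF) is not valid UTF-8.
    let latin1 = CStr::from_bytes_with_nul(b"na\xEFve\0").expect("exactly one trailing NUL");
    println!("strict conversion failed: {}", c_to_rust(latin1).is_err());

    // The lossy conversion replaces the invalid byte with U+FFFD.
    println!("{}", latin1.to_string_lossy());

    // An interior NUL cannot be represented in a C string.
    println!("interior NUL rejected: {}", rust_to_c("a\0b").is_err());
}
```

The strict conversion only borrows when the C bytes already happen to be valid UTF-8; anything else needs either an error path or a lossy copy.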
Rust strings are not self-delimiting: unlike C-strings, they have no terminal 0x00. They are therefore defined by both a starting address and a byte count, which is why raw pointers and safe references to Rust strings must be "fat pointers".
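The address-plus-count layout can be observed directly. The sketch below assumes a typical current Rust target, where a string reference occupies two machine words; the helper name parts is hypothetical.

```rust
use std::mem::size_of;

/// Split a string slice into the two fields its reference carries:
/// the starting address and the byte count. (Hypothetical helper name.)
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let s = "naïve";
    let (addr, len) = parts(s);
    println!("address = {:p}, byte count = {}", addr, len);

    // A thin pointer is one machine word; pointers to str carry a length too.
    println!("*const u8:  {} bytes", size_of::<*const u8>());
    println!("&str:       {} bytes", size_of::<&str>());
    println!("*const str: {} bytes", size_of::<*const str>());
}
```

Because the length travels with the pointer, taking a substring never requires writing a terminator into the underlying buffer.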
C-strings were designed to encode the very limited character set of common US English. As such they cannot encode most of the languages of the world, nor even full US English, which uses the diaeresis (e.g., the accented i in naïve). Only 36% of the world uses a Roman character set, and much of that use requires accented vowels and consonants that are not ASCII and thus can't be represented by C-string's char.
Rust strings are UTF-8, which can encode virtually all the written languages of the world. That means that most characters in world languages encode in more than one byte, and that a pointer to a specific byte of a Rust string might point into the middle of a multi-byte encoded char rather than to its start.
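A short sketch of what this means in practice, using only standard str methods. The helpers prefix and snap_to_boundary are hypothetical names introduced for illustration.

```rust
/// Return the first `end` bytes as a string, or None if `end` is out of
/// range or falls inside a multi-byte character. (Hypothetical helper.)
fn prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

/// Move a byte index backwards until it sits on a character boundary.
/// Index 0 is always a boundary, so the loop terminates. (Hypothetical helper.)
fn snap_to_boundary(s: &str, mut i: usize) -> usize {
    if i >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

fn main() {
    let s = "naïve"; // bytes: n(0) a(1) ï(2,3) v(4) e(5)
    println!("bytes = {}, chars = {}", s.len(), s.chars().count());

    // Byte 3 is the second byte of the two-byte encoding of 'ï'.
    println!("boundary at 2? {}", s.is_char_boundary(2));
    println!("boundary at 3? {}", s.is_char_boundary(3));

    // Checked slicing refuses to cut a character in half.
    println!("{:?}", prefix(s, 3));
    println!("{:?}", prefix(s, 4));

    println!("snap 3 -> {}", snap_to_boundary(s, 3));
}
```

Indexing with `&s[..3]` would panic here instead of returning None, which is why byte offsets into a Rust string should be checked, or derived from char_indices, before slicing.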