Chars and strings

Hi there.
The official docs tell us that
a char is always four bytes in size.
The official docs also say that
a string (String) is made of bytes (u8).
So what happens in

 let x = 'c'.to_string();

Thx, cheers.

A String is stored as UTF-8, a variable-length encoding in which the number of bytes depends on the character. In your case, since the character is in the ASCII range, it takes up one byte, but other characters can take up to four bytes.
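To make the difference concrete, here is a minimal sketch comparing the fixed size of a char with the number of bytes the same character occupies inside a String. The helper name bytes_in_string is made up for this illustration; size_of, String::len and char::len_utf8 are standard library calls.

```rust
use std::mem::size_of;

// Hypothetical helper: how many bytes a char occupies once stored in a String.
fn bytes_in_string(c: char) -> usize {
    // String::len counts bytes, not characters.
    c.to_string().len()
}

fn main() {
    // A char is a fixed-size, four-byte Unicode scalar value.
    println!("size_of::<char>() = {}", size_of::<char>());

    // Inside a String, the same characters need between one and four bytes.
    for c in "cæ€🦀".chars() {
        println!(
            "{:?}: {} byte(s) in a String, len_utf8 = {}",
            c,
            bytes_in_string(c),
            c.len_utf8()
        );
    }
}
```

So 'c'.to_string() converts the four-byte char into its one-byte UTF-8 encoding; nothing is truncated.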

Thanks Alice, so there is no risk of losing information for non-ASCII characters? I interpreted "made of bytes" as "a sequence of single bytes", while a char is four bytes long... maybe I'm a bit confused, sorry.

No, there is no such risk. A String can store any kind of string data. To illustrate what happens, you can try running this:

fn main() {
    let s = "c".to_string();
    println!("{:?}", s.as_bytes());

    let s = "æ".to_string();
    println!("{:?}", s.as_bytes());
}

This prints:

[99]
[195, 166]

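You may find methods such as is_char_boundary interesting. It tells you whether a given byte index falls at the start or end of a character rather than in the middle of one.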
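As a rough illustration of is_char_boundary, the sketch below lists the byte indices at which a string can safely be split. The helper char_boundaries is a name invented for this example; str::is_char_boundary and str::get are the real standard library methods.

```rust
// Hypothetical helper: every byte index that is a valid char boundary.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    let s = "cæ";

    // 'c' occupies byte 0, 'æ' occupies bytes 1 and 2,
    // so index 2 falls in the middle of 'æ'.
    println!("{:?}", char_boundaries(s));

    // Slicing at a boundary works.
    println!("{}", &s[..1]);

    // &s[..2] would panic; the checked variant returns None instead.
    println!("{:?}", s.get(..2));
}
```

Indexing a str with a range that does not land on a boundary panics, which is why the checked get is used for the last case.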

You should think of this as a way to make most (English) string data take up a quarter of the space it would need as a char array, while still supporting any kind of character.
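The arithmetic behind that can be sketched as follows. Both helper names are hypothetical, and the comparison only counts the character data itself, ignoring any allocation overhead.

```rust
use std::mem::size_of;

// Hypothetical helper: bytes needed to store the text as UTF-8 (what String does).
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

// Hypothetical helper: bytes needed to store the same text as an array of char.
fn char_array_bytes(s: &str) -> usize {
    s.chars().count() * size_of::<char>()
}

fn main() {
    for s in ["hello world", "æøå"] {
        println!(
            "{:?}: {} bytes as UTF-8, {} bytes as chars",
            s,
            utf8_bytes(s),
            char_array_bytes(s)
        );
    }
}
```

For pure ASCII text the ratio is exactly one to four; for text with many non-ASCII characters the saving shrinks, but UTF-8 never needs more than four bytes per character.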

Yes, very interesting
thx a lot