A weird thing you would never do: conversion from &char to &str

As I understand it, this conversion is valid:

use std::{slice, str};

fn convert(ch: &char) -> &str {
    unsafe {
        // Reinterpret the first len_utf8() bytes of the char's storage as UTF-8.
        let bytes = slice::from_raw_parts(ch as *const _ as *const u8, ch.len_utf8());
        str::from_utf8_unchecked(bytes)
    }
}

fn main() {
    let c = 'a';
    assert_eq!(convert(&c), "a");
}

And it actually does work.

My question: is the memory representation of char guaranteed to be compatible with &str for the first c.len_utf8() bytes?

No, the memory representation is different in general: a &str points at UTF-8 encoded bytes, but a char stores a Unicode code point directly. In Rust, a char is a four-byte (32-bit) value holding that code point, not a UTF-8 sequence.
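To make the distinction concrete, here is a small check (my addition, using only the standard library) that a char occupies four bytes and stores the code point, while the UTF-8 encoding of the same character can look quite different:

use std::mem::size_of;

fn main() {
    // A char is always 4 bytes, however many bytes its UTF-8 encoding needs.
    assert_eq!(size_of::<char>(), 4);

    let c = 'é';
    // The char holds the Unicode code point (U+00E9)...
    assert_eq!(c as u32, 0xE9);
    // ...while the UTF-8 encoding of the same character is two different bytes.
    assert_eq!("é".as_bytes(), &[0xC3, 0xA9]);
    assert_eq!(c.len_utf8(), 2);
}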

ASCII characters happen to have a single-byte UTF-8 encoding: they are encoded as a single byte equal to their code point value. So the above works for ASCII characters, but it breaks for characters whose UTF-8 encoding is longer than one byte.
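If you need a &str for an arbitrary char, a safe route (a sketch of mine, not from the original replies) is to encode it into a small stack buffer with char::encode_utf8:

fn main() {
    let c = 'é';

    // Reinterpreting this char's own bytes would hand from_utf8_unchecked
    // data that is not valid UTF-8, which is undefined behaviour.
    // Instead, encode the char into a 4-byte buffer and borrow that as a str.
    let mut buf = [0u8; 4];
    let s: &str = c.encode_utf8(&mut buf);
    assert_eq!(s, "é");
    assert_eq!(s.len(), c.len_utf8());
}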


It doesn't even work for ASCII characters on every platform; it depends on endianness. The example in the OP only works because 'a' is represented at the byte level as 97 0 0 0 and not 0 0 0 97.
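A quick way to see this (again my own sketch) is to look at the native-endian bytes of the char's underlying u32 value:

fn main() {
    let c = 'a';
    // The char's in-memory bytes are those of its u32 value in native byte order.
    let bytes = (c as u32).to_ne_bytes();
    let expected: [u8; 4] = if cfg!(target_endian = "little") {
        [97, 0, 0, 0]  // the 97 comes first, so convert() happens to work
    } else {
        [0, 0, 0, 97]  // the 97 comes last, so convert() would read a 0 byte instead
    };
    assert_eq!(bytes, expected);
}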

