Which of these codes is better?

Hello,
What is the difference between these codes and which one is better?
Code 1:

fn main() {
    println!("{}", 65 as char);
}

Code 2:

fn main() {
    println!("{}", 65 as u8 as char);
}

Thank you.

Depends. Are you only going to use 65? Is this coming from somewhere else? Without context, it's difficult to explain what the tradeoffs and options are.

The best way to get code point 65 is obviously 'A'.

If you're asking how to get some not-easily-typeable codepoint, you can use '\x41' for codepoints below 128, or '\u{41}' for any valid Unicode scalar value in hex.
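For example, each of these prints A:

```rust
fn main() {
    println!("{}", 'A');      // the literal itself
    println!("{}", '\x41');   // hex escape, for codepoints below 128
    println!("{}", '\u{41}'); // Unicode escape, for any valid scalar value
}
```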

If you're asking how to get some low-valued codepoint in decimal, then sure, 65 as char is fine. 65 as u8 as char is mostly the same, but it does a redundant cast. The only integer type that can be cast to char is u8. In both cases above, 65 is a u8.

(Incidentally, you can use 65u8 to explicitly make an integer literal a u8.)
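A quick sketch of those literal forms (all three produce the same char):

```rust
fn main() {
    let a = 65 as char;       // literal inferred as u8, then cast
    let b = 65u8 as char;     // explicitly-suffixed u8 literal
    let c = 65 as u8 as char; // redundant second cast, same result
    assert_eq!((a, b, c), ('A', 'A', 'A'));
}
```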

Now, if you're not using literals, things change. Using x as u8 as char is bad because if x has any integer type wider than u8, values above 255 are silently truncated and you lose data. x as char only works when x: u8.
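A minimal sketch of that data loss, with a value picked purely for illustration:

```rust
fn main() {
    let x: u32 = 0x1F600; // '😀', far outside u8's range
    let lossy = x as u8 as char; // `as u8` keeps only the low byte: 0x00
    assert_eq!(lossy, '\0'); // the emoji is silently gone
}
```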

In other cases, you'll want to use char::from_u32(x) where x: u32. What if x isn't u32? Then you'll want something like char::from_u32(u32::try_from(x).ok()?)? to perform a checked conversion of x to u32, then a checked conversion of that to char.
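A sketch of that two-step checked conversion, wrapped in a hypothetical helper (`to_char` is not a standard function, just an illustration):

```rust
use std::convert::TryInto;

// Hypothetical helper: fallibly convert any integer type with a
// TryInto<u32> impl, then check that the u32 is a valid scalar value.
fn to_char<T: TryInto<u32>>(x: T) -> Option<char> {
    char::from_u32(x.try_into().ok()?)
}

fn main() {
    assert_eq!(to_char(65_i64), Some('A'));
    assert_eq!(to_char(-1_i32), None);     // doesn't fit in u32
    assert_eq!(to_char(0xD800_u32), None); // a surrogate: not a valid char
}
```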

9 Likes

Hello,
Thank you so much for your reply.
Is the output type of the two codes different from each other?

For 65, they're all the same. Whether they differ for other values depends on what the value is and which variants you're talking about. Basically: does it fit into a u8 or not? If not, it will fail: some variants at compile time, others at run time.

@hack3rcon

For this reason, I’d probably prefer the code example from the book you’re reading to be re-written as something like

for n in 32..127_u8 {
    println!("{}: [{}]", n, n as char);
}
for n in 160..=255_u8 {
    println!("{}: [{}]", n, n as char);
}

avoiding the conversion from i32 to u8 and instead working with unsigned 8-bit numbers from the beginning. The usage of ..= is because you cannot write an exclusive upper bound of 256 in the u8 type itself, so you need to work with an inclusive upper bound of 255 instead.

Rust’s “as”-conversions are somewhat discouraged in general because they can (quite implicitly) truncate data. Most of the conversions that don’t suffer from this problem are also available via From-conversions, so arguably even nicer might be:

for n in 32..127_u8 {
    println!("{}: [{}]", n, char::from(n));
}
for n in 160..=255_u8 {
    println!("{}: [{}]", n, char::from(n));
}
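As an aside, From<u8> is the only From impl the standard library provides for char; for wider integers it instead offers the fallible TryFrom<u32>, which performs the same check as char::from_u32:

```rust
fn main() {
    // U+1F980 is a valid scalar value, so the conversion succeeds:
    assert_eq!(char::try_from(0x1F980_u32).unwrap(), '🦀');
    // Surrogates are rejected:
    assert!(char::try_from(0xD800_u32).is_err());
}
```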
4 Likes

Given a choice when working with UTF8 or more generally Unicode text (so char, String, str), one would ideally code them and read them in as text from the start, as opposed to having bytes or other integer values that have to be converted. Granted, this isn't always possible.

Being familiar with ASCII (0..128) is still useful wisdom, and maybe that's all the code was getting at. But generally if I've gotten ahold of some bytes and I need to interpret them as text, I'm reaching for from_utf8. [1]
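A sketch of that bytes-to-text step with std::str::from_utf8 (the byte values here are just an illustration):

```rust
fn main() {
    let bytes = [72, 101, 108, 108, 111]; // "Hello" in ASCII / UTF-8
    match std::str::from_utf8(&bytes) {
        Ok(s) => println!("{s}"),
        Err(e) => eprintln!("not valid UTF-8: {e}"),
    }
    // Invalid sequences are reported instead of being silently mangled:
    assert!(std::str::from_utf8(&[0xFF]).is_err());
}
```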


  1. Beyond that, there are situations as a programmer where you have to deal with different character sets and encodings in more depth, whether that's going between UTF-16 and UTF-8, dealing with non-UTF-8 Unix or non-UTF-16 Windows paths/values, or dealing with legacy non-Unicode encodings. However, outside of special cases like ASCII (which is a subset of UTF-8), dealing with such encodings gracefully is a gnarly, in-depth business. I'm fairly certain that's not what the source of the code was trying to demonstrate. ↩︎