Builtin for Vec<u8> to String

https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8 is not what I want.

I want a literal translation of each u8 to a char and then to concatenate it into a string.

        let mut ans = String::new();
        for x in self.data.iter() {
            // x is a &u8; u8 has no as_char() method, so cast instead
            ans.push(*x as char);
        }

Is there a builtin for this?

If "literal translation" means you want to treat each u8 as a Unicode code point, then you can do this:

        let ans: String = self.data.iter().map(|x| char::from(*x)).collect();

This effectively treats the input as ISO-8859-1 (Latin-1) encoded text, since the Unicode range from 0 to 255 is based on Latin-1. So for example the byte 255 will map to the character 'ÿ'. (If your input data is not Latin-1 encoded, this is probably not what you want!)
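To make the Latin-1 behaviour concrete, here is a minimal sketch of the same one-liner wrapped in a free function. The name `latin1_to_string` is just an illustrative choice, not something from std.

```rust
/// Map each byte to the Unicode code point with the same numeric value,
/// which is equivalent to decoding the input as ISO-8859-1 (Latin-1).
/// `latin1_to_string` is a hypothetical helper name, not a std function.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

Note that bytes above 127 grow when converted: byte 255 becomes 'ÿ', which takes two bytes in the resulting UTF-8 String, so the output is not a byte-for-byte copy of the input.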


Data is not Latin-1 encoded. This is precisely what I wanted. Thanks!

(Use case: I have a Vec<u8> which represents ASCII source. I'm trying to print it for debugging purposes.)

If your source is ASCII encoded, use String::from_utf8. UTF-8 is a superset of ASCII, so every valid ASCII text is also valid UTF-8 text. String::from_utf8 takes ownership of the Vec<u8> and reuses its buffer, so this route saves an allocation and the byte-by-byte transcoding.
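A rough sketch of that route, assuming you own the Vec<u8>. The function name `bytes_to_string` and the choice to fall back to the byte-to-char mapping on invalid UTF-8 are my own assumptions for illustration; you may prefer to propagate the error instead.

```rust
/// Convert an owned byte buffer into a String.
/// Valid UTF-8 (which includes all ASCII) is wrapped without copying.
/// Fallback on invalid input is an assumption for this sketch:
/// each byte is mapped to the char with the same code point.
fn bytes_to_string(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        // The String takes over the Vec's existing heap buffer.
        Ok(s) => s,
        // FromUtf8Error hands the original bytes back via into_bytes().
        Err(e) => e.into_bytes().iter().map(|&b| char::from(b)).collect(),
    }
}
```

If the Vec lives in a struct field and you only have `&self`, you would need to clone it first (or borrow it with str::from_utf8), since String::from_utf8 consumes its argument.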

If you don't want to convert the source buffer into a String and just want to debug-print it as ASCII text, try the bstr crate. It works with byte strings that are conventionally, but not necessarily, UTF-8 encoded, unlike the primitive str type, which is guaranteed to be valid UTF-8.


Yeah, for ASCII data you should probably use bstr, or str::from_utf8 or String::from_utf8_lossy. This should be faster than treating it as Latin-1: since the input is already valid UTF-8, it won't need conversion or char-by-char iteration. Also, all of these avoid copying and allocation, some or all of the time. The from_utf8 functions even have a fast path for ASCII input.
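For the std-only options, something like the following should work when you only have a borrowed slice. The helper names `view_as_str` and `view_lossy` are made up for this sketch.

```rust
use std::borrow::Cow;

/// Borrow the bytes as a &str without allocating.
/// Returns None if the input is not valid UTF-8.
/// (`view_as_str` is a hypothetical helper name.)
fn view_as_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Never fails: invalid sequences are replaced with U+FFFD.
/// Returns Cow::Borrowed (no allocation) when the input is already valid,
/// and Cow::Owned only when a replacement was needed.
/// (`view_lossy` is a hypothetical helper name.)
fn view_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

For debug printing, the lossy version is probably the more convenient of the two, because a stray non-ASCII byte shows up as a replacement character instead of turning into an error you have to handle.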
