How to print the byte string literal of a byte slice?

If I have a Vec<u8> or [u8], how can I get its byte string literal?

For example:

fn to_byte_string_literal(a: impl AsRef<[u8]>) -> String { ... }

assert_eq!(to_byte_string_literal([30, 31, 30, 30, 43]), r"\x1E\x1F\x1E\x1E+");

I want to obtain the byte string literal.
Imagine I get a hex string from the web or somewhere else, and I want to make a CLI tool that converts that hex to a byte string literal. Then I can use it (b"xxxx") in my Rust code.
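As a rough sketch of the hex-decoding half of such a tool, assuming the input is plain hex digits with no separators or "0x" prefix: `decode_hex` is a hypothetical helper name chosen for illustration, not a function from any crate.

```rust
/// Decodes a hex string such as "1E1F1E1E2B" into raw bytes.
/// Returns `None` if the length is odd or a character is not a hex digit.
/// (`decode_hex` is a hypothetical helper, not a library function.)
fn decode_hex(hex: &str) -> Option<Vec<u8>> {
    let digits = hex.trim().as_bytes();
    if digits.len() % 2 != 0 {
        return None;
    }
    digits
        .chunks(2)
        .map(|pair| {
            // Each pair of hex digits encodes one byte, high nibble first.
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

For example, `decode_hex("1E1F1E1E2B")` yields `Some(vec![30, 31, 30, 30, 43])`, which can then be passed to whatever function formats the literal.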

I suggest bstr for working with byte strings.

It is written by BurntSushi, a member of the Rust library team.


If you want to get a literal (as in &'static str), you could do this:

fn to_byte_string_literal(a: &'static impl AsRef<[u8]>) -> &'static str {
    std::str::from_utf8(a.as_ref()).unwrap()
}

fn main() {
    assert_eq!(to_byte_string_literal(&[30, 31, 30, 30, 43]), "\x1E\x1F\x1E\x1E+");
}


But I'm not sure if I understand your question correctly.


How do I escape that?

Actually, I want to print the text \x1E\x1F\x1E\x1E+ itself, backslashes included.

So the returned string should be equal to "\\x1E\\x1F\\x1E\\x1E+".

Maybe I misunderstood what you are trying to achieve. If you want to create a string with the corresponding escape sequences, maybe you want something like the following instead? (Just written down quickly; there may be an easier or more efficient way.)

use std::fmt::Write;

fn to_byte_string_literal(a: impl AsRef<[u8]>) -> String {
    fn inner(bytes: &[u8]) -> String {
        let mut lit = String::new();
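        for &byte in bytes {
            match byte {
                // A backslash or double quote must itself be escaped in a literal.
                b'\\' | b'"' => {
                    lit.push('\\');
                    lit.push(byte as char);
                }
                // Printable ASCII (space through '~') is copied as-is.
                0x20..=0x7E => lit.push(byte as char),
                // Everything else becomes a two-digit hex escape.
                _ => write!(lit, "\\x{byte:02X}").unwrap(),
            }
        }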
        lit
    }
    inner(a.as_ref())
}

fn main() {
    assert_eq!(to_byte_string_literal([30, 31, 30, 30, 43]), r"\x1E\x1F\x1E\x1E+");
}
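As an alternative to a hand-written loop, the standard library has `<[u8]>::escape_ascii` (stable since Rust 1.60), which produces escapes that are valid inside a byte string literal. The sketch below wraps its output in `b"..."`. The function name `to_byte_string_literal_std` is made up for this example, and note that the standard library emits lowercase hex digits, so the text differs in case from the assertion above.

```rust
/// Formats bytes as a complete `b"..."` literal using only the standard library.
/// (`to_byte_string_literal_std` is a hypothetical name chosen for this sketch.)
fn to_byte_string_literal_std(bytes: &[u8]) -> String {
    // `escape_ascii` escapes `\`, `'`, `"`, `\t`, `\r` and `\n`, and emits
    // a lowercase `\xNN` escape for every other non-printable byte.
    format!("b\"{}\"", bytes.escape_ascii())
}
```

With the bytes from the question, this returns the text `b"\x1e\x1f\x1e\x1e+"`, which can be pasted into Rust source as-is.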


Wonderful!

What if I want to invert this?

Get the bytes from a string r"\x1E\x1F\x1E\x1E+".

r"\x1E\x1F\x1E\x1E+".as_bytes()

This does not give the bytes I want:

[src/main.rs:2] r"\x1E\x1F\x1E\x1E+".as_bytes() = [
    92,
    120,
    49,
    69,
    92,
    120,
    49,
    70,
    92,
    120,
    49,
    69,
    92,
    120,
    49,
    69,
    43,
]
[src/main.rs:3] b"\x1E\x1F\x1E\x1E+" = [
    30,
    31,
    30,
    30,
    43,
]

It's not "wrong", it's just different. In the "raw" string, all escape sequences are left as-is, i.e. it will contain characters '\', 'x', '1', 'E' etc. literally. In the byte string, all escape sequences are expanded, therefore the string contains characters '\x1E', '\x1F' etc. Do you want to expand escape sequences already existing in the string you've got (i.e. convert the former case to the latter)?


Sorry, maybe my description is wrong.

That answer is not what I want.

I want to recover b"\x1E\x1F\x1E\x1E+" (aka [30, 31, 30, 30, 43]) from r"\x1E\x1F\x1E\x1E+".

If I understand you correctly, the search term you are after is "unescape string".
The unescape crate seems to do the trick. According to its source, it supports the \b, \f, \n, \r, \t, \", \', \\, \u and \x escapes.

fn main() {
    assert_eq!(&[30, 31, 30, 30, 43], unescape::unescape(r"\x1E\x1F\x1E\x1E+").unwrap().as_bytes());
}


unescape is for JSON unescaping. I think @AurevoirXavier wants to unescape Rust literals. I'm not sure if it's the same.


It handles the output of OP's to_byte_string_literal(), as long as the original input was valid UTF-8.

The code is short enough to easily convert from returning Option<String> to returning Option<Vec<u8>>, which lifts the requirement that the result be valid UTF-8.
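To illustrate what a byte-oriented unescaper could look like, here is a minimal sketch written from scratch rather than adapted from the crate's source. `unescape_bytes` is a hypothetical helper, and it only handles the escapes that Rust byte string literals use (\n, \r, \t, \0, \\, \', \" and \xNN), not \u or line continuations.

```rust
/// Expands Rust-style byte escapes in `s` and returns the raw bytes.
/// Returns `None` on a malformed or unsupported escape.
/// (`unescape_bytes` is a hypothetical helper, not part of the `unescape` crate.)
fn unescape_bytes(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(s.len());
    let mut bytes = s.bytes();
    while let Some(b) = bytes.next() {
        if b != b'\\' {
            out.push(b); // ordinary byte, copied through unchanged
            continue;
        }
        // A backslash must be followed by a recognized escape character.
        let decoded = match bytes.next()? {
            b'n' => b'\n',
            b'r' => b'\r',
            b't' => b'\t',
            b'0' => 0,
            b'\\' => b'\\',
            b'\'' => b'\'',
            b'"' => b'"',
            b'x' => {
                // Exactly two hex digits, which may encode any value 0..=255.
                let hi = (bytes.next()? as char).to_digit(16)?;
                let lo = (bytes.next()? as char).to_digit(16)?;
                (hi * 16 + lo) as u8
            }
            _ => return None,
        };
        out.push(decoded);
    }
    Some(out)
}
```

Because the result is a Vec<u8> rather than a String, an input such as r"\xFF" decodes to the single byte 255 instead of being rejected as invalid UTF-8.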

This crate is quite old.
I made a more powerful one.
