Get data from file to Vec<u8> and string

Continuing my question from here (link).
For example, I read the data from my file my_text.txt into a variable u:

my_text.txt

my text Зд
let u = fs::read("/.../my_text.txt").expect("error on handling file");
println!("{:?}", u); // [109, 121, 32, 116, 101, 120, 116, 32, 199, 228]

[109, 121, 32, 116, 101, 120, 116, 32, 199, 228]
Using the approach from the link above, I can turn the ASCII part back into text:

    fn main() {
        let u01: Vec<u8> = vec![109, 121, 32, 116, 101, 120, 116];
        println!("u01 = {:?}", std::str::from_utf8(&u01[..]).unwrap()); // u01 = "my text"
    }

But I can't get the Cyrillic part back:
[109, 121, 32, 116, 101, 120, 116, 32, 199, 228]

    fn main() {
        let u02: Vec<u8> = vec![199, 228];
        println!("u02 = {:?}", std::str::from_utf8(&u02[..]).unwrap()); // does not work: unwrap() panics because the bytes are not valid UTF-8
    }

I get u01 and u02 from u in a loop.
How do I get from vec![199, 228] back to "Зд"?

Rust's strings are encoded in UTF-8, but the data as presented appears to be encoded in Windows-1251, not UTF-8. You will need to decode it manually (or configure whatever is creating the file to save it as UTF-8).
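
To give an idea of what decoding manually could look like without any crate, here is a minimal sketch. The function name `decode_windows_1251` is made up for this example, and the mapping is deliberately partial: it only covers ASCII, Ё/ё, and the contiguous А..я block (bytes 0xC0..=0xFF map to U+0410..=U+044F). Every other byte is replaced with U+FFFD, so this is not a complete Windows-1251 decoder.

```rust
/// Hypothetical helper: decodes a subset of Windows-1251 into a String.
/// Covers ASCII, Ё/ё and А..я only; all other bytes become U+FFFD.
fn decode_windows_1251(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            // ASCII is identical in Windows-1251 and Unicode.
            0x00..=0x7F => b as char,
            0xA8 => 'Ё',
            0xB8 => 'ё',
            // 0xC0..=0xFF are А..я, laid out in the same order as U+0410..=U+044F.
            0xC0..=0xFF => {
                char::from_u32(0x0410 + (b - 0xC0) as u32).unwrap_or('\u{FFFD}')
            }
            // Assumption of this sketch: everything else is not handled.
            _ => '\u{FFFD}',
        })
        .collect()
}

fn main() {
    let bytes: [u8; 10] = [109, 121, 32, 116, 101, 120, 116, 32, 199, 228];
    println!("{}", decode_windows_1251(&bytes)); // my text Зд
}
```

For real files a maintained crate with the full table is the safer choice; the sketch only shows that 199 and 228 are single-byte codes for З and д.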


Your text editor seems to be encoding your text using Windows-1251, not UTF-8. UTF-8 encoding is the standard used by Rust, thus your file isn’t printed the way you’re expecting it to.
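
To make the difference concrete, here is a small sketch using only the standard library. It prints the bytes that "Зд" has when it really is UTF-8 and shows that the two bytes from the question are rejected by `std::str::from_utf8`. The helper name `is_valid_utf8` is just something I made up for the example.

```rust
/// Hypothetical helper: true if `bytes` form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // A Rust string literal is always UTF-8, so these are the UTF-8 bytes.
    let utf8_bytes = "Зд".as_bytes();
    println!("{:?}", utf8_bytes); // [208, 151, 208, 180]

    // The bytes read from the file in the question.
    let file_bytes: [u8; 2] = [199, 228];

    // Handle the error instead of calling unwrap(), which would panic.
    match std::str::from_utf8(&file_bytes) {
        Ok(s) => println!("valid UTF-8: {}", s),
        Err(e) => println!("not valid UTF-8: {}", e),
    }

    println!("{}", is_valid_utf8(utf8_bytes)); // true
    println!("{}", is_valid_utf8(&file_bytes)); // false
}
```

In UTF-8 each of these Cyrillic letters takes two bytes, while the file stores one byte per letter, which is why the same text gives different byte sequences.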

There are two possible solutions: either use a text editor that supports UTF-8 (or, if yours does, change its settings), or add some code to decode the file in Rust. Using Unicode (and UTF-8 in particular) wherever possible is a good way to avoid problems with conflicting encodings, and, as far as I'm aware, it is steadily becoming more and more universally used. Decoding is still possible, of course, if it's necessary for your use case, e.g. when you don't control the files you want to process. For example, with encoding_rs, you could do

use encoding_rs::WINDOWS_1251;

fn main() {
    let bytes = [109, 121, 32, 116, 101, 120, 116, 32, 199, 228];
    let (cow, _encoding_used, _had_errors) = WINDOWS_1251.decode(&bytes);
    println!("Decoded string: {}", cow);
}


Decoded string: my text Зд

thanks

Cool! Thank you.