Read file with specific encoding

Hi everyone,

I am lost again, and this time it is with file encoding.

I have a customer who sends us an XML file encoded as UTF-8 (that's what VS Code tells me).
But at the top of the file, the XML declaration says <?xml version="1.0" encoding="windows-1252"?>.

When I try to read it as UTF8, I get .
When I try to read it as WINDOWS-1252, I get �.
I should get é.

Here is the implementation:

    use std::fs::File;
    use std::io::Read;

    let file = File::open(common::get_ac3_xml_filepath()).unwrap();

    // Transcode the file contents to UTF-8, treating the bytes as windows-1252.
    let mut rdr = encoding_rs_io::DecodeReaderBytesBuilder::new()
        .encoding(Some(encoding_rs::WINDOWS_1252))
        // .encoding(Some(encoding_rs::UTF_8))
        .build(file);

    // Read the decoded text and print it with escapes visible.
    let mut content = String::new();
    rdr.read_to_string(&mut content).unwrap();
    println!("{:?}", content);

For your information, I then use serde-xml-rs to extract data from the XML (I don't know if that is useful to know).

I think I missed something, but I don't understand what... :persevere:

I've already read some topics, like this one, but they did not solve my problem.

Could you help me, please?

Looks like you got bad data...
The single-codepoint variant of é (U+00E9) encodes as:

>>> "é".encode("utf8")
b'\xc3\xa9'

>>> "é".encode("1252")
b'\xe9'

The two-codepoint variant (e + combining acute accent, U+0301) encodes as:

>>> "e\u0301".encode("utf-8")
b'e\xcc\x81'
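
To see which of these forms the file actually contains, one option is to look at the raw bytes before any decoding happens. Here is a minimal sketch in Python; `PATTERNS` and `inspect_bytes` are names I made up for illustration, and the check for a literal U+FFFD is only a guess at one way "bad data" could look (a replacement character already written into the file upstream).

```python
# Byte patterns for the different ways "é" (or its remains) can appear in a file.
# These names are hypothetical, chosen for this sketch only.
PATTERNS = {
    "utf-8, precomposed U+00E9": b"\xc3\xa9",
    "utf-8, e + combining acute U+0301": b"e\xcc\x81",
    "utf-8, literal U+FFFD replacement character": b"\xef\xbf\xbd",
}


def inspect_bytes(data: bytes) -> list[str]:
    """Return a label for every known pattern found in the raw bytes."""
    found = [label for label, pattern in PATTERNS.items() if pattern in data]
    # If the data is not valid UTF-8 and contains a 0xE9 byte, that
    # suggests (but does not prove) windows-1252 or latin-1 text.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        if b"\xe9" in data:
            found.append("windows-1252 style single byte 0xE9")
    return found
```

You would call it on the undecoded file contents, for example `inspect_bytes(open(path, "rb").read())`, where `path` stands for your XML file. If the only hit is the U+FFFD entry, the original é was probably lost before the file reached you, and no choice of decoder on your side will bring it back.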

Damn, I was so focused on the implementation that I forgot to verify the data...
You're right, sorry for wasting your time... :sweat:
