What is the expected behavior of from_utf8 in the presence of a BOM?

I've got UTF-8 encoded data that begins with the UTF-8 Byte Order Mark (0xef, 0xbb, 0xbf). I would expect that when I hand that to String::from_utf8, the BOM would just be dropped, but instead it gets changed to "\u{feff}":

    let x: Vec<u8> = vec![0xef, 0xbb, 0xbf, 0x7b];
    let y = String::from_utf8(x);
    // Would expect Ok("{"), get Ok("\u{feff}{")
    println!("{:#?}", y);


Of course, I can just check for the BOM & remove it before I call from_utf8, but I'm wondering if this is expected?

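String::from_utf8 only does validation; it never modifies the input in any way. (String is just a wrapper around Vec<u8>, and from_utf8 just returns the same valid UTF-8 data that you pass in.) The bytes 0xef, 0xbb, 0xbf are the UTF-8 encoding of U+FEFF, so the "\u{feff}" in your output is the BOM itself, unchanged; the debug formatter just prints it as an escape.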
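If you do want the BOM dropped, one possible approach is to strip it from the byte slice before validating. The sketch below is only an illustration: `from_utf8_skip_bom` is a hypothetical helper name, not something in the standard library, and it assumes you are happy to borrow the data as `&str` rather than take ownership of the Vec.

```rust
/// Hypothetical helper (not part of std): validates `bytes` as UTF-8 and
/// skips a leading UTF-8 BOM (EF BB BF) if one is present.
fn from_utf8_skip_bom(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // The UTF-8 encoding of U+FEFF.
    const BOM: &[u8] = &[0xEF, 0xBB, 0xBF];
    // `strip_prefix` returns None when the slice does not start with the BOM,
    // in which case the input is validated as-is.
    let rest = bytes.strip_prefix(BOM).unwrap_or(bytes);
    std::str::from_utf8(rest)
}
```

With the bytes from the question, `from_utf8_skip_bom(&x)` should give `Ok("{")`. If you need an owned String, calling `.to_owned()` on the result (or checking `starts_with('\u{feff}')` on the String after `String::from_utf8`) would work as well.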

Ok, then. Thanks!
