String::from_utf8_lossy() which replaces any invalid UTF-8 sequence with �. I'm wondering why std::str::from_utf8_lossy() doesn't exist?
UTF-8 is a variable-width encoding, so there’s no guarantee that a replacement can be made without changing the size of the buffer, which is something that str can’t do.
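To make the size change concrete, here is a minimal sketch. The helper name lengths is made up for this illustration; the only library call is String::from_utf8_lossy().

```rust
/// Illustrative helper (not from std): returns the input length and the
/// length after lossy decoding, both in bytes.
fn lengths(bytes: &[u8]) -> (usize, usize) {
    let lossy = String::from_utf8_lossy(bytes);
    (bytes.len(), lossy.len())
}

fn main() {
    // 0xFF can never appear in valid UTF-8, so it is replaced by U+FFFD,
    // which is encoded as three bytes (EF BF BD).
    let (before, after) = lengths(b"a\xFFb");
    println!("{} bytes in, {} bytes out", before, after); // 3 bytes in, 5 bytes out
}
```

One invalid byte turns into three, so the result cannot be stored in the original slice.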
That makes sense. Thank you.
The width problem could be solved by sticking to ASCII (replacing each invalid byte with ? or any other ASCII character). But another problem is that std::str::from_utf8() doesn't have a buffer to modify: it returns a reference to an existing region of memory (a slice), unlike String::from_utf8_lossy(), which allocates a new String when it has to replace anything.
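A small sketch of that difference, as far as I understand the std API. The helper lossy_allocates is a made-up name for this example. from_utf8_lossy() returns a Cow<str>, so it only allocates when something actually has to be replaced.

```rust
use std::borrow::Cow;

/// Illustrative helper (not from std): true if lossy decoding had to
/// allocate a new String for this input.
fn lossy_allocates(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Owned(_))
}

fn main() {
    let valid: &[u8] = b"hello";

    // from_utf8 only validates; the returned &str points into `valid` itself.
    let s: &str = std::str::from_utf8(valid).unwrap();
    assert_eq!(s.as_ptr(), valid.as_ptr());

    // Valid input is borrowed as-is; invalid input forces an allocation.
    assert!(!lossy_allocates(b"hello"));
    assert!(lossy_allocates(b"hel\xFFlo"));
}
```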
There could in principle be a
Assuming you're OK with 1-4 ? characters per broken UTF-8 code point sequence. Or arbitrarily many if it's just 10... continuation bytes, I guess.
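For what it's worth, an in-place variant along those lines could look roughly like the sketch below. The name from_utf8_lossy_in_place is hypothetical and nothing like it exists in std. It assumes the caller can hand over a mutable byte slice, and it relies on Utf8Error::valid_up_to() and Utf8Error::error_len() to locate each invalid run.

```rust
/// Hypothetical helper, not part of std: overwrites every invalid byte
/// with b'?' so the buffer becomes valid UTF-8 without changing its length.
fn from_utf8_lossy_in_place(buf: &mut [u8]) -> &str {
    let mut start = 0;
    while start < buf.len() {
        // Validate the not-yet-checked tail; stop once it is clean.
        let err = match std::str::from_utf8(&buf[start..]) {
            Ok(_) => break,
            Err(e) => e,
        };
        let bad = start + err.valid_up_to();
        // error_len() is None when the input ends in the middle of a
        // sequence; in that case everything that is left is invalid.
        let len = err.error_len().unwrap_or(buf.len() - bad);
        for b in &mut buf[bad..bad + len] {
            *b = b'?';
        }
        start = bad + len;
    }
    // Every invalid byte has been replaced, so validation cannot fail here.
    std::str::from_utf8(buf).expect("buffer was sanitized above")
}

fn main() {
    let mut data = *b"a\xFFb";
    println!("{}", from_utf8_lossy_in_place(&mut data));
}
```

Every invalid byte becomes its own ?, so a run of stray continuation bytes turns into an equally long run of question marks, while valid multi-byte characters are left untouched.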