Why did the Rust team decide on an inconsistent approach to invalid UTF-8 encoded data?

I will edit my comments using the syntax used by @BurntSushi, namely U+FFFF instead of 0xFFFF.

Can you cite the exact passage on Wikipedia you are referring to? The phrase "must never appear in a valid UTF-8 sequence" only shows up once, and it does not appear relevant.

The table in the Wikipedia article is talking about "UTF-8 code units (individual bytes or octets)", not Unicode scalar values. So it means those bytes will never appear in UTF-8, not that those Unicode values will never appear.
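To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names `encode_scalar` and `is_valid_utf8` are made up for illustration. It shows that U+FFFF is a Unicode scalar value that encodes to UTF-8 without trouble, while the byte 0xFF is a code unit that cannot occur anywhere in valid UTF-8.

```rust
/// Hypothetical helper: encode a code point as UTF-8 bytes.
/// Returns None if the value is not a Unicode scalar value
/// (a surrogate, or anything above U+10FFFF).
fn encode_scalar(cp: u32) -> Option<Vec<u8>> {
    let c = char::from_u32(cp)?;
    let mut buf = [0u8; 4];
    Some(c.encode_utf8(&mut buf).as_bytes().to_vec())
}

/// Hypothetical helper: check whether a byte slice is well-formed UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // U+FFFF is a scalar value; its UTF-8 encoding is three bytes.
    let encoded = encode_scalar(0xFFFF).expect("U+FFFF is a scalar value");
    assert_eq!(encoded, [0xEF, 0xBF, 0xBF]);

    // None of those bytes is 0xFF, and the sequence is valid UTF-8.
    assert!(!encoded.contains(&0xFF));
    assert!(is_valid_utf8(&encoded));

    // The raw byte 0xFF (a code unit, not a code point) is never valid UTF-8.
    assert!(!is_valid_utf8(&[0xFF]));
    assert!(!is_valid_utf8(&[0xFF, 0xFF]));

    // A surrogate such as U+D800 is not a scalar value at all.
    assert!(encode_scalar(0xD800).is_none());
}
```

So "0xFF never appears" is a statement about bytes in the encoded stream, and says nothing about whether the code point U+FFFF can be represented.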


It doesn't say that. You are mixing up Unicode and its various encodings.


Wow! I am brain dead. You've been saying this the whole time, and I finally understand what you mean. I'm very sorry for wasting your time. It clearly states above the table that it's talking about "UTF-8 code units". Thank you for your patience.


Yes, it does indeed. I'm sorry about my brain fart.

