Do serde formats guarantee that serialized str can be deserialized as bytes?


#1

Does serde guarantee that a value serialized as str can be deserialized as bytes, and that those bytes are exactly the UTF-8 encoding of the str?

It appears that this works at least for serde_json and bincode, but can it be relied upon?

Use case: a micro-optimization to skip redundant UTF-8 validation when any non-ASCII input is an error anyway.
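
To make the use case concrete, here is a minimal sketch using only the Rust standard library. It assumes the deserializer has already handed over a `&[u8]`, and `ascii_str` is a hypothetical helper name, not anything serde provides. The ASCII scan takes the place of UTF-8 validation, since every non-ASCII byte is rejected anyway.

```rust
/// Hypothetical helper: accept the input only if it is pure ASCII.
/// Returns `None` as soon as any non-ASCII byte is present.
fn ascii_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // SAFETY: every ASCII byte sequence is valid UTF-8, so the
        // unchecked conversion cannot produce an invalid `str`.
        Some(unsafe { std::str::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}
```

Calling `ascii_str(b"hello")` gives `Some("hello")`, while `ascii_str("héllo".as_bytes())` gives `None`. If the `unsafe` is unwelcome, `std::str::from_utf8(bytes).ok()` after the ASCII check is a safe alternative at the cost of a second pass.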


#2

There are no guarantees at all about what formats do. Some formats support serialization but not deserialization. Some formats do not support str. Some formats do not support bytes.

For formats that do support serializing str and deserializing bytes, it seems reasonable to want those to be represented the same. This would be a feature request for any format that does not currently behave that way.

A few times I have been surprised at how fast UTF-8 validation is, so personally I would not bother with an optimization like this unless you can measure the difference in a macro benchmark.
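
As a rough first look before setting up a proper benchmark, something like the following could be used. This is only a crude single-shot timing sketch with the standard library, not the macro benchmark suggested above; `timed` and `compare_checks` are hypothetical names, and the numbers it produces will be noisy.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` once and report its result and wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    // black_box discourages the compiler from optimizing the call away.
    let out = black_box(f());
    (out, start.elapsed())
}

/// Time full UTF-8 validation against an ASCII-only scan of the same input.
/// Returns (UTF-8 validation time, ASCII scan time).
fn compare_checks(data: &[u8]) -> (Duration, Duration) {
    let (_, utf8) = timed(|| std::str::from_utf8(black_box(data)).is_ok());
    let (_, ascii) = timed(|| black_box(data).is_ascii());
    (utf8, ascii)
}
```

Run it in a release build over input that resembles the real workload, and repeat it many times; a single measurement says very little.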


#3

I don’t think this holds universally. For example, it won’t hold for formats that use a UTF-8-compatible encoding for bytes (such that the serialized data is valid UTF-8) but UTF-8 encoding for strings.


#4

OK. :( Thanks. I’ll avoid the optimization, then.

Isn’t JSON such a format? Yet, the optimization would have worked with serde_json.


#5

How would I encode arbitrary bytes in a JSON string? JavaScript (at least in Firefox) appears to have \xNN encoding, but JSON does not.


#6

You can’t. And yet, if the JSON input contains a string, you can ask serde to read it as bytes and you get the string’s UTF-8 representation, so the nature of JSON doesn’t prevent the optimization I asked about. As noted, the optimization would work with serde_json.


#7

Well, then JSON is not one of those “formats that use a UTF-8-compatible encoding for bytes (such that the serialized data is valid UTF-8) but UTF-8 encoding for strings.”


#8

You cannot serialize arbitrary bytes as a JSON string without an additional layer of escaping. However, serde_json allows you to deserialize a JSON string as &[u8].
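
To illustrate what an "additional layer of escaping" could look like, here is a minimal sketch that hex-encodes arbitrary bytes into text that is safe to place in a JSON string. Hex is just one assumed choice (base64 is another common one), and `to_hex` / `from_hex` are hypothetical helpers written against the standard library only, not part of serde or serde_json.

```rust
/// Encode arbitrary bytes as lowercase hex. The output is pure ASCII,
/// so it is always valid as the content of a JSON string.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Decode hex text back into bytes. Returns `None` on odd length
/// or on any character that is not a hex digit.
fn from_hex(s: &str) -> Option<Vec<u8>> {
    let raw = s.as_bytes();
    if raw.len() % 2 != 0 {
        return None;
    }
    raw.chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

For example, the bytes `[0x00, 0xff, 0x7f]` are not valid UTF-8 and so cannot be a str, but `to_hex` turns them into `"00ff7f"`, and `from_hex` recovers the original bytes.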