In a couple of scenarios, I need to exchange data over a network. In some of them the format is pretty rigid; in others it's more dynamic, i.e. an evolving format where fields might be added over time.
I guess using serde is a good way to go, but which serialization format should I use in which scenario?
I'm generally fond of simple, easy-to-understand formats, but I also need to be 8-bit clean (i.e. able to transmit opaque 8-bit data as part of the messages), which almost rules out JSON, I guess (base64-encoding the data or representing it as an array of numbers doesn't seem very efficient).
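To put a rough number on that overhead, here is a std-only sketch comparing the two JSON workarounds for 256 bytes of opaque data. The hand-rolled `base64_encode` and `json_number_array` helpers are illustrative assumptions (a real project would more likely use the `base64` crate and serde_json), and the counts leave out the quotes around the base64 string.

```rust
/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal padded base64 encoder: every 3 input bytes become 4 output chars.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Pad with '=' when the final chunk is shorter than 3 bytes.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Renders bytes the way serde_json writes a Vec<u8>: a JSON array of numbers.
fn json_number_array(data: &[u8]) -> String {
    let items: Vec<String> = data.iter().map(|b| b.to_string()).collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // Every possible byte value once: 256 bytes of "opaque" data.
    let data: Vec<u8> = (0..=255u8).collect();
    println!("raw bytes:         {}", data.len());
    println!("JSON number array: {} chars", json_number_array(&data).len());
    println!("base64 (unquoted): {} chars", base64_encode(&data).len());
}
```

For the bytes 0 through 255 this gives 915 characters as a number array and 344 as base64, versus 256 raw bytes, whereas a binary format can carry the bytes almost verbatim behind a short length prefix.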
As far as I understand, serialization formats fall more or less into one of two categories:
- self-describing formats (such as JSON, CBOR, MessagePack, Pot)
- non-self-describing formats (such as Postcard)
Please correct me if I'm wrong.
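The distinction can be sketched with a made-up mini-format (not any real spec; the tags `TAG_UINT`/`TAG_STR` and the one-byte lengths are assumptions for brevity). The tagged encoding lets a decoder rebuild the values without a schema, which is roughly what self-describing formats do; the bare encoding, closer in spirit to Postcard, drops the tags, so two numbers and a one-character string can end up as identical bytes.

```rust
/// Type tags of a hypothetical, minimal self-describing format.
const TAG_UINT: u8 = 0x01;
const TAG_STR: u8 = 0x02;

#[derive(Debug, PartialEq)]
enum Value {
    UInt(u8),
    Str(String),
}

/// Self-describing: every value is prefixed with its type tag.
fn encode_tagged(values: &[Value]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        match v {
            Value::UInt(n) => out.extend_from_slice(&[TAG_UINT, *n]),
            Value::Str(s) => {
                out.extend_from_slice(&[TAG_STR, s.len() as u8]); // assumes len < 256
                out.extend_from_slice(s.as_bytes());
            }
        }
    }
    out
}

/// Non-self-describing: only the data; the reader must know the schema.
fn encode_bare(values: &[Value]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        match v {
            Value::UInt(n) => out.push(*n),
            Value::Str(s) => {
                out.push(s.len() as u8); // assumes len < 256
                out.extend_from_slice(s.as_bytes());
            }
        }
    }
    out
}

/// Decodes a tagged message without any schema, driven purely by the tags.
fn decode_tagged(bytes: &[u8]) -> Option<Vec<Value>> {
    let mut values = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        match bytes[pos] {
            TAG_UINT => {
                values.push(Value::UInt(*bytes.get(pos + 1)?));
                pos += 2;
            }
            TAG_STR => {
                let len = *bytes.get(pos + 1)? as usize;
                let raw = bytes.get(pos + 2..pos + 2 + len)?;
                values.push(Value::Str(String::from_utf8(raw.to_vec()).ok()?));
                pos += 2 + len;
            }
            _ => return None, // unknown tag
        }
    }
    Some(values)
}

fn main() {
    let numbers = [Value::UInt(1), Value::UInt(65)];
    let text = [Value::Str("A".to_string())];
    // Without tags both messages are [1, 65]; only a schema tells them apart.
    println!("bare:   {:?} vs {:?}", encode_bare(&numbers), encode_bare(&text));
    // With tags the decoder recovers the types on its own.
    let tagged = encode_tagged(&text);
    println!("tagged: {:?} -> {:?}", tagged, decode_tagged(&tagged));
}
```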
However, I feel like the boundary between those is somewhat blurry, because the type systems are not consistent across serialization formats. I cannot really "describe" a timestamp in JSON (a UNIX timestamp would be encoded either as a plain integer or as a string), but I can tag a value as a timestamp in CBOR or MessagePack; whereas in Postcard, I can't even distinguish between numbers and strings in the binary message (I must know whether to expect a number or a string).
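For comparison, this is roughly how CBOR and MessagePack attach "timestamp" meaning on the wire, written by hand with std only so it runs without crates. It follows my reading of RFC 8949 (tag 1, epoch-based date/time) and the MessagePack spec's "timestamp 32" extension (type -1); the helper names are hypothetical, and real code would go through a crate such as `ciborium` or `rmp-serde`.

```rust
/// Writes a CBOR "head": 3-bit major type plus an argument in the shortest form.
fn cbor_head(out: &mut Vec<u8>, major: u8, arg: u64) {
    let mt = major << 5;
    if arg < 24 {
        out.push(mt | arg as u8);
    } else if arg <= 0xff {
        out.push(mt | 24);
        out.push(arg as u8);
    } else if arg <= 0xffff {
        out.push(mt | 25);
        out.extend_from_slice(&(arg as u16).to_be_bytes());
    } else if arg <= 0xffff_ffff {
        out.push(mt | 26);
        out.extend_from_slice(&(arg as u32).to_be_bytes());
    } else {
        out.push(mt | 27);
        out.extend_from_slice(&arg.to_be_bytes());
    }
}

/// CBOR: tag 1 (major type 6) wrapping an unsigned integer (major type 0).
fn cbor_epoch_timestamp(secs: u32) -> Vec<u8> {
    let mut out = Vec::new();
    cbor_head(&mut out, 6, 1);
    cbor_head(&mut out, 0, u64::from(secs));
    out
}

/// MessagePack "timestamp 32": fixext 4 (0xd6), extension type -1, big-endian seconds.
fn msgpack_timestamp32(secs: u32) -> Vec<u8> {
    let mut out = vec![0xd6, (-1i8) as u8];
    out.extend_from_slice(&secs.to_be_bytes());
    out
}

fn main() {
    let secs: u32 = 1_700_000_000;
    let mut plain = Vec::new();
    cbor_head(&mut plain, 0, u64::from(secs));
    println!("CBOR, untagged: {:02x?}", plain);
    println!("CBOR, tag 1:    {:02x?}", cbor_epoch_timestamp(secs));
    println!("MessagePack:    {:02x?}", msgpack_timestamp32(secs));
}
```

Both formats store 1_700_000_000 as the same four big-endian bytes `65 53 f1 00`; only the prefix (`c1 1a` in CBOR, `d6 ff` in MessagePack) marks them as a timestamp, while a Postcard message would carry just the integer.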
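On the other hand, I can encode a JSON document in the Postcard format. Since serde_json::Value serializes itself without any enum discriminants (an object is written as a plain map: its length, then the key/value pairs), the resulting bytes carry no type information at all: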
```rust
fn main() {
    let json: serde_json::Value = serde_json::from_str(r#"{"A": true}"#).unwrap();
    let bytes = postcard::to_stdvec(&json).unwrap();
    println!("{bytes:?}");
}
```
Output:
```
[1, 1, 65, 1]
```
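Reading those bytes back shows where the type information went: the decoder has to be told up front that the message is a map from strings to booleans. A std-only sketch, assuming the Postcard layout visible in that output (a varint map length, then per entry a varint-length-prefixed UTF-8 key and a bool as a single 0/1 byte); `decode_str_bool_map` is a hypothetical helper, not part of Postcard's API.

```rust
use std::collections::BTreeMap;

/// Reads an unsigned LEB128-style varint: 7 data bits per byte,
/// high bit set on every byte except the last.
fn read_varint(input: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *input.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift >= 64 {
            return None; // too many continuation bytes for a u64
        }
    }
}

/// Decodes bytes that the caller *knows* encode a map from strings to booleans.
fn decode_str_bool_map(input: &[u8]) -> Option<BTreeMap<String, bool>> {
    let mut pos = 0;
    let len = read_varint(input, &mut pos)?;
    let mut map = BTreeMap::new();
    for _ in 0..len {
        let key_len = read_varint(input, &mut pos)? as usize;
        let end = pos.checked_add(key_len)?;
        let key = String::from_utf8(input.get(pos..end)?.to_vec()).ok()?;
        pos = end;
        let value = match *input.get(pos)? {
            0 => false,
            1 => true,
            _ => return None, // not a valid bool byte
        };
        pos += 1;
        map.insert(key, value);
    }
    Some(map)
}

fn main() {
    let bytes: [u8; 4] = [1, 1, 65, 1];
    println!("{:?}", decode_str_bool_map(&bytes)); // Some({"A": true})
}
```

The same four bytes could just as well be read as four `u8`s or as a `(u32, String)` pair; nothing in the message itself rules either reading out.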
I read here that CBOR was inspired by MessagePack. What does CBOR do differently from MessagePack? And what about Pot or other formats?
What would be your general advice when choosing a serialization format? Which formats are well established? Which ones are known to cause trouble in certain scenarios?