Compact Binary Serialization with Serde?

PSeitz · April 19, 2018, 12:44pm

I want to serialize a vector using at most a u32 for its length information. Serde uses u64 by default, and since there are potentially a lot of small vectors, that would waste a lot of space.

Is this possible with serde, and does it make sense to switch to serde? The docs have a method for sequences that always includes the length, and therefore its type (Serializer in serde - Rust), which doesn't seem to help here.

dtolnay · April 19, 2018, 1:34pm

The binary representation of sequences depends on the data format you are using, not on Serde. For example MessagePack serializes lengths as 1 byte if under 16, 2 bytes if under 65536, and 4 bytes otherwise (implementation) -- so even your u32 length would seem wasteful in comparison.
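As a rough illustration of the tiered length encoding described above, here is a minimal sketch of a MessagePack-style array header writer. The function name `write_array_header` is made up for this example and is not taken from any crate; the type bytes (0x90, 0xdc, 0xdd) follow the MessagePack specification. Note that the 2-byte and 4-byte lengths are preceded by one type byte, so the full headers are 1, 3, and 5 bytes.

```rust
use std::io::{self, Write};

/// Writes a MessagePack array header for `len` elements and returns
/// the number of header bytes written. (Hypothetical helper, for illustration.)
fn write_array_header<W: Write>(w: &mut W, len: u32) -> io::Result<usize> {
    if len < 16 {
        // fixarray: the length lives in the low 4 bits of the type byte.
        w.write_all(&[0x90 | len as u8])?;
        Ok(1)
    } else if len <= u16::MAX as u32 {
        // array 16: type byte 0xdc, then a big-endian u16 length.
        w.write_all(&[0xdc])?;
        w.write_all(&(len as u16).to_be_bytes())?;
        Ok(3)
    } else {
        // array 32: type byte 0xdd, then a big-endian u32 length.
        w.write_all(&[0xdd])?;
        w.write_all(&len.to_be_bytes())?;
        Ok(5)
    }
}
```

A vector of three elements therefore costs a single header byte, compared to eight bytes for a fixed u64 length.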

That said, if you care about compactness you are going to see better results from using a real compression algorithm applied to the entire serialized data. At that point it will hardly matter what underlying format you use. Try chaining together bincode::serialize_into with a DeflateEncoder or similar from the flate2 crate.

PSeitz · April 19, 2018, 2:53pm

Thanks, I was looking for a real example. Currently I do it almost exactly the same way as MessagePack, but since I serialize Vec<u32>, I also apply this to the whole vector. Since the numbers are mostly small values, this already compresses better and is faster than a general-purpose compression algorithm (Vint Snappy Comparison).
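For readers who have not seen the technique, variable-length encoding of a whole Vec<u32> could look roughly like the sketch below. This is not the code linked in this post; it assumes a LEB128-style layout (7 payload bits per byte, high bit as a continuation flag), and all function names are made up for the example.

```rust
/// Appends `value` as a LEB128-style varint: 7 payload bits per byte,
/// high bit set on every byte except the last.
fn write_varint(out: &mut Vec<u8>, mut value: u32) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Reads one varint starting at `*pos` and advances `*pos` past it.
/// Returns `None` if the input is truncated or the varint is longer than 5 bytes.
fn read_varint(input: &[u8], pos: &mut usize) -> Option<u32> {
    let mut value = 0u32;
    let mut shift = 0;
    loop {
        if shift > 28 {
            return None; // a u32 never needs more than five bytes
        }
        let byte = *input.get(*pos)?;
        *pos += 1;
        value |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Encodes the length and every element as varints.
/// Assumption: the vector has fewer than 2^32 elements.
fn encode_u32s(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    write_varint(&mut out, values.len() as u32);
    for &v in values {
        write_varint(&mut out, v);
    }
    out
}

/// Inverse of `encode_u32s`; returns `None` on malformed input.
fn decode_u32s(input: &[u8]) -> Option<Vec<u32>> {
    let mut pos = 0;
    let len = read_varint(input, &mut pos)?;
    let mut values = Vec::new();
    for _ in 0..len {
        values.push(read_varint(input, &mut pos)?);
    }
    Some(values)
}
```

With this layout, `[1, 300, 70000]` takes 7 bytes including the length, versus 12 bytes for the elements alone at a fixed 4 bytes each.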

The link to the code is the actual conversion, which gets referenced from the serde serializer here.
I browsed a lot of implementations, and most of them have something like this: a struct containing a writer that the data is written to.

impl<'a, W> serde::Serializer for &'a mut Serializer<W>
where
    W: Write

Is this best practice? It's a little confusing, since I didn't find this pattern in the serde documentation.
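A likely reason for the `&'a mut` in that signature is that the methods of `serde::Serializer` take `self` by value. Implementing the trait for a mutable reference lets one serializer struct be reused across many calls. The sketch below shows the shape using only the standard library: `MiniSerializer` is a hypothetical stand-in for the real trait (which has many more methods and associated types), and the fixed-width little-endian output is an arbitrary choice for the example.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for `serde::Serializer`: like the real trait,
/// its methods consume `self`.
trait MiniSerializer {
    fn serialize_u32(self, v: u32) -> io::Result<()>;
    fn serialize_bytes(self, v: &[u8]) -> io::Result<()>;
}

/// Format-specific state; here it is only the output writer.
struct Serializer<W> {
    writer: W,
}

// Implemented for `&mut Serializer<W>`, so consuming `self` only
// consumes the borrow, not the serializer itself.
impl<'a, W> MiniSerializer for &'a mut Serializer<W>
where
    W: Write,
{
    fn serialize_u32(self, v: u32) -> io::Result<()> {
        self.writer.write_all(&v.to_le_bytes())
    }

    fn serialize_bytes(self, v: &[u8]) -> io::Result<()> {
        // Reborrow explicitly so `self` is still usable afterwards.
        (&mut *self).serialize_u32(v.len() as u32)?;
        self.writer.write_all(v)
    }
}
```

Calling `(&mut ser).serialize_u32(7)` followed by `(&mut ser).serialize_bytes(b"hi")` writes both values into the same `ser.writer`.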

parasyte · April 19, 2018, 8:37pm

FWIW my favorite serialization format for resource-constrained environments is CBOR. I've used serde_cbor in multiple projects with excellent results. Highly recommended over any DIY solution.

PSeitz · April 20, 2018, 7:39am

I previously tested rust_cbor, but it is quite slow at only 50MB/s, compared to 1GB/s for my solution. I don't know if this is due to rustc_serialize being used.

serde_cbor is much faster at around 350MB/s, but its output is already larger than bincode's for ranges like (199_990..200_000).
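The size difference for such a range can be estimated per element. The sketch below is a back-of-the-envelope calculation, not a benchmark: it assumes CBOR's standard unsigned-integer encoding, a fixed 4 bytes per u32 as in bincode's default configuration at the time, and a LEB128-style varint for comparison. The function names are made up, and sequence headers are left out.

```rust
use std::ops::Range;

/// Bytes needed for a CBOR unsigned integer (major type 0).
fn cbor_uint_len(v: u32) -> usize {
    match v {
        0..=23 => 1,         // value fits in the initial byte
        24..=0xff => 2,      // initial byte + u8
        0x100..=0xffff => 3, // initial byte + u16
        _ => 5,              // initial byte + u32
    }
}

/// Bytes for a fixed-width u32 (assumed bincode default).
fn fixed_u32_len(_v: u32) -> usize {
    4
}

/// Bytes for a LEB128-style varint (7 payload bits per byte).
fn varint_len(v: u32) -> usize {
    match v {
        0..=0x7f => 1,
        0x80..=0x3fff => 2,
        0x4000..=0x1f_ffff => 3,
        0x20_0000..=0x0fff_ffff => 4,
        _ => 5,
    }
}

/// Total element bytes as (cbor, fixed, varint), excluding any length header.
fn payload_sizes(values: Range<u32>) -> (usize, usize, usize) {
    let cbor: usize = values.clone().map(cbor_uint_len).sum();
    let fixed: usize = values.clone().map(fixed_u32_len).sum();
    let varint: usize = values.map(varint_len).sum();
    (cbor, fixed, varint)
}
```

For `199_990..200_000`, every value is above 65535, so CBOR needs 5 bytes per element (50 in total) against 40 for fixed-width and 30 for the varint.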

I would rather use something existing, but compression ratio and speed are crucial in this component.

parasyte · April 20, 2018, 8:25pm

That's a good point. It is worth mentioning that CBOR is designed as a JSON-like replacement for resource constrained environments, not for raw performance.

Boscop · April 20, 2018, 8:38pm

FWIW, a while ago I forked bincode to implement variably sized encoding of integers (LEB128) and several options for packing floats (e.g. as f16); it also supports bit vectors etc.
You can't encode the data any smaller than this without extracting patterns into a lookup table like in a compression algo:

All integer types use variable length encoding, taking only the necessary number of bytes. This includes e.g. enum tags, Vec lengths and the elements of Vecs. Tuples and structs are encoded by encoding their fields one-by-one, and enums are encoded by first writing out the tag representing the variant and then the contents. Floats can be encoded in their original precision, half precision (f16), always f32 or at half of their original precision.
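To make that description concrete, here is a minimal hand-written sketch of the same idea: variant tag first, then the fields one by one, with every integer (tags and Vec lengths included) written as a varint. It is not code from the fork; the `Message` type, the tag numbers, and the function names are hypothetical, and float packing is left out.

```rust
/// Hypothetical message type, used only for illustration.
enum Message {
    Ping,
    Move { x: u32, y: u32 },
    Scores(Vec<u32>),
}

/// Appends `value` as a LEB128-style varint (7 payload bits per byte).
fn write_varint(out: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Writes the variant tag, then the variant's contents field by field.
fn encode(msg: &Message, out: &mut Vec<u8>) {
    match msg {
        Message::Ping => write_varint(out, 0),
        Message::Move { x, y } => {
            write_varint(out, 1);
            write_varint(out, u64::from(*x));
            write_varint(out, u64::from(*y));
        }
        Message::Scores(scores) => {
            write_varint(out, 2);
            // The Vec length is a varint too, followed by the elements.
            write_varint(out, scores.len() as u64);
            for &s in scores {
                write_varint(out, u64::from(s));
            }
        }
    }
}
```

Under these assumptions `Message::Ping` is a single byte, and `Message::Move { x: 5, y: 300 }` is four bytes.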

I did this for a fast-paced multiplayer game project (using Enet) that I have since stopped working on, so I haven't updated it: I haven't really needed it, and serde changed its API a lot afterwards (this was pre-1.0 serde). It works well, though, and all the tests pass!

I'll gladly accept PRs that bring it up to date with recent serde/bincode versions.
