I am having trouble finding a good choice for serialising data. I have been using msgpack, but recently realised that it doesn't do a great job with large byte Vecs. For a binary format, I would expect the serialised data to carry only a fixed overhead, but that does not seem to be the case.
For example, when serialising a 2.3 MB Vec of bytes, I would expect the result to be about 2.3 MB plus a few dozen bytes at most. Instead it comes out at about 3.6 MB.
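A likely explanation, assuming the Vec<u8> ends up written as a MessagePack array of integers rather than as a single bin value: under the MessagePack spec, values 0 to 127 fit in a one-byte positive fixint, but 128 to 255 need the two-byte uint 8 form (0xcc followed by the value). With every byte value equally common, that averages 1.5 bytes per input byte, roughly 3.45 MB for 2.3 MB of data, which is in the same ballpark as the observed 3.6 MB. A bin 32 value would add only a 5-byte header. The sketch below just does that arithmetic from the spec's size rules; the function names are made up and no MessagePack crate is involved.

```rust
/// Header size of a MessagePack array with `n` elements
/// (fixarray, array 16, array 32).
fn array_header_len(n: usize) -> usize {
    if n <= 15 {
        1
    } else if n <= u16::MAX as usize {
        3
    } else {
        5
    }
}

/// Header size of a MessagePack bin value with `n` bytes
/// (bin 8, bin 16, bin 32).
fn bin_header_len(n: usize) -> usize {
    if n <= u8::MAX as usize {
        2
    } else if n <= u16::MAX as usize {
        3
    } else {
        5
    }
}

/// Encoded size when each byte is written as its own unsigned integer:
/// 0..=127 is a 1-byte positive fixint, 128..=255 is 0xcc plus 1 byte.
fn size_as_int_array(data: &[u8]) -> usize {
    let body: usize = data.iter().map(|&b| if b < 0x80 { 1usize } else { 2 }).sum();
    array_header_len(data.len()) + body
}

/// Encoded size when the whole slice is written as one bin value.
fn size_as_bin(data: &[u8]) -> usize {
    bin_header_len(data.len()) + data.len()
}

fn main() {
    // Every byte value about equally often, similar to random or compressed data.
    let data: Vec<u8> = (0..=u8::MAX).cycle().take(2_300_000).collect();
    println!("raw:       {} bytes", data.len());
    println!("int array: {} bytes", size_as_int_array(&data));
    println!("bin:       {} bytes", size_as_bin(&data));
}
```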
Oh, in that case the problem is that serde can't specialize its generic Vec<T> impl for Vec<u8> (Rust has no stable specialization), so the bytes get serialized as a sequence of individual integers instead of as one binary blob. If you control the serialized types, try containers specialized for bytes instead of Vec<u8>, like the ones from the bytes crate or BString from the bstr crate.
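Why a wrapper type helps can be shown with a serde-free toy model. serde's `Serializer` does have a `serialize_bytes` method, but the blanket `Serialize` impl for `Vec<T>` has to walk the elements one by one, and without specialization `Vec<u8>` is stuck with that impl. A byte-oriented newtype gets its own impl and can take the bulk path instead. The `Encoder`, `Encode`, `Blob` and `MsgPackLike` names below are hypothetical stand-ins, not serde's API, and the encoder always uses the 32-bit array/bin headers for simplicity.

```rust
/// Hypothetical, simplified stand-in for a serde-style serializer.
trait Encoder {
    fn encode_u8(&mut self, v: u8);
    fn encode_seq_len(&mut self, len: usize);
    fn encode_bytes(&mut self, v: &[u8]);
}

/// Hypothetical stand-in for `Serialize`.
trait Encode {
    fn encode<E: Encoder>(&self, e: &mut E);
}

impl Encode for u8 {
    fn encode<E: Encoder>(&self, e: &mut E) {
        e.encode_u8(*self);
    }
}

/// Generic impl: without specialization this is also the impl Vec<u8> gets,
/// so bytes go through the per-element path.
impl<T: Encode> Encode for Vec<T> {
    fn encode<E: Encoder>(&self, e: &mut E) {
        e.encode_seq_len(self.len());
        for item in self {
            item.encode(e);
        }
    }
}

/// Newtype that opts into the bulk byte path.
struct Blob(Vec<u8>);

impl Encode for Blob {
    fn encode<E: Encoder>(&self, e: &mut E) {
        e.encode_bytes(&self.0);
    }
}

/// Writes MessagePack-style output (only the formats needed here).
struct MsgPackLike(Vec<u8>);

impl Encoder for MsgPackLike {
    fn encode_u8(&mut self, v: u8) {
        if v >= 0x80 {
            self.0.push(0xcc); // uint 8 marker
        }
        self.0.push(v); // positive fixint, or the uint 8 payload
    }
    fn encode_seq_len(&mut self, len: usize) {
        self.0.push(0xdd); // array 32 marker
        self.0.extend_from_slice(&(len as u32).to_be_bytes());
    }
    fn encode_bytes(&mut self, v: &[u8]) {
        self.0.push(0xc6); // bin 32 marker
        self.0.extend_from_slice(&(v.len() as u32).to_be_bytes());
        self.0.extend_from_slice(v);
    }
}

fn main() {
    let data: Vec<u8> = (0..=u8::MAX).collect();

    let mut as_seq = MsgPackLike(Vec::new());
    data.encode(&mut as_seq);

    let mut as_blob = MsgPackLike(Vec::new());
    Blob(data.clone()).encode(&mut as_blob);

    println!("Vec<u8>: {} bytes, Blob: {} bytes", as_seq.0.len(), as_blob.0.len());
}
```

With all 256 byte values, the generic path pays one extra marker byte for each value of 128 or more, while the newtype pays only its fixed header.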
Use a format that natively supports binary blobs, for example my Neodyn Exchange crate:
```rust
use anyhow::Result;
use neodyn_xc::Value;

fn main() -> Result<()> {
    // 65,535 bytes cycling through every value 0..=255.
    let v: Vec<u8> = (0..=u8::MAX).cycle().take(u16::MAX.into()).collect();
    let vlen = v.len();
    // Wrap the bytes in the Blob variant so they are encoded as one binary value.
    let neodyned = neodyn_xc::to_bytes(&Value::Blob(v))?;
    println!("v.len() = {}, neodyned.len() = {}", vlen, neodyned.len());
    Ok(())
}
```
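To illustrate what native blob support buys, here is a tiny, hypothetical self-describing format: each value starts with a type tag, and a blob is the tag, a 4-byte length and the raw bytes. The overhead is therefore fixed regardless of content, and a reader can still decode a value without knowing its type in advance. The tags, layout and `Value` enum are invented for this sketch and are not Neodyn Exchange's actual wire format or API.

```rust
/// Hypothetical dynamically typed value (not neodyn_xc::Value).
#[derive(Debug, PartialEq)]
enum Value {
    U64(u64),
    Blob(Vec<u8>),
}

const TAG_U64: u8 = 0x01; // made-up tag values
const TAG_BLOB: u8 = 0x02;

/// Layout: 1-byte tag, then either an 8-byte big-endian integer (U64)
/// or a 4-byte big-endian length followed by the raw bytes (Blob).
fn encode(value: &Value, out: &mut Vec<u8>) {
    match value {
        Value::U64(n) => {
            out.push(TAG_U64);
            out.extend_from_slice(&n.to_be_bytes());
        }
        Value::Blob(bytes) => {
            assert!(bytes.len() <= u32::MAX as usize, "blob too large for a 4-byte length");
            out.push(TAG_BLOB);
            out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
    }
}

/// Decodes one value without being told its type: the tag says what follows.
fn decode(input: &[u8]) -> Option<Value> {
    let (&tag, rest) = input.split_first()?;
    match tag {
        TAG_U64 => {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(rest.get(..8)?);
            Some(Value::U64(u64::from_be_bytes(buf)))
        }
        TAG_BLOB => {
            let mut len_buf = [0u8; 4];
            len_buf.copy_from_slice(rest.get(..4)?);
            let len = u32::from_be_bytes(len_buf) as usize;
            let payload = rest.get(4..)?.get(..len)?;
            Some(Value::Blob(payload.to_vec()))
        }
        _ => None,
    }
}

fn main() {
    let v: Vec<u8> = (0..=u8::MAX).cycle().take(u16::MAX.into()).collect();
    let vlen = v.len();
    let mut encoded = Vec::new();
    encode(&Value::Blob(v), &mut encoded);
    // Overhead is 5 bytes (tag + length) whatever the blob contains.
    println!("v.len() = {}, encoded.len() = {}", vlen, encoded.len());
    assert!(matches!(decode(&encoded), Some(Value::Blob(b)) if b.len() == vlen));
}
```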
Use a non-self-describing format, such as bincode, which can serialize to the most compact representation possible because the (static) type information is always provided by, and required from, the reading side. The drawback of this approach is that dynamically typed deserialization, for example into serde_json::Value, won't work:
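```rust
use anyhow::Result;
use serde_json::Value;

fn main() -> Result<()> {
    let v: Vec<u8> = (0..=u8::MAX).cycle().take(u16::MAX.into()).collect();
    let bincoded = bincode::serialize(&v)?;
    println!("v.len() = {}, bincoded.len() = {}", v.len(), bincoded.len());
    // This fails at runtime: bincode carries no type information, so it cannot
    // support the `deserialize_any` call that serde_json::Value relies on.
    let _value: Value = bincode::deserialize(&bincoded)?;
    Ok(())
}
```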
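The non-self-describing side can be sketched the same way. Assuming bincode 1.x's default fixed-width little-endian layout, a Vec<u8> is written as a u64 length followed by the raw bytes, so the overhead is a constant 8 bytes. The hypothetical helpers below (not bincode's API) show the flip side: with no type tag, the reader has to know what it expects, and misreading the same bytes as another type silently "succeeds", which is why a dynamically typed target like serde_json::Value has nothing to go on.

```rust
/// Writes a byte vector as a non-self-describing format might:
/// an 8-byte little-endian length, then the raw bytes. No type tag.
fn encode_bytes(data: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
}

/// The reader must already know it expects a byte vector.
fn decode_bytes(input: &[u8]) -> Option<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    len_buf.copy_from_slice(input.get(..8)?);
    let len = u64::from_le_bytes(len_buf) as usize;
    Some(input.get(8..)?.get(..len)?.to_vec())
}

/// Reading the same bytes as a different type also "works":
/// nothing in the data says what was written.
fn decode_u64(input: &[u8]) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(input.get(..8)?);
    Some(u64::from_le_bytes(buf))
}

fn main() {
    let v: Vec<u8> = (0..=u8::MAX).cycle().take(u16::MAX.into()).collect();
    let mut encoded = Vec::new();
    encode_bytes(&v, &mut encoded);
    // Fixed 8-byte overhead for the length prefix.
    println!("v.len() = {}, encoded.len() = {}", v.len(), encoded.len());
    assert_eq!(decode_bytes(&encoded).as_ref(), Some(&v));
    // Misread as a u64, the length prefix comes back as 65535.
    println!("misread as u64: {:?}", decode_u64(&encoded));
}
```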