Saving a complex struct to disk - Fast & Efficiently

Hello,
I am trying to save a large struct (~100 MB; it consists of several very large ndarrays). I have tried serde with the JSON format, but it saves the data very slowly, taking minutes. I was wondering if there is a more efficient way to save the data; it need not be in a human-readable format like JSON or TOML.
Thanks

Yes, binary formats are going to be more efficient than JSON. Some data formats are listed in the documentation.

I read through the ones listed at the start of the documentation, but none of them fully suited my objectives. The bincode crate, however, suited them perfectly :+1:

Have you tried buffering before writing? (I don't know if serde does that or if the buffer is a reasonable size; multiples of 64 KiB are a good choice.)
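To illustrate the buffering suggestion: wrap the `File` in a `BufWriter` so the many small writes a serializer emits are coalesced into large syscalls. A minimal std-only sketch (the helper name, path, and 64 KiB capacity are illustrative choices, not anything serde mandates):

```rust
use std::io::{BufWriter, Write};
use std::path::Path;

// Illustrative helper: stream a million u32s through a 64 KiB buffer,
// so each tiny write hits memory rather than making its own syscall.
fn write_buffered(path: &Path) -> std::io::Result<()> {
    let file = std::fs::File::create(path)?;
    let mut w = BufWriter::with_capacity(64 * 1024, file);
    for i in 0u32..1_000_000 {
        w.write_all(&i.to_le_bytes())?; // many tiny writes, cheaply buffered
    }
    w.flush() // push the last partial buffer to the OS
}
```

The same pattern applies to serde: pass the `BufWriter` to the serializer instead of the bare `File`.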


If you have a "plain old data" struct without any padding bytes, then you could use zerocopy to completely eliminate (de)serialization overhead.
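Roughly what that buys you, sketched here with std and `unsafe` for a plain `f64` slice (the helper name and path are illustrative; zerocopy's derive macros give you the same byte view safely, without `unsafe`, for any `#[repr(C)]` struct with no padding):

```rust
use std::io::Write;
use std::path::Path;

// Illustrative helper: dump a plain-old-data slice's bytes directly,
// with no per-element serialization step at all.
fn dump_raw(path: &Path, data: &[f64]) -> std::io::Result<()> {
    // View the f64s as native-endian bytes; sound here because f64 has
    // no padding and every bit pattern is valid.
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            data.as_ptr().cast(),
            data.len() * std::mem::size_of::<f64>(),
        )
    };
    std::fs::File::create(path)?.write_all(bytes)
}
```

Reading back is the mirror image: load the bytes and reinterpret them, which is exactly the (de)serialization overhead being eliminated.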

Use npy for saving a single numeric array. Use the npz format to save many of them.

Do not use any general-purpose format, including bincode – yes, even binary formats won't be as efficient as a format specifically optimized for storing a large blob with a single, pre-determined element type.
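For a sense of how simple the format is, here is a std-only sketch of writing an NPY v1.0 file for a 1-D `f64` array (the helper is hypothetical; in practice a crate such as ndarray-npy handles this, including reading and n-dimensional shapes):

```rust
use std::io::Write;
use std::path::Path;

// Hypothetical minimal NPY v1.0 writer for a 1-D little-endian f64 array.
fn write_npy_1d(path: &Path, data: &[f64]) -> std::io::Result<()> {
    let mut header = format!(
        "{{'descr': '<f8', 'fortran_order': False, 'shape': ({},), }}",
        data.len()
    );
    // Pad with spaces so the full preamble (6-byte magic + 2-byte version
    // + 2-byte header length + header + '\n') is a multiple of 64 bytes,
    // keeping the data block aligned.
    let unpadded = 6 + 2 + 2 + header.len() + 1;
    header.push_str(&" ".repeat((64 - unpadded % 64) % 64));
    header.push('\n');

    let mut f = std::io::BufWriter::new(std::fs::File::create(path)?);
    f.write_all(b"\x93NUMPY\x01\x00")?;                 // magic + version 1.0
    f.write_all(&(header.len() as u16).to_le_bytes())?; // header length
    f.write_all(header.as_bytes())?;
    for x in data {
        f.write_all(&x.to_le_bytes())?;                 // raw f64 payload
    }
    f.flush()
}
```

After the fixed-size header, the payload is just the raw bytes of the array, which is why nothing general-purpose can beat it for this workload.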

Obligatory "make sure you're running with the --release flag"


Ironically, npy is a more generic format than something like bincode: it includes a bunch of metadata describing the element type and dimensions.

Bincode can dump the bytes directly, as it knows exactly the types involved (including static array sizes).

100 MB is tiny and shouldn't take minutes, even in JSON, unless you're running your program in the default dev/debug profile, which can be 10x-100x slower than --release.

With msgpack I'm writing and reading 2 GB files in seconds.

Another possibility is that ndarray has some pathological interaction with serde, especially if the data is binary or highly multi-dimensional.


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.