Ser/de large vector with buffering to/from file

Looking for a bit of help: I'm trying to figure out how I can save a large vector (1-10 GB) from memory to a file with serde without doubling the memory consumption.

Taking the following example:

use core::ops::Deref;
use postcard::from_bytes;
use postcard::to_stdvec;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, Eq, PartialEq, Clone)]
struct RefStruct<'a> {
    bytes: &'a [u8],
    str_s: &'a str,
}

fn main() {
    let rs = RefStruct {
        bytes: &[0x01, 0x10, 0x02, 0x20],
        str_s: "hElLo",
    };

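    let input = vec![rs; 10];
    // Serializes the whole vector into one in-memory Vec<u8>.
    let ser = to_stdvec(&input).unwrap();
    // Save to file here?
    // The deserialized values borrow from `ser`, so it must stay alive.
    let deser: Vec<RefStruct> = from_bytes(ser.deref()).unwrap();
    println!("{:?}", deser);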
}

Let's say my data length is not 10 but 3*10^9. I don't want to serialize the whole vector into memory and then save it. Instead, I should probably write it to a file through a buffer, serializing only a portion at a time, and do the same when reading it back. What is the canonical way of doing this?

Whether and how you can serialize directly to a file depends on the implementation of the serialization format you're using. I see you're using postcard, which has a to_io() function for serializing to a Write instance and a from_io() function for deserializing from a Read. I'm not familiar with postcard's implementation, and I can't tell from a quick glance at the code, but I would expect these functions to work piecemeal; otherwise there would be little point in the crate providing them.
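To make "piecemeal" concrete, here is a minimal sketch that uses only the standard library, not postcard or serde. The Record type, the helper names (write_records, read_records, write_chunk, read_chunk) and the length-prefixed layout are all made up for illustration. It only shows the general shape of writing one element at a time through a buffer, so that memory use beyond the vector itself stays around the buffer size.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};

/// Hypothetical owned record standing in for the real element type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub bytes: Vec<u8>,
    pub text: String,
}

/// Writes one field as a 4-byte little-endian length followed by the data.
fn write_chunk<W: Write>(w: &mut W, data: &[u8]) -> io::Result<()> {
    if data.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "field too long"));
    }
    w.write_all(&(data.len() as u32).to_le_bytes())?;
    w.write_all(data)
}

/// Reads one length-prefixed field back into an owned buffer.
fn read_chunk<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut data = vec![0u8; u32::from_le_bytes(len) as usize];
    r.read_exact(&mut data)?;
    Ok(data)
}

/// Streams the records to `sink` one at a time. Only the BufWriter's
/// internal buffer is held in memory in addition to `records` itself.
pub fn write_records<W: Write>(sink: W, records: &[Record]) -> io::Result<()> {
    let mut w = BufWriter::new(sink);
    // Element count first, so the reader knows how many records follow.
    w.write_all(&(records.len() as u64).to_le_bytes())?;
    for rec in records {
        write_chunk(&mut w, &rec.bytes)?;
        write_chunk(&mut w, rec.text.as_bytes())?;
    }
    w.flush()
}

/// Reads the records back one at a time through a BufReader.
pub fn read_records<R: Read>(source: R) -> io::Result<Vec<Record>> {
    let mut r = BufReader::new(source);
    let mut count = [0u8; 8];
    r.read_exact(&mut count)?;
    let count = u64::from_le_bytes(count);
    let mut out = Vec::new();
    for _ in 0..count {
        let bytes = read_chunk(&mut r)?;
        let text = String::from_utf8(read_chunk(&mut r)?)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        out.push(Record { bytes, text });
    }
    Ok(out)
}
```

With a real file you would pass something like File::create("data.bin")? as the sink and File::open("data.bin")? as the source (the file name is just a placeholder). Note that the records here own their data. A struct holding &[u8] or &str, like the one in the question, has to borrow from a buffer that outlives it, which is awkward when the input arrives in chunks.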

Thank you, I see. I had assumed this was handled at the serde level. I'm trying various implementations, and not all of them seem to offer an io-based API.

They may be named differently, but most serde data formats should provide a function that takes a std::io::Write. For example, in serde_json it would be serde_json::to_writer() instead.
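The reason a Write-taking function helps is that the caller chooses the sink, and wrapping that sink in a BufWriter bounds the extra memory to the buffer capacity. Here is a rough std-only sketch of that effect. dump is a hypothetical stand-in for a to_writer-style function, and CountingWriter and count_calls are illustrative helpers, not part of any crate.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a format's `to_writer`-style function:
/// each value is emitted as soon as it is encoded, so the full output
/// never exists in memory at once.
pub fn dump<W: Write>(mut out: W, values: &[u32]) -> io::Result<()> {
    for v in values {
        out.write_all(&v.to_le_bytes())?;
    }
    out.flush()
}

/// Sink wrapper that records how many write calls and bytes reach `inner`.
pub struct CountingWriter<W> {
    pub inner: W,
    pub calls: usize,
    pub bytes: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.calls += 1;
        self.bytes += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Runs `dump` against a discarding sink, optionally through a 4 KiB
/// BufWriter, and returns (write calls, bytes) seen by the sink.
pub fn count_calls(values: &[u32], buffered: bool) -> io::Result<(usize, usize)> {
    let mut sink = CountingWriter { inner: io::sink(), calls: 0, bytes: 0 };
    if buffered {
        dump(BufWriter::with_capacity(4096, &mut sink), values)?;
    } else {
        dump(&mut sink, values)?;
    }
    Ok((sink.calls, sink.bytes))
}
```

For 10,000 values the unbuffered run hands the sink one 4-byte write per value, while the buffered run passes the same 40,000 bytes along in a handful of roughly 4 KiB writes. The same wrapping should work for any function that accepts a std::io::Write, with a File in place of io::sink().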
