Append serialized objects to a single file and load


I'm trying to build a bioinformatics tool in Rust.

I need a struct that serializes objects to bytes, appends them to a single file, and loads them back.

I'm stuck on making it.

  1. What's the idiomatic way to serialize a struct into bytes and append it to a file?
    Is it okay to do something like this:
fn serialize_and_save<T: Serialize>(file: &mut File, value: &T) -> Result<(), Box<dyn std::error::Error>> {
    let encoded = bincode::serialize(value)?;

    file.write_all(&encoded)?; // the file may already contain some T objects
    Ok(())
}

  2. How can I load a file containing the bytes of multiple objects and deserialize them?

Could you help me solve these?
Any help would be appreciated.

I'll assume the file is appended to over time and that you don't have the entire data-set in memory at once. (If you do, you should just call bincode::serialize() on the slice to serialize the entire thing in one go.)

You'll need some way of distinguishing where one entry in the file ends and the next begins. The easiest way is probably to length-prefix the binary data. Before file.write_all(&encoded)?;, use file.write_all(&(encoded.len() as u64).to_le_bytes())?; to store the length of the next data block.
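A minimal sketch of the write side, using only the standard library. The names `append_block` and `open_for_append` are hypothetical, and `encoded` is assumed to be the byte vector you got back from bincode::serialize:

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: opens `path` so that every write lands at the end
/// of the file, creating the file if it does not exist yet.
fn open_for_append(path: &Path) -> io::Result<File> {
    OpenOptions::new().append(true).create(true).open(path)
}

/// Hypothetical helper: writes one length-prefixed block to `writer`.
/// `encoded` is assumed to be the output of `bincode::serialize`.
fn append_block<W: Write>(writer: &mut W, encoded: &[u8]) -> io::Result<()> {
    // 8-byte little-endian length header first...
    writer.write_all(&(encoded.len() as u64).to_le_bytes())?;
    // ...then the payload itself.
    writer.write_all(encoded)
}
```

Because `append_block` accepts any `Write`, the same function works with a `File`, a `BufWriter<File>`, or a `Vec<u8>` in tests.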

When deserializing, grab the first 8 bytes and decode them as a length: let block_length = u64::from_le_bytes(file_data[..8].try_into().unwrap()) as usize;. Then use that length to determine how many bytes the next block takes: deserialize(&file_data[8..][..block_length]). The data after that is the rest of the file, which again starts with the 8-byte length of the next block: let rest_of_data = &file_data[8..][block_length..];.
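Put together as a loop, the read side might look roughly like the sketch below. It assumes the whole file has already been read into memory (for example with std::fs::read) and that each block was written with an 8-byte little-endian length header. The name `split_blocks` is hypothetical, and it returns the raw payloads rather than calling bincode, so each slice would still need to go through bincode::deserialize:

```rust
/// Hypothetical helper: splits `file_data` into the payloads of its
/// length-prefixed blocks. Returns `None` if the data is truncated,
/// i.e. a header or a payload is shorter than expected.
fn split_blocks(mut file_data: &[u8]) -> Option<Vec<&[u8]>> {
    let mut blocks = Vec::new();
    while !file_data.is_empty() {
        if file_data.len() < 8 {
            return None; // not enough bytes left for a length header
        }
        // Decode the 8-byte little-endian length header.
        let mut header = [0u8; 8];
        header.copy_from_slice(&file_data[..8]);
        let block_length = u64::from_le_bytes(header);

        let rest = &file_data[8..];
        if block_length > rest.len() as u64 {
            return None; // header claims more bytes than the file holds
        }
        // The cast is safe here: block_length fits within rest.len().
        let (block, rest_of_data) = rest.split_at(block_length as usize);
        blocks.push(block); // hand this slice to bincode::deserialize
        file_data = rest_of_data;
    }
    Some(blocks)
}
```

Checking the lengths up front avoids the panics that plain slicing and unwrap() would cause on a file that was cut off mid-write.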

That's just one solution though. Another would be a delimited format where each block is separated by some known byte sequence. To find the end of a block you look for the known sequence. (That's how CSV works, for example: the delimiter between values is a comma.) It requires checking each data block for the block-termination sequence and escaping it if present though, so it's a bit more annoying to use.
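To illustrate what the escaping involves, here is one possible byte-stuffing sketch. The byte values and the function names are arbitrary choices made for this example, not part of bincode or any standard format. After escaping, the delimiter byte never occurs inside a block, so splitting on it is unambiguous:

```rust
// All four byte values are assumptions chosen for illustration.
const DELIMITER: u8 = 0x00; // terminates every block
const ESCAPE: u8 = 0x01; // introduces a two-byte escape sequence
const ESC_DELIMITER: u8 = 0x02; // ESCAPE + this stands for a literal 0x00
const ESC_ESCAPE: u8 = 0x03; // ESCAPE + this stands for a literal 0x01

/// Hypothetical helper: escapes `block` and appends the terminating delimiter.
fn escape_block(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(block.len() + 1);
    for &byte in block {
        match byte {
            DELIMITER => out.extend_from_slice(&[ESCAPE, ESC_DELIMITER]),
            ESCAPE => out.extend_from_slice(&[ESCAPE, ESC_ESCAPE]),
            other => out.push(other),
        }
    }
    out.push(DELIMITER);
    out
}

/// Hypothetical helper: splits `data` on the delimiter and undoes the escaping.
/// Returns `None` on an invalid escape or a missing final delimiter.
fn unescape_blocks(data: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut blocks = Vec::new();
    let mut current = Vec::new();
    let mut bytes = data.iter().copied();
    while let Some(byte) = bytes.next() {
        match byte {
            // End of block: store it and start a fresh one.
            DELIMITER => blocks.push(std::mem::take(&mut current)),
            ESCAPE => match bytes.next()? {
                ESC_DELIMITER => current.push(DELIMITER),
                ESC_ESCAPE => current.push(ESCAPE),
                _ => return None,
            },
            other => current.push(other),
        }
    }
    if current.is_empty() { Some(blocks) } else { None }
}
```

Compared with the length prefix, this costs a pass over every payload on both write and read, and payloads containing many 0x00 or 0x01 bytes grow in size.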

Thanks for the answer.

Could you recommend a delimiter byte?

For anyone working on the same problem as me: serialization - How can I add separators between different records in a bincoded file? - Stack Overflow