Read variable number of bytes from a file

Hello world. Here is what I'm trying to do:

  • While I'm not at the end of the file
    • Read 12 bytes to know the id of an object
    • Read 4 bytes to know the length of the object (let's say the length is n)
    • Read n bytes, parse them and load the object in a Hashmap.

Knowing that the file can be hundreds of megabytes long, can you please suggest efficient ways to load its content? To keep things simple, just assume that the object is a string.

Reading small fixed-length data:

let mut bytes = [0u8; 12];
file.read_exact(&mut bytes)?;

Reading variable-length data:

let mut bytes = vec![0u8; n];
file.read_exact(&mut bytes)?;

or a fancy version that doesn't initialize memory:

vec.clear();
vec.try_reserve_exact(n)?;
// `take` expects a u64; read_to_end returns the number of bytes actually read
file.by_ref().take(n as u64).read_to_end(&mut vec)?;
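One caveat with the take/read_to_end version: read_to_end succeeds even when the file ends before n bytes have arrived, so the returned count is worth checking. A minimal sketch of that check follows; `read_n` is a hypothetical helper name, and mapping the allocation failure to `ErrorKind::OutOfMemory` is my own choice, not something the snippet above prescribes.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads exactly `n` bytes into `buf`, reusing its allocation.
/// Returns an `UnexpectedEof` error if the input ends early.
pub fn read_n<R: Read>(reader: &mut R, buf: &mut Vec<u8>, n: usize) -> io::Result<()> {
    buf.clear();
    // Reserve up front so an absurd length fails cleanly instead of aborting.
    buf.try_reserve_exact(n)
        .map_err(|e| io::Error::new(io::ErrorKind::OutOfMemory, e))?;
    // `take` limits the reader to `n` bytes; `read_to_end` appends to `buf`
    // and reports how many bytes it actually got.
    let got = reader.by_ref().take(n as u64).read_to_end(buf)?;
    if got != n {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "input ended before the requested number of bytes",
        ));
    }
    Ok(())
}
```

The same `buf` can be passed in for every record, so the allocation only grows when a larger object shows up.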

For interpreting the lengths, you'll need u32::from_le_bytes(bytes), where bytes is a [u8; 4] passed by value (or from_be_bytes/from_ne_bytes, depending on the endianness of the value).
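Putting these pieces together, the loop from the question could be sketched roughly like this. Assumptions on my side: the length is little-endian, the payload is UTF-8, and `load_objects` is just an illustrative name. Wrapping the file in a BufReader keeps the many small reads from each turning into a system call.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Read};

/// Illustrative loader for the layout in the question:
/// 12-byte id, 4-byte length (assumed little-endian), then that many bytes of UTF-8.
pub fn load_objects<R: BufRead>(mut reader: R) -> io::Result<HashMap<[u8; 12], String>> {
    let mut objects = HashMap::new();

    // `fill_buf` yields an empty slice only at end of input, which separates
    // a clean EOF from a record that is cut off half-way.
    while !reader.fill_buf()?.is_empty() {
        let mut id = [0u8; 12];
        reader.read_exact(&mut id)?;

        let mut len_bytes = [0u8; 4];
        reader.read_exact(&mut len_bytes)?;
        let n = u32::from_le_bytes(len_bytes) as usize;

        // Note: this trusts the length field; a corrupt file could request up to 4 GiB.
        let mut payload = vec![0u8; n];
        reader.read_exact(&mut payload)?;

        let text = String::from_utf8(payload)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        objects.insert(id, text);
    }
    Ok(objects)
}
```

For a real file, something like `load_objects(std::io::BufReader::new(std::fs::File::open(path)?))` should work, with `path` being whatever file you are loading.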


For parsing, I suggest you check out nom.

use nom::{bytes::complete::take, IResult};

fn parse_my_object(buf: &[u8]) {
    // nom parsers return (remaining input, parsed output), in that order
    let (remains, head_object_bytes) = take_12(buf).unwrap();
    // handle your object parsing
    handle_head_object(head_object_bytes);
    let (the_rest_of_it, length_bytes) = take_4(remains).unwrap();
    // from_ne_bytes assumes the file uses the machine's native byte order
    let length = u32::from_ne_bytes(length_bytes.try_into().unwrap());
    let (_overhead, _content) = take_rest_of_it(the_rest_of_it, length).unwrap();
}

fn take_12(i: &[u8]) -> IResult<&[u8], &[u8]> {
    take(12u8)(i) // will consume and return 12 bytes of input
}

fn take_4(i: &[u8]) -> IResult<&[u8], &[u8]> {
    take(4u8)(i) // will consume and return 4 bytes of input
}

fn take_rest_of_it(i: &[u8], length: u32) -> IResult<&[u8], &[u8]> {
    take(length)(i)
}

fn handle_head_object(_: &[u8]) {}

Not sure if this is what you want


Thanks for your replies.

Does someone know why Rust doesn't just let us choose how many bytes we want to read?

It does, as is clearly demonstrated above.

To expand on what @H2CO3 wrote: If you're specifically asking how to handle a situation where you have a larger buffer than you want to read, then you can create a subslice:

let mut buf = [0; 1024]; // or a Vec
let size = 128;
foo.read_exact(&mut buf[0..size])?; // fills exactly the first 128 bytes
let size = 32;
foo.read_exact(&mut buf[0..size])?; // reuses the same buffer for a 32-byte read
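To see the subslice trick in a complete function, here is a small sketch that reads several chunks of different sizes through one fixed scratch buffer. The function name `sum_chunks` and the byte-sum return value are made up purely so there is something to observe; the only real point is the `read_exact` call on a prefix of `buf`.

```rust
use std::io::{self, Read};

/// Reads one chunk per entry in `sizes`, reusing a single 1024-byte buffer,
/// and returns the sum of all bytes read (only so the result can be checked).
pub fn sum_chunks<R: Read>(reader: &mut R, sizes: &[usize]) -> io::Result<u64> {
    let mut buf = [0u8; 1024];
    let mut sum = 0u64;
    for &size in sizes {
        // `get_mut(..size)` returns None instead of panicking when `size`
        // does not fit in the buffer.
        let chunk = buf.get_mut(..size).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "chunk larger than buffer")
        })?;
        // Fills exactly `size` bytes, or fails with UnexpectedEof.
        reader.read_exact(chunk)?;
        sum += chunk.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    Ok(sum)
}
```

A fixed array only works while every object fits in it; for arbitrary lengths a Vec that gets resized per record is the safer choice.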

Oh yes!
