archnim
February 12, 2023, 6:05pm
1
Hello world. Here is what I'm trying to do:
While I'm not at the end of the file
Read 12 bytes to know the id of an object
Read 4 bytes to know the length of the object (let's say the length is n)
Read n bytes, parse them and load the object into a HashMap.
Knowing that the file can be hundreds of megabytes long, can you please suggest efficient ways to load its content? To keep things simple, just consider that the object is a string.
kornel
February 12, 2023, 6:44pm
2
Reading small fixed-length data:
let mut bytes = [0u8; 12];
file.read_exact(&mut bytes)?;
Reading variable-length data:
let mut bytes = vec![0u8; n];
file.read_exact(&mut bytes)?;
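or a fancy version that doesn't initialize memory:
vec.clear();
vec.try_reserve_exact(n)?;
// take() expects a u64; unlike read_exact, read_to_end returns Ok with fewer than n bytes at EOF, so check the count
let read = file.by_ref().take(n as u64).read_to_end(&mut vec)?;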
For interpreting the lengths, you'll need u32::from_le_bytes(bytes), which takes a [u8; 4] array by value (or from_be_bytes/from_ne_bytes, depending on the endianness of the value).
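Putting these pieces together, the loop from the original question could look roughly like the sketch below. It is only an illustration under stated assumptions: the length is little-endian, the payload is UTF-8 text, and `load_records` is a hypothetical name, not something from the standard library. Wrapping the source in a `BufReader` avoids one system call per small read, which matters for files of hundreds of megabytes.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufReader, ErrorKind, Read};

/// Reads `[12-byte id][4-byte length][payload]` records until end of input.
/// Assumptions: little-endian length, UTF-8 payload.
pub fn load_records<R: Read>(reader: R) -> io::Result<HashMap<[u8; 12], String>> {
    let mut reader = BufReader::new(reader);
    let mut map = HashMap::new();
    loop {
        // An empty buffer from fill_buf() means we are cleanly at end of input.
        if reader.fill_buf()?.is_empty() {
            break;
        }
        let mut id = [0u8; 12];
        reader.read_exact(&mut id)?;

        let mut len_bytes = [0u8; 4];
        reader.read_exact(&mut len_bytes)?;
        let n = u32::from_le_bytes(len_bytes) as usize;

        // Note: n comes from the file, so a corrupt length can request up to 4 GiB.
        let mut payload = vec![0u8; n];
        reader.read_exact(&mut payload)?;

        let text = String::from_utf8(payload)
            .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))?;
        map.insert(id, text);
    }
    Ok(map)
}
```

Calling it with `File::open(path)?` as the reader should work unchanged, since `File` implements `Read`. A truncated record surfaces as an `UnexpectedEof` error rather than being silently dropped.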
Y.Z
February 13, 2023, 3:54am
3
For parsing, I suggest you check out nom.
Y.Z
February 13, 2023, 6:14am
4
use nom::{bytes::complete::take, IResult};

fn parse_my_object(buf: &[u8]) {
    // nom parsers return (remaining input, parsed output), in that order
    let (remains, head_object_bytes) = take_12(buf).unwrap();
    // handle your object parsing
    handle_head_object(head_object_bytes);
    let (the_rest_of_it, length_bytes) = take_4(remains).unwrap();
    let length = u32::from_ne_bytes(length_bytes.try_into().unwrap());
    let (_overhead, _content) = take_rest_of_it(the_rest_of_it, length).unwrap();
}

fn take_12(i: &[u8]) -> IResult<&[u8], &[u8]> {
    take(12u8)(i) // will consume and return 12 bytes of input
}

fn take_4(i: &[u8]) -> IResult<&[u8], &[u8]> {
    take(4u8)(i) // will consume and return 4 bytes of input
}

fn take_rest_of_it(i: &[u8], length: u32) -> IResult<&[u8], &[u8]> {
    take(length)(i) // will consume and return `length` bytes of input
}

fn handle_head_object(_: &[u8]) {}
Not sure if this is what you want
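For comparison, here is a rough sketch of the same slice-based parsing using only the standard library instead of nom. It assumes the data is already in memory as a `&[u8]` and that the length is little-endian; `split_record` is a hypothetical helper name. It returns borrowed sub-slices, so nothing is copied.

```rust
/// Splits one `[12-byte id][4-byte length][payload]` record off the front of `buf`.
/// Returns `(id, payload, rest)`, or `None` if `buf` does not hold a full record.
/// Assumption: the length field is little-endian.
pub fn split_record(buf: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    // 12 bytes of id plus 4 bytes of length must be present.
    if buf.len() < 16 {
        return None;
    }
    let (id, rest) = buf.split_at(12);
    let (len_bytes, rest) = rest.split_at(4);
    let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;

    // Reject a length that runs past the end of the buffer.
    if rest.len() < len {
        return None;
    }
    let (payload, rest) = rest.split_at(len);
    Some((id, payload, rest))
}
```

Calling it repeatedly on the returned `rest` until it yields `None` walks through every record in the buffer.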
archnim
February 13, 2023, 9:24am
5
Thanks for your replies.
Does anyone know why Rust doesn't just let us choose how many bytes we want to read?
H2CO3
February 13, 2023, 9:33am
6
It does, as it's clearly demonstrated above.
blonk
February 13, 2023, 10:28am
7
To expand on what @H2CO3 wrote: If you're specifically asking how to handle a situation where you have a larger buffer than you want to read, then you can create a subslice:
let mut buf = [0u8; 1024]; // or a Vec
let size = 128;
foo.read_exact(&mut buf[0..size])?; // fills exactly the first 128 bytes
let size = 32;
foo.read_exact(&mut buf[0..size])?; // reuses the same buffer for the next 32 bytes
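As a small self-contained sketch of that subslice idea, the helper below reads exactly `size` bytes into the front of a reusable buffer and hands back just that part. `read_payload` is a hypothetical name chosen for illustration; it returns an error instead of panicking when `size` is larger than the buffer.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads exactly `size` bytes from `src` into the front of `buf`
/// and returns the filled sub-slice.
pub fn read_payload<'a, R: Read>(
    src: &mut R,
    buf: &'a mut [u8],
    size: usize,
) -> io::Result<&'a [u8]> {
    // get_mut returns None instead of panicking when size > buf.len().
    let dst = buf
        .get_mut(..size)
        .ok_or_else(|| io::Error::new(ErrorKind::InvalidInput, "size exceeds buffer"))?;
    src.read_exact(dst)?;
    Ok(&*dst)
}
```

Because the same buffer is reused for every call, no allocation happens per read; the trade-off is that each returned slice has to be consumed before the next call overwrites it.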