I've been creating some loaders for reading binary files, but I've been going about it pretty naively: I read the entire file into memory and keep an offset. Then, whenever I encounter a sub-structure, I call that sub-struct's ::read() method, which takes the whole buffer and a mutable reference to that offset. That reader then reads from the buffer piece by piece, using byteorder to ensure the correct endianness of each piece. Each time I read something in, I increment the offset by the number of bytes read from the buffer. Here's the source file as reference.
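To make the pattern concrete, here's a stripped-down sketch of that approach. It is not the actual source: Entry and its fields are made up for illustration, little-endian data is an assumption, and it uses std's from_le_bytes instead of byteorder so that it stands on its own.

```rust
use std::convert::TryInto;

/// Hypothetical sub-structure; the name and fields are invented for this sketch.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub id: u32,
    pub size: u16,
}

/// Reads a little-endian u32 at `*offset` and advances the offset by 4.
/// Returns None if fewer than 4 bytes remain.
fn read_u32_le(buf: &[u8], offset: &mut usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let bytes: [u8; 4] = buf.get(*offset..end)?.try_into().ok()?;
    *offset = end;
    Some(u32::from_le_bytes(bytes))
}

/// Reads a little-endian u16 at `*offset` and advances the offset by 2.
/// Returns None if fewer than 2 bytes remain.
fn read_u16_le(buf: &[u8], offset: &mut usize) -> Option<u16> {
    let end = offset.checked_add(2)?;
    let bytes: [u8; 2] = buf.get(*offset..end)?.try_into().ok()?;
    *offset = end;
    Some(u16::from_le_bytes(bytes))
}

impl Entry {
    /// Takes the whole buffer plus a mutable offset, as described above.
    pub fn read(buf: &[u8], offset: &mut usize) -> Option<Entry> {
        let id = read_u32_le(buf, offset)?;
        let size = read_u16_le(buf, offset)?;
        Some(Entry { id, size })
    }
}
```

Every sub-struct ends up repeating this buffer-plus-offset bookkeeping, which is the repetitive part I'd like to get rid of.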
This works, but it's repetitive and requires the entire file to be loaded into memory. Is there a better general approach to reading data? I tried finding more info online, but I couldn't find a comprehensive resource that goes into enough detail for me to wrap my head around it. I also searched these forums, but most of the posts I found were from 2015, around Rust's v1.0 release, and a lot has changed since then.
The next loader will be parsing this archive file that's close to 400MB, and I have to load a 120KB companion file ahead of time, as it acts like a directory for that archive. I could keep reading everything into memory, and it would still work, but the main purpose of this project is to learn Rust and get comfortable with it. I have one strategy for loading files, and I want to progress that skill.
There are going to be tons and tons of one-off reads, so it looks like I should wrap my File object in a BufReader, so that it can grab larger chunks of the file ahead of time and vastly cut down on read operations on the storage hardware. I'm not quite sure how I'm supposed to use BufReader to read data, though. Is that where Cursor comes in?
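Here's my rough understanding so far, as a sketch rather than a definitive answer. It assumes little-endian data, sticks to std only, and all function names are made up. As far as I can tell, BufReader wraps something that already implements Read (like File), while Cursor gives an in-memory byte slice the same Read interface, so the parsing code can be written once against Read.

```rust
use std::fs::File;
use std::io::{self, BufReader, Cursor, Read};
use std::path::Path;

/// Reads one little-endian u32 from anything that implements `Read`.
fn read_u32_le<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes)?;
    Ok(u32::from_le_bytes(bytes))
}

/// Sums `count` consecutive u32 values; a stand-in for "lots of small reads".
fn sum_u32s<R: Read>(reader: &mut R, count: usize) -> io::Result<u64> {
    let mut total = 0u64;
    for _ in 0..count {
        total += u64::from(read_u32_le(reader)?);
    }
    Ok(total)
}

/// File-backed: BufReader pulls larger chunks from the file, so each
/// 4-byte read_exact is usually served from its internal buffer.
fn sum_from_file(path: &Path, count: usize) -> io::Result<u64> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    sum_u32s(&mut reader, count)
}

/// Memory-backed: Cursor tracks the position in the slice, replacing a
/// hand-maintained offset.
fn sum_from_memory(bytes: &[u8], count: usize) -> io::Result<u64> {
    let mut reader = Cursor::new(bytes);
    sum_u32s(&mut reader, count)
}
```

If that's right, byteorder's ReadBytesExt (for example read_u32::<LittleEndian>()) would slot in where read_u32_le is, since it also works on any Read.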
Then, there's mapping pieces of the data to structs. For example:
pub struct Header {
    records: u32,
    number_of_files: u32,
    names_table_size: u32,
    archive_full_size: u32,
    pad: [u8; 0x10],
}
How should I go about reading this into memory? Each instance of this struct is 32 bytes (four u32 fields plus 16 bytes of padding), so should I grab a 32-byte slice and somehow map that to my struct instance?
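One way I can imagine doing this without any unsafe code is to read exactly that many bytes into a fixed-size array and decode the fields one at a time. This is only a sketch: it assumes the fields are little-endian and laid out in declaration order with no extra gaps, and SIZE, from_bytes, and read are names I made up.

```rust
use std::io::{self, Read};

#[derive(Debug, PartialEq)]
pub struct Header {
    records: u32,
    number_of_files: u32,
    names_table_size: u32,
    archive_full_size: u32,
    pad: [u8; 0x10],
}

impl Header {
    /// On-disk size: four u32 fields (16 bytes) plus 16 bytes of padding.
    pub const SIZE: usize = 4 * 4 + 0x10;

    /// Decodes a header from raw bytes. Assumption: little-endian fields.
    pub fn from_bytes(raw: &[u8; Header::SIZE]) -> Header {
        // Decodes the u32 that starts at byte index `at`.
        let u32_at = |at: usize| {
            u32::from_le_bytes([raw[at], raw[at + 1], raw[at + 2], raw[at + 3]])
        };
        let mut pad = [0u8; 0x10];
        pad.copy_from_slice(&raw[16..32]);
        Header {
            records: u32_at(0),
            number_of_files: u32_at(4),
            names_table_size: u32_at(8),
            archive_full_size: u32_at(12),
            pad,
        }
    }

    /// Reads exactly SIZE bytes from any reader and decodes them.
    pub fn read<R: Read>(reader: &mut R) -> io::Result<Header> {
        let mut raw = [0u8; Header::SIZE];
        reader.read_exact(&mut raw)?;
        Ok(Header::from_bytes(&raw))
    }
}
```

Because the decoding is explicit, the result doesn't depend on how the compiler lays out the struct in memory, which is one of the things that worries me about transmute.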
Looking at the format of the binary data, it looks like I can read the entire file sequentially without ever having to seek around the file. I looked at memmap, but I read that it's unsafe. I have no problem using unsafe code, but I'm trying to figure out when to use it. I also looked at transmute, which looks quite convenient, but as noted in the docs, it is incredibly unsafe, as there are many points of error in even the simplest of structs, and this isn't C after all.