I found this beautiful example of creating an iterator from a Vec or a slice.
I'm working with a potentially larger-than-memory binary file, and it seems I need to create an iterator over the file that returns one item/record at a time. I can then use `map` or similar to process each item/record.
So, without loading the whole file into memory, how can I do this?
The binary file layout is as follows:
len ....... len ......... len ........ etc.
Each item/record starts with a 4-byte length field, and that length includes the 4 length bytes themselves.
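A minimal sketch of decoding that length field. The post doesn't say which byte order the length uses, so this assumes little-endian (`u32::from_le_bytes`); use `from_be_bytes` if the format is big-endian. Because the stored length counts the 4 prefix bytes, the payload is `len - 4` bytes, and any value below 4 has to be treated as corrupt. `body_len` is a hypothetical helper name.

```rust
use std::io;

/// Size of the length prefix in bytes.
const HEADER_LEN: usize = 4;

/// Hypothetical helper: converts a raw 4-byte length prefix into the number of
/// payload bytes that follow it.
/// ASSUMPTION: the length is stored little-endian (the format doesn't say).
fn body_len(header: [u8; HEADER_LEN]) -> io::Result<usize> {
    let total = u32::from_le_bytes(header) as usize;
    // The stored length counts the 4 prefix bytes too, so it can never be below 4.
    total.checked_sub(HEADER_LEN).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "record length {} is smaller than the {}-byte prefix",
                total, HEADER_LEN
            ),
        )
    })
}

fn main() -> io::Result<()> {
    // Total length 10 = 4 prefix bytes + 6 payload bytes.
    assert_eq!(body_len(10u32.to_le_bytes())?, 6);
    // A total length of 2 is impossible, so it is reported as corrupt data.
    assert!(body_len(2u32.to_le_bytes()).is_err());
    Ok(())
}
```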
Any suggestions or ideas would be appreciated.
I've made a lot of progress on other aspects of my project, but I'm stuck on creating this custom binary-file iterator.
I'm afraid not; I'm already using a parser for the data.
It's just that I need to split the binary file into items I can iterate over, so that I can support larger-than-memory files. I expect most of the files I'll handle this way to be hundreds of gigabytes.
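A minimal sketch of such an iterator, assuming a 4-byte little-endian length prefix that counts itself (the byte order isn't stated in the thread). `RecordIter` is generic over any `std::io::Read` source, so only the current record is ever held in memory; it yields each payload without its prefix and distinguishes a clean end of input from a truncated prefix. The helper `fill_or_eof` is a hypothetical name.

```rust
use std::io::{self, ErrorKind, Read};

/// Size of the length prefix in bytes.
const HEADER_LEN: usize = 4;

/// Streams length-prefixed records from any reader (a `&[u8]`, a `File`,
/// a `BufReader<File>`, ...) one at a time; only the current record is in memory.
/// ASSUMPTIONS: 4-byte little-endian length that counts the prefix itself;
/// each item is the payload only (the prefix is dropped).
struct RecordIter<R> {
    reader: R,
}

/// Fills `buf` from `reader`. Returns Ok(false) if the reader was already at
/// EOF, and an UnexpectedEof error if it ended part-way through `buf`.
fn fill_or_eof<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<bool> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) if filled == 0 => return Ok(false),
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    "truncated length prefix",
                ))
            }
            Ok(n) => filled += n,
            // Interrupted reads are retried, as `read_exact` does.
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(true)
}

impl<R: Read> Iterator for RecordIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut header = [0u8; HEADER_LEN];
        match fill_or_eof(&mut self.reader, &mut header) {
            Ok(false) => return None, // clean end of input: no more records
            Ok(true) => {}
            Err(e) => return Some(Err(e)),
        }
        let total = u32::from_le_bytes(header) as usize;
        let Some(body_len) = total.checked_sub(HEADER_LEN) else {
            return Some(Err(io::Error::new(
                ErrorKind::InvalidData,
                "record length < 4",
            )));
        };
        // Allocate just this record's payload and read it in full.
        let mut body = vec![0u8; body_len];
        Some(self.reader.read_exact(&mut body).map(|()| body))
    }
}

fn main() -> io::Result<()> {
    // Two records: "ab" (total length 6) and "xyz" (total length 7).
    let mut data = Vec::new();
    data.extend_from_slice(&6u32.to_le_bytes());
    data.extend_from_slice(b"ab");
    data.extend_from_slice(&7u32.to_le_bytes());
    data.extend_from_slice(b"xyz");

    let records = RecordIter { reader: &data[..] }.collect::<io::Result<Vec<_>>>()?;
    assert_eq!(records, vec![b"ab".to_vec(), b"xyz".to_vec()]);
    Ok(())
}
```

On a real file you would construct it as something like `RecordIter { reader: BufReader::new(File::open(path)?) }` and process items in a `for` loop or with `map`/`filter` rather than collecting them. This sketch doesn't stop itself after an error, so callers should stop at the first `Err`.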
@H2CO3 I don't understand what I should pass as `bytes` in this example.
In my `main` function:

```rust
let mut file = File::open(FILENAME)?;
let mut two_bytes: [u8; 2] = [0; 2];
file.read_exact(&mut two_bytes).expect("first two bytes error");
let file_struct = RecordIter { reader: two_bytes };
for item in file_struct {
    dbg!(item);
}
```
It says:

```text
the trait bound `[u8; 2]: std::io::Read` is not satisfied [E0277]
Help: the trait `std::io::Read` is implemented for `&[u8]`
Note: required for `impls::RecordIter<[u8; 2]>` to implement `std::iter::Iterator`
Note: required for `impls::RecordIter<[u8; 2]>` to implement `std::iter::IntoIterator`
Help: convert the array to a `&[u8]` slice instead
```
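The error comes from the bound on the iterator impl: `RecordIter<R>` is only an `Iterator` when `R: Read`, and `std::io::Read` is implemented for `&[u8]` but not for `[u8; N]` arrays. The sketch below uses a hypothetical generic function `read_two` to show the slice conversion the compiler suggests, and that reading from a slice consumes bytes from its front, just as reading from a file advances the file position. The compiler's hint only fixes the type error, though: to iterate over the file's records, the reader has to be the `File` itself (for example wrapped in a `BufReader`), not two bytes already read out of it.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads exactly two bytes from any `Read` source.
fn read_two<R: Read>(mut reader: R) -> io::Result<[u8; 2]> {
    let mut buf = [0u8; 2];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let two_bytes: [u8; 2] = [0xAB, 0xCD];
    // `read_two(two_bytes)` would not compile (E0277): arrays don't implement Read.
    // Borrowing the array as a slice does compile:
    assert_eq!(read_two(&two_bytes[..])?, [0xAB, 0xCD]);

    // Passing `&mut reader` lets the caller keep using the reader afterwards;
    // the bytes just read are consumed from the front of the slice.
    let mut reader: &[u8] = &[1, 2, 3];
    assert_eq!(read_two(&mut reader)?, [1, 2]);
    assert_eq!(reader, &[3u8][..]);
    Ok(())
}
```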
`impls::` here refers to my `impls.rs` module, where I put the `Iterator` impl.
EDIT: Is this right: `bytes` is a slice containing the whole file?
If so, how do I turn a `File` into a `&[u8]` without loading the whole file into memory?
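A `File` doesn't need to become a `&[u8]`: the slice is just a stand-in that works on the playground. Code written against the `std::io::Read` trait accepts either, so the same iterator can be driven by `BufReader::new(File::open(path)?)`, which keeps only a fixed-size buffer in memory (currently 8 KiB by default). A minimal sketch with a hypothetical helper `first_len`, assuming a little-endian length prefix; it writes a small temporary file so it runs anywhere.

```rust
use std::fs::{self, File};
use std::io::{self, BufReader, Read};

/// Hypothetical helper: reads the first 4-byte length prefix from any reader.
/// Generic over `Read`, so it works on an in-memory slice and on a file alike.
/// ASSUMPTION: the length is little-endian.
fn first_len<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    Ok(u32::from_le_bytes(header))
}

fn main() -> io::Result<()> {
    let bytes = [6u8, 0, 0, 0, b'a', b'b'];

    // In memory (e.g. on the playground): a &[u8] is a reader.
    assert_eq!(first_len(&bytes[..])?, 6);

    // On disk: the File itself is the reader. BufReader turns many small reads
    // into fewer system calls and never loads more than its buffer at once.
    let path = std::env::temp_dir().join("record_iter_first_len_demo.bin");
    fs::write(&path, bytes)?;
    let reader = BufReader::new(File::open(&path)?);
    assert_eq!(first_len(reader)?, 6);
    fs::remove_file(&path)?;
    Ok(())
}
```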
Yes, absolutely.
Thanks again for the explanation.
I realized only after posting the question that you used a slice because it works on the Rust playground.
Truly, truly... thank you! Your ten minutes means a whole lot to me.
The length bytes have already been read (i.e. consumed from the reader) by the time the buffer is constructed. They have to be; otherwise it would be impossible to know the length.
Your problem wasn't precisely specified, but if you need those bytes, you can simply allocate a vector 4 bytes larger and copy the length bytes into its front.
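A minimal sketch of keeping the prefix: since the length bytes are consumed before the record size is known, the function reads them first, sizes one buffer for the whole record, copies the prefix into its front, and reads the payload into the rest. `read_record_with_header` is a hypothetical name, and the little-endian, self-inclusive length is an assumption.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical helper: reads one record *including* its 4-byte length prefix.
/// The prefix has already left the reader once the length is known, so it is
/// copied back into the front of a buffer sized for the whole record.
/// ASSUMPTION: little-endian length that counts the prefix itself.
fn read_record_with_header<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    let total = u32::from_le_bytes(header) as usize;
    if total < header.len() {
        return Err(io::Error::new(ErrorKind::InvalidData, "record length < 4"));
    }
    // One buffer for the whole record: prefix first, payload after it.
    let mut buf = vec![0u8; total];
    buf[..4].copy_from_slice(&header);
    reader.read_exact(&mut buf[4..])?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut reader: &[u8] = &[6, 0, 0, 0, b'a', b'b', 5, 0, 0, 0, b'z'];
    assert_eq!(read_record_with_header(&mut reader)?, [6, 0, 0, 0, b'a', b'b']);
    assert_eq!(read_record_with_header(&mut reader)?, [5, 0, 0, 0, b'z']);
    assert!(reader.is_empty());
    Ok(())
}
```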
One more thing I almost forgot: at the other end, after something has processed these items, is there a way to write the output to a file as it goes (appending and flushing), rather than collecting everything into a `Vec` in memory?
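A minimal sketch using only the standard library: open the output with `OpenOptions` in append mode, wrap it in a `BufWriter`, and write each item as soon as the iterator produces it, so nothing accumulates in a `Vec`. The name `write_all_items` is hypothetical, and the items are assumed to be `io::Result<Vec<u8>>` byte buffers.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical sink: writes each processed item as soon as it is produced,
/// appending to `path`, so results never have to be collected in memory.
fn write_all_items<I>(path: &Path, items: I) -> io::Result<()>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    // BufWriter batches many small writes into fewer system calls.
    let mut out = BufWriter::new(file);
    for item in items {
        out.write_all(&item?)?;
    }
    // Flush explicitly: errors during BufWriter's drop are silently ignored.
    out.flush()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("write_all_items_demo.bin");
    let _ = fs::remove_file(&path); // start from an empty file for the demo
    // Stand-in for "records read from the input, after some processing".
    let processed = [b"ab".to_vec(), b"cd".to_vec()].into_iter().map(|mut v| {
        v.make_ascii_uppercase();
        Ok::<_, io::Error>(v)
    });
    write_all_items(&path, processed)?;
    assert_eq!(fs::read(&path)?, b"ABCD");
    fs::remove_file(&path)?;
    Ok(())
}
```

Because the function accepts any iterator, a record iterator over the input file can be passed in directly (after `map`ping your own processing over its items), which keeps both ends streaming.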
It probably will be; I just didn't know about it. I'll try it out.
I've found the docs.rs site layout quite confusing, so I've never really learned how to tell what capabilities a given crate offers.