How to read from mem-mapped file into a Vec<u8> of variable length?

I know that the file contains a Vec<u8> of size N (which I only know at runtime) that I need to read. I can't figure out from the examples I've seen whether this is doable, or whether I just need to create a vector and write a loop that reads every byte as a fixed-length struct into the target vector.

Is there a recipe for this?

In Golang, I use some memory mapping code to read fixed-size data structures, and process them one by one. Pseudocode:

struct MyStruct { a: u32, b: i32, c: f64 } // let's say it's 16 bytes long
let mut file_handler = os.Open(path_to_file).unwrap();
let my_map = Memmap(file_handler);
// might be fallible, I can't remember now
let my_var: MyStruct = some_class.read(my_map[some_offset..some_offset+16]);

Then, for the other direction, I call some_class.write with the offset and the new data.
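For reference, the read/write-at-an-offset pattern described above can be sketched in plain Rust over a byte slice. This is only a sketch under some assumptions: `read_at`, `write_at` and `RECORD_LEN` are hypothetical names, the fields are stored little-endian, and a `Vec<u8>` stands in for the mapped region (memmap2's map types dereference to a byte slice, so the same functions should apply there).

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct MyStruct {
    a: u32,
    b: i32,
    c: f64,
}

// 4 + 4 + 8 bytes per record (an assumption of this sketch).
const RECORD_LEN: usize = 16;

// Writes one record at `offset`, field by field, in little-endian order.
// Panics if the record does not fit into `buf`.
fn write_at(buf: &mut [u8], offset: usize, s: &MyStruct) {
    let rec = &mut buf[offset..offset + RECORD_LEN];
    rec[0..4].copy_from_slice(&s.a.to_le_bytes());
    rec[4..8].copy_from_slice(&s.b.to_le_bytes());
    rec[8..16].copy_from_slice(&s.c.to_le_bytes());
}

// Reads one record back from `offset`.
// Panics if the record lies outside `buf`.
fn read_at(buf: &[u8], offset: usize) -> MyStruct {
    let rec = &buf[offset..offset + RECORD_LEN];
    let (mut a, mut b, mut c) = ([0u8; 4], [0u8; 4], [0u8; 8]);
    a.copy_from_slice(&rec[0..4]);
    b.copy_from_slice(&rec[4..8]);
    c.copy_from_slice(&rec[8..16]);
    MyStruct {
        a: u32::from_le_bytes(a),
        b: i32::from_le_bytes(b),
        c: f64::from_le_bytes(c),
    }
}

fn main() {
    // Room for three records; the buffer stands in for a mapped file.
    let mut buf = vec![0u8; 3 * RECORD_LEN];
    let s = MyStruct { a: 17, b: -42, c: 0.5 };
    write_at(&mut buf, RECORD_LEN, &s);
    assert_eq!(read_at(&buf, RECORD_LEN), s);
}
```

Because the byte order is fixed explicitly, a file written this way does not depend on the endianness of the machine that wrote it.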

The examples I see in the crates either work with buffers or create a memory-map struct in memory, and it's unclear how to map that to a file on disk.

The recommended crate for using memory maps is memmap2. But is there a reason you want to use a memory map for this? I would normally recommend just reading and writing to/from the file directly.
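As a rough illustration of reading directly without a memory map, here is a minimal sketch that reads N bytes (known only at runtime) from a given offset into a Vec<u8> using only the standard library. The helper name `read_bytes_at` is made up for this example, and a `Cursor` stands in for an opened `File`; since `File` also implements `Read + Seek`, the same function should work on a real file.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

// Reads exactly `len` bytes starting at `offset` into a new Vec<u8>.
// Returns an error if the source ends before `len` bytes are available.
fn read_bytes_at<R: Read + Seek>(src: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    src.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // In a real program this would be `File::open(path)?`.
    let mut src = Cursor::new((0u8..32).collect::<Vec<u8>>());
    let n = 4; // the length that is only known at runtime
    let bytes = read_bytes_at(&mut src, 8, n)?;
    assert_eq!(bytes, vec![8, 9, 10, 11]);
    Ok(())
}
```

No per-byte loop is needed: `read_exact` fills the whole vector in one call.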

Currently I use Serde with bincode which is utterly slow. And also I expect the amount of data to grow.

The code you shared shows the struct's bare data being written directly to the file. You can do that with the zerocopy crate, but be aware that it makes your file platform dependent. The same file might not work when transferred from one machine to another. The advantage of bincode is that the data is always the same no matter what machine you use it on.

But if that's what you want, you can look at this example:

use std::fs::File;
use zerocopy::{FromBytes, Immutable, IntoBytes};

#[allow(dead_code)]
#[derive(Debug, IntoBytes, FromBytes, Immutable)]
struct MyStruct {
    a: u32,
    b: i32,
    c: f64,
}

fn read() {
    let mut file = File::open("mydata").unwrap();
    let my_struct = MyStruct::read_from_io(&mut file).unwrap();
    println!("got: {:?}", my_struct);
}

fn write() {
    let my_struct = MyStruct {
        a: 17,
        b: 42,
        c: 0.5,
    };

    let mut file = File::create("mydata").unwrap();
    my_struct.write_to_io(&mut file).unwrap();
}

fn main() {
    let args: Vec<_> = std::env::args().collect();

    if args.len() < 2 {
        eprintln!("{}: argument required", args[0]);
    } else if args[1] == "read" {
        read();
    } else if args[1] == "write" {
        write();
    } else {
        eprintln!("{}: unknown command {}", args[0], args[1]);
    }
}

Here, read_from_io reads the struct's raw bytes directly from the file, and write_to_io writes them directly to it. The number of bytes read or written is size_of::<MyStruct>().
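To make the size_of point concrete, here is a small standard-library-only sketch that checks the record size and derives how many records a file of a given length holds. Two things here are assumptions of the sketch rather than part of the example above: the `#[repr(C)]` attribute (added so the field layout is fixed) and the hypothetical helper `record_count`.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)] // assumption of this sketch: pin the field order and layout
struct MyStruct {
    a: u32,
    b: i32,
    c: f64,
}

// Number of whole records in a file of `file_len` bytes,
// or None if the length is not a multiple of the record size.
fn record_count(file_len: u64) -> Option<u64> {
    let record = size_of::<MyStruct>() as u64;
    if file_len % record == 0 {
        Some(file_len / record)
    } else {
        None
    }
}

fn main() {
    // 4 + 4 + 8 bytes, with no padding between the fields.
    assert_eq!(size_of::<MyStruct>(), 16);
    assert_eq!(record_count(64), Some(4));
    assert_eq!(record_count(70), None);
}
```

In a real program the length passed to `record_count` would come from the file's metadata.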

So I have to do it in a loop, correct?

use std::fs::File;
use zerocopy::{FromBytes, IntoBytes, Immutable};

#[derive(Debug, IntoBytes, FromBytes, Immutable)]
struct MyStruct {
    id: i64,
    x: f64,
    y: f64
}


fn main() {
    let mut vv: Vec<MyStruct> = vec![];
    for (id, x, y) in vec![
        (-1, 10., 20.),
        (100, 123.5, 2343.544),
        (1321231, 45.6677, 39548.566),
        (-394345, 1234.567, 987.654321),
    ] {
        vv.push(MyStruct { id, x, y});
    }
    
    let mut myfile = File::create("mydata").unwrap();
    vv.write_to_io(&mut myfile).unwrap();

}
^^ no method write_to_io found for struct Vec<MyStruct>

No. Although the method does not exist on Vec<T>, it does exist for [T], so you can call Vec::as_slice() and call the method on the slice:

vv.as_slice().write_to_io(&mut myfile).unwrap();

You can also convert to a &[u8] and write that directly to the file:

use std::io::Write;

let mut myfile = File::create("mydata").unwrap();
let arr: &[u8] = vv.as_slice().as_bytes();
myfile.write_all(arr).unwrap();

Actually, the issue seems to be different. Your original code works out of the box, and Rust is able to automatically find the write_to_io method on [T] when you call it on a Vec<T>. I didn't check your error properly.
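The method lookup described above can be reproduced without zerocopy. In this minimal sketch, `ByteLen` is a made-up trait implemented only for slices, yet it can still be called on a `Vec<T>`, because method resolution dereferences the vector to a slice automatically.

```rust
// Hypothetical trait, implemented for slices only.
trait ByteLen {
    fn byte_len(&self) -> usize;
}

impl<T> ByteLen for [T] {
    // Total size in bytes of all elements in the slice.
    fn byte_len(&self) -> usize {
        std::mem::size_of_val(self)
    }
}

fn main() {
    let v: Vec<u32> = vec![1, 2, 3];
    // There is no impl for Vec<u32>; the call resolves through Deref to [u32].
    assert_eq!(v.byte_len(), 12);
    // The explicit form is equivalent.
    assert_eq!(v.as_slice().byte_len(), 12);
}
```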

Your real problem is that the required features are not enabled in Cargo.toml:

[dependencies]
zerocopy = { version = "0.8.26", features = ["derive", "std"] }

That worked! I hadn’t added “std” in features. Thanks a lot!

An update: I thought bincode was slow, and I was wrong. I was just using it the wrong way.

Instead of

let f = File::open("path...").unwrap();
let my_obj: MyType = bincode::deserialize_from(f).unwrap();

I wrapped the File in a BufReader and voilà: -97% on benchmarks! (Consistently, on both small and big datasets.)

use std::io::BufReader;

let f = File::open("path...").unwrap();
// Buffer the file so that many small reads are served from memory.
let br = BufReader::new(f);
let my_obj: MyType = bincode::deserialize_from(br).unwrap();

save with serde         time:   [10.520 ms 10.538 ms 10.559 ms]
                        change: [-97.866% -97.858% -97.852%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

deserialize rtree with serde
                        time:   [9.9014 ms 9.9562 ms 10.009 ms]
                        change: [-97.268% -97.253% -97.237%] (p = 0.00 < 0.05)
                        Performance has improved.
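
A likely explanation for the speedup is that a deserializer pulls many small pieces from the reader, and on a bare File each of those is a separate read call to the operating system. The sketch below illustrates the effect with the standard library only: `CountingReader` is a hypothetical wrapper that counts calls to the underlying reader, and reading one byte at a time is an assumed stand-in for a deserializer's access pattern, not a measurement of bincode itself.

```rust
use std::io::{self, BufReader, Cursor, Read};

// Hypothetical wrapper that counts how often `read` reaches the inner reader.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

// Reads `total` bytes one at a time, imitating many small field reads.
fn read_byte_by_byte<R: Read>(reader: &mut R, total: usize) -> io::Result<()> {
    let mut byte = [0u8; 1];
    for _ in 0..total {
        reader.read_exact(&mut byte)?;
    }
    Ok(())
}

// Calls that reach the inner reader without buffering.
fn unbuffered_calls(total: usize) -> io::Result<usize> {
    let mut r = CountingReader { inner: Cursor::new(vec![0u8; total]), calls: 0 };
    read_byte_by_byte(&mut r, total)?;
    Ok(r.calls)
}

// Calls that reach the inner reader when a BufReader sits in front of it.
fn buffered_calls(total: usize) -> io::Result<usize> {
    let counting = CountingReader { inner: Cursor::new(vec![0u8; total]), calls: 0 };
    let mut r = BufReader::new(counting);
    read_byte_by_byte(&mut r, total)?;
    Ok(r.get_ref().calls)
}

fn main() -> io::Result<()> {
    let total = 10_000;
    let plain = unbuffered_calls(total)?;
    let buffered = buffered_calls(total)?;
    println!("unbuffered: {} reads, buffered: {} reads", plain, buffered);
    assert!(buffered < plain / 100);
    Ok(())
}
```

With a real File, every call counted here would be a system call, which is where the time goes.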

I guess all the code I wrote to save with Zerocopy isn't necessary anymore. Sigh.
Nope, back to Zerocopy, because it doesn't slow down the editor the way Serde's derive macros do. (See messages below.)

Another update, also dramatic: #[derive(Deserialize, Serialize)] is waaaay slower to check or compile. After I switched ~20 structs to Serde, the latency of Ctrl+S has jumped from like 1 second to ~5. Inline autocomplete in VSCode became much slower.

I'm going to make some measurements.

I tried timing a full cargo check (removing ./target), and an incremental one, and well, nothing measurable.

Measuring LLVM lines, I got an increase from 70K with Zerocopy to 80K with Serde.

But in the editor the difference is very, very noticeable.
