What is the absolute fastest way to load 50M f32 from file to memory?

anon80458984 · January 21, 2019, 9:48am

Short Version:

You have complete control over the on-disk file format. 100% your choice.
The in-memory representation needs to be Vec<f32>.
What is the fastest way to read 50M f32 into memory?

Long Version:

I have a number of unit tests of the form: 1. read some data, 2. transform it into a 50M-element Vec<f32>, 3. do some work

Right now, steps 1&2 are dominating the unit test time. Instead, I want to do a two stage process:

stage 1: 1. read some data, 2. transform it into a 50M-element Vec<f32>, 3. save this to disk
stage 2 (unit tests): read the pre-formatted 50M-element Vec<f32>, do unit test work

Question now is: what is the optimal on-disk format, and what is the optimal way to read it?

I'm on Linux x64. This does NOT need to work on any other platform.
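
For reference, one candidate format for stage 1 is the simplest possible one: the raw native-endian bytes of each f32, back to back, with no header. The sketch below assumes that layout; the function name save_f32_raw is made up for illustration and nothing in this thread prescribes it.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Write `data` as raw native-endian f32 values with no header
/// (hypothetical helper). The resulting file is `data.len() * 4` bytes.
fn save_f32_raw(path: &Path, data: &[f32]) -> io::Result<()> {
    // Buffer the writes so each 4-byte value does not become its own syscall.
    let mut out = BufWriter::new(File::create(path)?);
    for value in data {
        out.write_all(&value.to_ne_bytes())?;
    }
    // Flush explicitly so a write error is reported instead of lost on drop.
    out.flush()
}
```

Since the file only has to work on one machine, native endianness is enough here; a portable format would have to pick a fixed byte order.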

anon80458984 · January 21, 2019, 9:49am

Also, here is the output of

free -h
              total        used        free      shared  buff/cache   available
Mem:            94G        2.4G         88G        255M        3.3G         90G
Swap:          4.0G          0B        4.0G

I'm okay with creating a custom ramdisk, so everything is in memory (even when "on-disk").

kornel · January 21, 2019, 11:14am

Usually people recommend memmap for such things, but memory mapping comes with uneven performance (caching beyond your control) and tough gotchas around concurrency and error handling.

Other than that, just loading it with one syscall, without copies or reallocations should be reasonably fast:

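use std::io::Read;

const BYTES: usize = 50_000_000 * 4; // 50M f32 values, 4 bytes each
let mut buf = Vec::with_capacity(BYTES);
reader.by_ref().take(BYTES as u64).read_to_end(&mut buf)?;
// https://rust-lang.github.io/rfcs/2835-project-safe-transmute.html
// http://lib.rs/bytemuck
// Caution: a Vec<u8> is not guaranteed to be 4-byte aligned; check before casting.
let slice = unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<f32>(), buf.len() / 4) };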
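
The cast above depends on the byte buffer happening to be 4-byte aligned, and it yields a borrowed slice rather than the Vec<f32> the question asks for. A possible variation, sketched here under the assumption that the file holds nothing but raw native-endian f32 values, is to allocate the Vec<f32> first and read straight into its memory. The name load_f32_raw is hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Read a headerless file of native-endian f32 values into a Vec<f32>
/// (hypothetical helper; the file layout is an assumption).
fn load_f32_raw(path: &Path) -> io::Result<Vec<f32>> {
    let mut file = File::open(path)?;
    let byte_len = file.metadata()?.len() as usize;
    if byte_len % 4 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file length is not a multiple of 4",
        ));
    }

    // Allocate the final buffer once; it is correctly aligned for f32.
    let mut data = vec![0f32; byte_len / 4];

    // SAFETY: the allocation holds exactly `byte_len` initialized bytes,
    // u8 has alignment 1, and every bit pattern is a valid f32.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(data.as_mut_ptr().cast::<u8>(), byte_len)
    };
    file.read_exact(bytes)?;
    Ok(data)
}
```

This keeps the single allocation and avoids a second copy, at the cost of zero-filling the buffer before the read. Whether that cost is measurable on a ramdisk is something to benchmark rather than assume.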

matklad · January 21, 2019, 1:26pm

People say that for sequential access, memmap is not faster: Which is fastest: read, fread, ifstream or mmap? – Daniel Lemire's blog

BurntSushi · January 21, 2019, 1:49pm

If you want to test it on your system, compare rg something-that-does-not-match really-large-file --mmap with rg something-that-does-not-match really-large-file --no-mmap. In my experience, results may vary depending on your environment! Make sure to control for I/O, depending on what you want to measure.

vitalyd · January 21, 2019, 3:39pm

An advantage of mmap here would be reduced memory footprint across multiple parallel instances of your tests - they can all reuse the underlying physical memory, and just have their own mapping. If each test were to read into their own buffer/Vec, it would “duplicate” the data. This all assumes that memory would be read-only.

This may not matter but something to consider. I agree with others that you should test the different approaches.

notriddle · January 21, 2019, 5:03pm

The absolute fastest way to load lots of data is to include_bytes!("data.raw") it while you compile your application, and then typecast the &[u8; 4*50_000_000] to a &[f32; 50_000_000]. This way, it gets loaded automatically and with negligible overhead while your application loads.

You can't change the data after you've compiled (not easily, anyway), but if it's something like a neural net model, it might be fine.

cuviper · January 21, 2019, 5:46pm

Be careful about alignment too!
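
To make the alignment concern concrete: a &[u8] may start at any address, while a &[f32] must start on a 4-byte boundary. A minimal sketch of a checked cast, using the standard slice method align_to, could look like this; the helper name bytes_as_f32 is invented for illustration.

```rust
/// View `bytes` as `&[f32]` without copying (hypothetical helper).
/// Returns None if the slice is not 4-byte aligned or its length is not
/// a multiple of 4.
fn bytes_as_f32(bytes: &[u8]) -> Option<&[f32]> {
    // SAFETY: every bit pattern is a valid f32, so reinterpreting
    // initialized bytes as f32 is sound.
    let (prefix, floats, suffix) = unsafe { bytes.align_to::<f32>() };
    // A non-empty prefix means a misaligned start; a non-empty suffix
    // means leftover bytes that do not form a whole f32.
    if prefix.is_empty() && suffix.is_empty() {
        Some(floats)
    } else {
        None
    }
}
```

A None result means the data has to be copied into properly aligned storage instead of being reinterpreted in place.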
