How to do Fast File IO?


#1

I would like to quickly digest big binary data files. What approach should I use?

I started using BufReader, and it works but is slow. I did tests using unbuffered reads, and that goes extremely fast–but doesn’t have as many features. (I like that .read_until() will find arbitrary separators.)

Should BufReader be fast if I only used it correctly? (And what’s the difference between BufReader and BufRead?)

Should I stick with unbuffered reads and build up from there? Use memmap? Try to use libc’s buffered IO from Rust??

Thanks,

-kb


#2

Could you show the slow code that uses BufReader vs the fast code that uses unbuffered reads?

Memory maps may be worthwhile. On Linux, they’ll probably be faster for single shot reads of large files, but will be slower than normal file I/O if you’re doing a lot of them in parallel or on small files. Memory maps are certainly convenient, although you cannot read all types of “files,” for example, /proc/cpuinfo.


#3

Should BufReader be fast if I only used it correctly?

Use BufReader if you’re reading small bits at a time. BufReader will turn these many small reads into fewer large reads, reducing the number of system calls your program ends up making. Using BufReader should never slow your code down, either: if you’re reading chunks at least as large as its internal buffer, reading from a BufReader bypasses the buffering logic and goes straight to the underlying reader.
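A minimal sketch of the small-reads case, assuming records separated by a single delimiter byte. The function name `count_records` and the 64 KiB capacity are made up for illustration; the point is that one BufReader and one reused Vec serve every record:

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts records terminated by `delim` and sums their payload bytes.
/// `source` can be a `File` or any other type implementing `Read`.
fn count_records<R: Read>(source: R, delim: u8) -> io::Result<(usize, usize)> {
    // 64 KiB is an arbitrary choice for this sketch (the default is currently 8 KiB).
    let mut reader = BufReader::with_capacity(64 * 1024, source);
    let mut record = Vec::with_capacity(256); // reused for every record
    let mut records = 0usize;
    let mut payload_bytes = 0usize;

    loop {
        record.clear(); // drops the old contents, keeps the allocation
        let n = reader.read_until(delim, &mut record)?;
        if n == 0 {
            break; // end of input
        }
        // The delimiter is part of `record` unless EOF came first.
        let payload = match record.last() {
            Some(&b) if b == delim => &record[..n - 1],
            _ => &record[..],
        };
        records += 1;
        payload_bytes += payload.len();
    }
    Ok((records, payload_bytes))
}
```

Calling it with `File::open(path)?` as the source should issue one system call per 64 KiB rather than one per record.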

Make sure:

  1. Caching isn’t affecting your results (read the file once to “warm” the cache before actually benchmarking).
  2. You’re compiling in release mode (cargo build --release, cargo bench, or rustc -C opt-level=3).

And what’s the difference between BufReader and BufRead?

BufReader is a read adapter that buffers reads from any type implementing the Read trait. BufRead is a trait that describes a common interface for buffered readers. BufReader implements BufRead.
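To illustrate the split, here is a small sketch of a function written against the BufRead trait rather than the BufReader type. The name `longest_line` is hypothetical. Because it only asks for BufRead, it accepts a `BufReader<File>`, a locked stdin, or a plain `&[u8]`:

```rust
use std::io::{self, BufRead};

/// Returns the length of the longest `\n`-terminated line, newline excluded.
fn longest_line<B: BufRead>(mut input: B) -> io::Result<usize> {
    let mut line = Vec::new();
    let mut longest = 0usize;
    loop {
        line.clear();
        // `read_until` is a provided method of the `BufRead` trait.
        if input.read_until(b'\n', &mut line)? == 0 {
            return Ok(longest); // end of input
        }
        let len = if line.last() == Some(&b'\n') {
            line.len() - 1
        } else {
            line.len()
        };
        longest = longest.max(len);
    }
}
```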


#4

I don’t have exact parallel code, and I’d have to do some sanitizing…

But I can say I am doing little reads: a dozen to a hundred bytes each, mostly using read_until(), sometimes iterating over .bytes(). No seeks (which, from what I have read, discard any buffering).

-kb


#5

Ah, thanks.

Another thought: BufReader seems to only want to fill a Vec. Maybe I am blaming the wrong suspect, maybe my problem is that a vec isn’t as fast as:

let mut buf = [0u8; 0x100000];

I want to read faster than 100MB/s on multi-GB files…

-kb, the Kent who has more experimenting to do.


#6

Did you use the Vec::with_capacity() to preallocate the needed space to avoid reallocating when Vec runs out of space?
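Preallocation also works for a fixed-size read buffer. A rough sketch, with a hypothetical `count_byte` function: the Vec is allocated once and never grows, and `read` only sees a `&mut [u8]`, so it should behave the same as a stack array of the same size.

```rust
use std::io::{self, ErrorKind, Read};

const CHUNK: usize = 1 << 20; // 1 MiB (0x100000); an arbitrary size for this sketch

/// Counts how many bytes in `source` equal `needle`, reading in big chunks.
fn count_byte<R: Read>(mut source: R, needle: u8) -> io::Result<u64> {
    // Allocated once, up front; it is never resized, so it never reallocates.
    let mut buf = vec![0u8; CHUNK];
    let mut count = 0u64;
    loop {
        let n = match source.read(&mut buf) {
            Ok(0) => return Ok(count), // end of input
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        // Only the first `n` bytes are valid after each read.
        count += buf[..n].iter().filter(|&&b| b == needle).count() as u64;
    }
}
```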


#7

I have made progress. My biggest slowdown might have been my casual manipulation of the vec I was getting back from the read. I did use .with_capacity(), but a vec still isn’t free.

I did another version of my code that uses memmap. Accessing a file that is already in cache (but still going through the file system), I am reading input data as slices, doing simple (but not completely trivial) processing, and getting over 400MB/s throughput.
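A sketch of what slice-based processing can look like, not the actual code from this post. The hypothetical `scan_records` takes a plain `&[u8]`, which is what a memory map dereferences to; a file loaded with `std::fs::read` can stand in for the map when trying it out. Skipping empty records is an assumption of this example.

```rust
/// Splits `data` on `delim` without copying anything and returns
/// (number of non-empty records, length of the longest record).
fn scan_records(data: &[u8], delim: u8) -> (usize, usize) {
    let mut count = 0usize;
    let mut longest = 0usize;
    // Each `record` is a sub-slice borrowed from `data`; no Vec is built.
    for record in data.split(|&b| b == delim) {
        if record.is_empty() {
            continue; // e.g. the empty tail after a trailing delimiter
        }
        count += 1;
        longest = longest.max(record.len());
    }
    (count, longest)
}
```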

My code is really just exploratory, a proof of concept, but it proves the concept that Rust can go fast.

Thanks,

-kb


#8

Check the baseline of your system’s read/write performance:

Writing zeroes (about 1 GB):

$ time dd if=/dev/zero of=empty-file bs=1MB count=1024

Writing random data (bottlenecked by /dev/urandom):

$ time dd if=/dev/urandom of=random-data bs=1MB count=1024

Write to disk and drop caches:

$ sync
$ echo 3 | sudo tee /proc/sys/vm/drop_caches

Now see the performance when reading from disk. Files full of zeroes seem to be optimized in a way that makes this read unrealistically fast:
$ time dd if=empty-file of=empty-file2 bs=1MB

This should give you a better idea of the performance:
$ time dd if=random-data of=random-data2 bs=1MB

Spinning HDDs tend to cap at just under 200MB/s on SATA3 connections. SATA SSDs, including M.2 SATA drives, are considerably faster (~500MB/s), and NVMe drives are faster still. RAID and JBODs have different characteristics.

With this information you have an upper bound: your reads can be about this fast and no faster.
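The same baseline can be taken from inside Rust, which makes it easier to compare against your own parsing code. A rough sketch with hypothetical helper names (`drain_timed`, `mb_per_sec`); it does nothing with the data, so it approximates the raw read speed only:

```rust
use std::io::{self, ErrorKind, Read};
use std::time::{Duration, Instant};

/// Reads `source` to the end in 1 MiB chunks and discards the data.
/// Returns the number of bytes read and the time it took.
fn drain_timed<R: Read>(mut source: R) -> io::Result<(u64, Duration)> {
    let mut buf = vec![0u8; 1 << 20];
    let mut total = 0u64;
    let start = Instant::now();
    loop {
        match source.read(&mut buf) {
            Ok(0) => return Ok((total, start.elapsed())), // end of input
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        }
    }
}

/// Converts a byte count and a duration to megabytes (10^6 bytes) per second.
fn mb_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / 1e6 / elapsed.as_secs_f64()
}
```

Pass it `File::open("random-data")?` after dropping caches to get a cold-read number, then run it again for the cached number. Build in release mode in both cases.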