We have a sequence of long reads (sometimes 1 MB) via Read::read_exact whose data is decoded/deserialized into an owned structure, something like:
use std::io::{Read, Error};

struct A;

fn deserialize(data: &[u8]) -> Result<A, Error> {
    todo!()
}

fn read_many<'a, R: Read + 'a>(
    mut reader: R,
) -> impl Iterator<Item = Result<A, Error>> + 'a {
    (0..10).map(move |_| {
        let mut bytes = [0; 4];
        reader.read_exact(&mut bytes)?;
        let length: u32 = u32::from_le_bytes(bytes);
        let mut data = vec![0; length as usize];
        reader.read_exact(&mut data)?;
        deserialize(&data)
    })
}
We would like to re-use the allocated region as much as possible. Our current approach is to use:
/// A small wrapper around [`Vec<u8>`] that allows us to reuse memory once it
/// is initialized. This may improve performance of the [`Read`] trait.
#[derive(Clone, Default)]
pub struct ReadBuffer {
    data: Vec<u8>,
    /// Length to be read, or that has already been read.
    length: usize,
}

impl ReadBuffer {
    /// Set the minimal length of the [`ReadBuffer`]. Contrary to
    /// [`Vec::set_len`], this is safe because this function guarantees that
    /// the underlying data is always initialized.
    pub fn set_len(&mut self, length: usize) {
        if length > self.data.capacity() {
            // Exponential growth strategy: allocate twice the requested length.
            self.data = vec![0; length * 2];
        } else if length > self.data.len() {
            self.data.resize(length, 0);
        }
        self.length = length;
    }
}

impl AsRef<[u8]> for ReadBuffer {
    fn as_ref(&self) -> &[u8] {
        &self.data[..self.length]
    }
}

impl AsMut<[u8]> for ReadBuffer {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.data[..self.length]
    }
}
fn read_many1<'a, R: Read + 'a>(mut reader: R) -> impl Iterator<Item = Result<A, Error>> + 'a {
    let mut data: ReadBuffer = Default::default();
    (0..10).map(move |_| {
        let mut bytes = [0; 4];
        reader.read_exact(&mut bytes)?;
        let length = u32::from_le_bytes(bytes) as usize;
        data.set_len(length);
        reader.read_exact(data.as_mut())?;
        deserialize(data.as_ref())
    })
}
Does anyone use something like this, or know of an even better way of doing it?
I've done something pretty similar before, but with just a plain ol' Vec:
fn read_many<'a, R: Read + 'a>(mut reader: R) -> impl Iterator<Item = Result<A, Error>> + 'a {
    let mut buffer = Vec::new();
    (0..10).map(move |_| {
        let mut bytes = [0; 4];
        reader.read_exact(&mut bytes)?;
        let length = u32::from_le_bytes(bytes) as usize;
        // Ensure the buffer is large enough to handle the payload.
        // Note: compare against `len`, not `capacity`, otherwise the
        // slice below panics whenever `len < length <= capacity`.
        if length > buffer.len() {
            buffer.resize(length, 0);
        }
        let window = &mut buffer[..length];
        reader.read_exact(window)?;
        deserialize(window)
    })
}
But I also think that making a separate data type to handle the buffering is fine : v)
Here I'm just letting the vector use its own allocation strategy via resize with the minimum capacity it needs, but you could be fancy and pass in your own computed value to 'force' an exponential growth strategy.
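A minimal sketch of that idea, using a hypothetical ensure_len helper (the name and the doubling factor are my own choices, not anything from the standard library) that reserves double the requested length before resizing:

```rust
// Hypothetical helper: grow `buf` to at least `length` initialized bytes,
// reserving double the requested length whenever the capacity must grow,
// so that slightly larger follow-up records don't trigger another realloc.
fn ensure_len(buf: &mut Vec<u8>, length: usize) {
    if length > buf.capacity() {
        // `reserve` takes the *additional* capacity beyond the current len.
        buf.reserve(length * 2 - buf.len());
    }
    if length > buf.len() {
        buf.resize(length, 0);
    }
}

fn main() {
    let mut buf = Vec::new();
    ensure_len(&mut buf, 100);
    assert!(buf.len() >= 100);
    assert!(buf.capacity() >= 200);
    let before = buf.as_ptr();
    ensure_len(&mut buf, 150); // fits in the doubled capacity: no realloc
    assert_eq!(buf.as_ptr(), before);
}
```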
Addendum: if you're trying to squeeze efficiency out of your reader, it's probably also worth looking at BufReader. It maintains its own buffer for doing large reads at a time, then lets you read slowly over the buffered data. Slightly different from what you're trying to do, but definitely adjacent!
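For instance, a minimal sketch of the length-prefixed read on top of BufReader, using an in-memory Cursor as a stand-in for a real source:

```rust
use std::io::{BufReader, Read};

fn main() -> std::io::Result<()> {
    // 4-byte little-endian length header (3), followed by 3 payload bytes.
    let raw = std::io::Cursor::new(vec![3, 0, 0, 0, b'a', b'b', b'c']);
    // Small reads (like the header) now hit BufReader's in-memory buffer
    // instead of the underlying source.
    let mut reader = BufReader::new(raw);

    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    let length = u32::from_le_bytes(header) as usize;

    let mut payload = vec![0; length];
    reader.read_exact(&mut payload)?;
    assert_eq!(payload, b"abc");
    Ok(())
}
```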
Is your custom growth strategy really worth it? You could just use a Vec.
You can avoid re-initializing anything but the newly exposed parts of your buffer by never shrinking the length and passing around slices:
fn read_many2<R: Read>(mut reader: R) -> impl Iterator<Item = Result<A, Error>> {
    let mut data = Vec::new(); // vec![0; reasonable_expectation];
    (0..10).map(move |_| {
        let mut bytes = [0; 4];
        reader.read_exact(&mut bytes)?;
        let length = u32::from_le_bytes(bytes) as usize;
        // Never shorten the known-initialized length.
        // It may be worth spelling this out in an `if` to avoid a call
        // to `resize` when no growth is needed.
        let max = data.len().max(length);
        data.resize(max, 0);
        // Check that these bound checks are eliminated, or throw in an
        // assert! maybe.
        let this = &mut data[..length];
        reader.read_exact(this)?;
        deserialize(this)
    })
}
read1 2^10 time: [4.7600 us 4.8518 us 4.9582 us]
read2 2^10 time: [4.4994 ms 4.5141 ms 4.5288 ms]
read1 2^12 time: [4.3802 us 4.4795 us 4.6010 us]
read2 2^12 time: [4.4811 ms 4.4940 ms 4.5068 ms]
read1 2^14 time: [5.2996 us 5.3199 us 5.3390 us]
read2 2^14 time: [4.5012 ms 4.5135 ms 4.5258 ms]
use criterion::{criterion_group, criterion_main, Criterion};

use bench_read::{read_many1, read_many2};

fn add_benchmark(c: &mut Criterion) {
    (10..=20).step_by(2).for_each(|log2_size| {
        let size = 2usize.pow(log2_size);
        let mut data = vec![];
        for i in 0..10 {
            // The header announces `size + i` bytes, so the payload must
            // contain exactly that many bytes (otherwise later records
            // read past their boundary and eventually hit EOF).
            data.extend_from_slice(&((i + size) as u32).to_le_bytes());
            data.extend(std::iter::repeat((i % 255) as u8).take(size + i));
        }
        c.bench_function(&format!("read1 2^{}", log2_size), |b| {
            b.iter(|| {
                let reader = std::io::Cursor::new(&data);
                let _ = read_many1(reader).count();
            })
        });
        c.bench_function(&format!("read2 2^{}", log2_size), |b| {
            b.iter(|| {
                let reader = std::io::Cursor::new(&data);
                let _ = read_many2(reader).count();
            })
        });
    });
}

criterion_group!(benches, add_benchmark);
criterion_main!(benches);
imo this makes sense - the issue is that resize may trigger a realloc, which causes everything in the buffer to be copied to the new region.
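To make that concrete: a resize that stays within the already-reserved capacity reuses the same allocation, while one that exceeds it may reallocate and copy. A small sketch:

```rust
fn main() {
    // Pre-reserve 1024 bytes; growing within that capacity keeps the
    // allocation (and its address) stable.
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    buf.resize(512, 0);
    let before = buf.as_ptr();
    buf.resize(1024, 0); // still within capacity: no realloc, no copy
    assert_eq!(before, buf.as_ptr());
}
```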