CowMut, or borrowed/owned mutable temp buffers?

I have the following API:

fn read_complex_struct(reader: impl Read) -> ComplexStruct {
    let mut tmp_buffer = Box::new([0u8; 16384]);
    ...
}

I need tmp_buffer not only to amortize syscall cost, but also to perform random memory access within it for efficient string search. So I can't use the BufRead trait.

I want users to be able to provide their own (reused) buffer from the outside. So I do this:

fn read_complex_struct(reader: impl Read, tmp_buffer: &mut [u8])
-> ComplexStruct { .. }

However, now I have exposed the user to a bunch of potential borrowing problems, even if they don't want to deal with "optimizing" buffers. So I iterate further:

fn read_complex_struct(reader: impl Read, tmp_buffer: impl AsMut<[u8]>)
-> ComplexStruct { .. }

Then people can send in an owned buffer, e.g. vec![0u8; 16384] or Box::new([0u8; 16384]) coerced to Box<[u8]> (Box<[u8; N]> itself only implements AsMut<[u8; N]>), and it will not require them to hold a buffer elsewhere. This matters because the real-world version of read_complex_struct looks closer to this:

impl<R: Read, Buffer: AsMut<[u8]>> Reader<R, Buffer> {
    fn new_with_buffer(reader: R, tmp_buffer: Buffer) -> Self { .. }
    fn new(reader: R) -> Self { Self::new_with_buffer(reader, Box::new(...)) }
}

Anyway, the ability to pass owned data into read_complex_struct is desirable because it keeps lifetimes simple when one doesn't care about performance. I found that forcing users to pass in borrowed, non-static data quickly pushes them into making their own structs self-referential.

However, there's another problem. read_complex_struct's correctness relies on tmp_buffer not resizing itself. In other words, tmp_buffer.as_mut().len() must never change. Unfortunately, AsMut<[u8]> provides few static guarantees around that.

So, my final attempt:

fn read_complex_struct<const BUF_LEN: usize>(
    reader: impl Read,
    tmp_buffer: impl AsMut<[u8; BUF_LEN]>
) -> ComplexStruct { .. }

Surely if the length of the buffer is known statically at compile time, it cannot change? But it would be nice to let users not only choose their buffer size, but choose it at runtime, as long as they don't change their mind halfway through.

What is the most common and idiomatic solution here? Build my own trait like AsRef<[u8]> that comes with a tighter contract? Is there such a thing already?
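
For illustration, the "own trait with a tighter contract" idea could be sketched roughly as below. The trait name StableBuffer and its method stable_mut are hypothetical, not something from std or an existing crate; the trait is marked unsafe only so that implementors explicitly take on the fixed-length promise.

```rust
/// Hypothetical trait (not part of std): implementors promise that
/// `stable_mut` returns a slice of the same length on every call.
/// It is `unsafe` to implement because callers may rely on that promise.
unsafe trait StableBuffer {
    fn stable_mut(&mut self) -> &mut [u8];
}

// A boxed slice cannot grow or shrink in place.
unsafe impl StableBuffer for Box<[u8]> {
    fn stable_mut(&mut self) -> &mut [u8] {
        self
    }
}

// A borrowed slice held by the reader cannot be re-sliced by anyone else.
unsafe impl StableBuffer for &mut [u8] {
    fn stable_mut(&mut self) -> &mut [u8] {
        self
    }
}

// Arrays have their length fixed in the type.
unsafe impl<const N: usize> StableBuffer for [u8; N] {
    fn stable_mut(&mut self) -> &mut [u8] {
        self
    }
}
```

Vec<u8> is deliberately left out of this sketch, since a Vec can be resized by whoever owns it.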

I'm not clear on something. When using AsMut for the buffer, after calling as_mut how can the size change? No other entity can have mut access during the method call.


In principle I can lend mutable access to somebody else between as_mut() calls, or I decide to be a dick and just build a contrived example like this:

struct BreakTheWorldBuffer(Vec<u8>);

fn random_integer() -> usize {
    // fair diceroll, or I'm just too lazy to figure
    // out how to use the rand crate right now
    2 
}

impl AsMut<[u8]> for BreakTheWorldBuffer {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.0[..random_integer()]
    }
}

Across subsequent calls to as_mut? Right, AsMut provides no guarantees around that, as your mock-up shows.

If you have unsafe code that relies on it, store the length and check it. If you don't, just document that panics or other unspecified-but-defined behavior will happen if the caller does silly things.
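
A minimal sketch of "store the length and check it", assuming a hypothetical CheckedReader type (the names CheckedReader, fill and buf are made up for this example). It records the length once in the constructor and asserts on every later access, so a misbehaving AsMut impl causes a panic instead of silently breaking invariants:

```rust
use std::io::{self, Read};

/// Hypothetical reader that remembers the buffer length it saw at
/// construction time and re-checks it on every access.
struct CheckedReader<R, B> {
    reader: R,
    buffer: B,
    len: usize,
}

impl<R: Read, B: AsMut<[u8]>> CheckedReader<R, B> {
    fn new(reader: R, mut buffer: B) -> Self {
        // Record the length exactly once.
        let len = buffer.as_mut().len();
        Self { reader, buffer, len }
    }

    /// Returns the buffer, panicking if its length differs from the
    /// length recorded in `new`.
    fn buf(&mut self) -> &mut [u8] {
        let buf = self.buffer.as_mut();
        assert_eq!(buf.len(), self.len, "buffer length changed between as_mut() calls");
        buf
    }

    /// Performs a single `read` into the (checked) buffer and returns
    /// the number of bytes read.
    fn fill(&mut self) -> io::Result<usize> {
        let buf = self.buffer.as_mut();
        assert_eq!(buf.len(), self.len, "buffer length changed between as_mut() calls");
        self.reader.read(buf)
    }
}
```

With a well-behaved buffer such as a Vec<u8> or Box<[u8]> the assertions never fire; with something like BreakTheWorldBuffer returning varying lengths, the first mismatching call panics.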

However AsMut<[u8]> may just be the wrong tool for the job. Do you really need to support any kind of backing buffer on top of Box<[u8]>, or do you just need to support &mut [u8]? See below.


impl<R: Read, Buffer: AsMut<[u8]>> Reader<R, Buffer> {
    fn new_with_buffer(reader: R, tmp_buffer: Buffer) -> Self { .. }
    fn new(reader: R) -> Self { Self::new_with_buffer(reader, Box::new(...)) }
}

That signature for Reader::new can't work - you're saying you can return any caller chosen buffer type but you're returning Reader<R, Box<[u8]>> instead.

Consider these instead:

impl<R: Read> Reader<R, Box<[u8]>> {
    fn new(reader: R) -> Self { ... }
    fn new_with_buffer_size(reader: R, size: usize) -> Self { ... }
    fn new_from_buffer<B: Into<Box<[u8]>>>(reader: R, buf: B) -> Self { ... }
}

impl<'buf, R: Read> Reader<R, &'buf mut [u8]> {
    fn new_with_buffer(reader: R, buffer: &'buf mut [u8]) -> Self { ... }
}
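
Filled in, the two impl blocks might look roughly like this. This is only a sketch: the struct fields, the DEFAULT_BUF_LEN constant (taken from the 16384 in the original post), and the shared fill/buffer_len methods are assumptions added to make the example compile.

```rust
use std::io::{self, Read};

struct Reader<R, B> {
    reader: R,
    buffer: B,
}

// Assumed default, mirroring the 16384-byte buffer from the question.
const DEFAULT_BUF_LEN: usize = 16384;

// Owned buffer: a Box<[u8]> has a length chosen at runtime that cannot
// change afterwards.
impl<R: Read> Reader<R, Box<[u8]>> {
    fn new(reader: R) -> Self {
        Self::new_with_buffer_size(reader, DEFAULT_BUF_LEN)
    }

    fn new_with_buffer_size(reader: R, size: usize) -> Self {
        Self::new_from_buffer(reader, vec![0u8; size])
    }

    fn new_from_buffer<B: Into<Box<[u8]>>>(reader: R, buf: B) -> Self {
        Self { reader, buffer: buf.into() }
    }
}

// Borrowed buffer: the caller keeps ownership and can reuse it later.
impl<'buf, R: Read> Reader<R, &'buf mut [u8]> {
    fn new_with_buffer(reader: R, buffer: &'buf mut [u8]) -> Self {
        Self { reader, buffer }
    }
}

// Logic shared by both buffer kinds (hypothetical helper methods).
impl<R: Read, B: AsMut<[u8]>> Reader<R, B> {
    /// Performs a single `read` into the buffer.
    fn fill(&mut self) -> io::Result<usize> {
        self.reader.read(self.buffer.as_mut())
    }

    fn buffer_len(&mut self) -> usize {
        self.buffer.as_mut().len()
    }
}
```

Because only these two concrete buffer types can be constructed, nothing outside the reader can resize the buffer once the Reader exists.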

Sorry, yes, you're right. The two constructors are actually part of two separate impl blocks.

Actually yeah, that's probably the solution. Allow Box<[u8]> or &mut [u8], and nothing in between. I knew I was overthinking this. Thanks!