My proposal is a lower "denominator" than the current `Read` trait. In fact, now that I think about it, I was wrong in my earlier statement that it can't be retrofitted into Rust because it's inherently incompatible with what's already there. The exact opposite is true: it is a strict superset of `std::io::Read`, allowing it to implement the `Read` trait for the special case of `u8`. Meanwhile, the `Read` trait cannot implement the more elegant zero-copy trait, because:
- It cannot read without consuming bytes.
- It cannot read non-`Copy` types, even if generalised to a generic trait with a default `u8` parameter.
- It breaks the performance contract of zero copy.
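The first and third points are easy to see with the standard library's own `Read` impl for `&[u8]`. The snippet below is only a small illustration; the helper name `read_prefix` is made up for this example. Reading copies bytes into a caller-supplied buffer and advances the source in the same call, so there is no way to look at the data and leave it in place.

```rust
use std::io::Read;

/// Hypothetical helper: reads up to `n` bytes through `std::io::Read` and
/// returns (what was copied out, what is left in the source afterwards).
fn read_prefix(mut input: &[u8], n: usize) -> (Vec<u8>, Vec<u8>) {
    let mut buf = vec![0u8; n];
    // `Read for &[u8]` copies into `buf` AND moves `input` past the bytes it
    // handed out; looking and consuming cannot be separated.
    let got = input.read(&mut buf).expect("reading from a slice cannot fail");
    buf.truncate(got);
    (buf, input.to_vec())
}
```

Calling `read_prefix(b"hello", 2)` hands back the copied prefix and shows that the source has already moved on to the remaining bytes.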
Let's call my proposal `Read2`:
trait Read2 {
    type Data; // = u8; // with associated type defaults.
    type Error; // = (); // with associated type defaults.

    /// Returns at least 'items', which can be 0 for best-effort.
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;

    /// Can consume any number of items, acting much like `skip()`.
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}
// Ta-da: backwards compatibility!
impl<'a> std::io::Read for dyn Read2<Data = u8, Error = std::io::Error> + 'a {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, std::io::Error> {
        // peek(0) is best-effort and may offer more than `buf` can hold,
        // so clamp the count to avoid a length-mismatch panic.
        let temp = self.peek(0)?;
        let read_items = temp.len().min(buf.len());
        // THIS is the unavoidable copy inherent in all implementors
        // of std::io::Read.
        buf[..read_items].copy_from_slice(&temp[..read_items]);
        // `temp` is not used past this point, so its borrow of `self`
        // has ended and consume() is allowed.
        self.consume(read_items)?;
        Ok(read_items)
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), std::io::Error> {
        // Take the length once up front so `buf` is not borrowed twice
        // in the same expression.
        let request_items = buf.len();
        // peek() returns *at least* `request_items`, so slice to the exact
        // length before copying.
        buf.copy_from_slice(&self.peek(request_items)?[..request_items]);
        self.consume(request_items)?;
        Ok(())
    }
}
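To make the peek/consume split concrete, here is a minimal sketch of an implementor. `SliceSource` and `AsRead` are hypothetical names, not part of the proposal, and the sketch assumes that `peek(0)` returns an empty slice at end of input and that asking for more than is available is an error, since the proposal does not pin either down. The newtype is used because a blanket `impl<T: Read2> Read for T` outside of std would be rejected by the coherence rules.

```rust
use std::io::{self, Read};

/// The proposed trait, restated so this sketch compiles on its own.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

/// Hypothetical in-memory source: peeking hands out the backing slice itself.
struct SliceSource<'a> {
    data: &'a [u8],
}

impl<'a> Read2 for SliceSource<'a> {
    type Data = u8;
    type Error = io::Error;

    fn peek(&mut self, items: usize) -> Result<&[u8], io::Error> {
        // Assumption: requesting more than is left is an error.
        if items > self.data.len() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        Ok(self.data) // no copy, and nothing is consumed
    }

    fn consume(&mut self, items: usize) -> Result<(), io::Error> {
        if items > self.data.len() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        self.data = &self.data[items..];
        Ok(())
    }
}

/// Hypothetical newtype that exposes any byte-oriented `Read2` as `std::io::Read`.
struct AsRead<T>(T);

impl<T: Read2<Data = u8, Error = io::Error>> Read for AsRead<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let available = self.0.peek(0)?;
        let n = available.len().min(buf.len());
        // The one copy that `Read` forces on every caller.
        buf[..n].copy_from_slice(&available[..n]);
        self.0.consume(n)?;
        Ok(n)
    }
}
```

Peeking twice on a `SliceSource` returns the same bytes both times; only `consume` advances it. Wrapping the source in `AsRead` then gives existing `Read`-based code something it can use unchanged, at the cost of the copy.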
Now sit down for a second and picture how elegant it would be to implement memory-mapped files using `Read2` compared to `Read`. For example, in ripgrep, @BurntSushi had to write two complete implementations of the "searcher" struct, because `std::io::Read` would have been inefficient when using mmap:
The elegance of the `System.IO.Pipelines` model of "you get a reference to a buffer with at least 'x' items to peek" instead of "fill a buffer and now it's your problem" means that the entire search_buffer.rs file from ripgrep could likely be deleted (another 400 lines, alongside the BOM/UTF-16 code, which could be simplified using the peek API model). On top of that, I suspect that this chunk of rather complex code would also simplify massively, because you no longer have to worry about rolling over partially consumed buffers yourself:
I'm more impressed now by @BurntSushi's work, but I shouldn't be. He's reinventing wheels and necessarily duplicating code that ought to reuse the same abstract trait for all implementations...
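As a rough sketch of what "the same abstract trait for all implementations" could look like, the code below runs one search routine over two backends. Every name here (`SliceSource`, `ReaderSource`, `count_byte`) is hypothetical and nothing is taken from ripgrep. A plain slice stands in for a memory map, and the streaming adapter assumes that `peek(0)` yields an empty slice at end of input. For brevity its `consume` only accepts items that are already buffered, which is narrower than the `skip()`-like behaviour described above.

```rust
use std::io::{self, Read};

/// The proposed trait, restated so this sketch compiles on its own.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

/// Hypothetical stand-in for a memory-mapped file: the data is already in memory.
struct SliceSource<'a> {
    data: &'a [u8],
}

impl<'a> Read2 for SliceSource<'a> {
    type Data = u8;
    type Error = io::Error;

    fn peek(&mut self, items: usize) -> Result<&[u8], io::Error> {
        if items > self.data.len() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        Ok(self.data) // no copy: the caller sees the bytes in place
    }

    fn consume(&mut self, items: usize) -> Result<(), io::Error> {
        if items > self.data.len() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        self.data = &self.data[items..];
        Ok(())
    }
}

/// Hypothetical adapter for streaming input. It owns the buffer, so rolling
/// over partially consumed data happens here instead of in every caller.
struct ReaderSource<R> {
    inner: R,
    buf: Vec<u8>,
}

impl<R: Read> Read2 for ReaderSource<R> {
    type Data = u8;
    type Error = io::Error;

    fn peek(&mut self, items: usize) -> Result<&[u8], io::Error> {
        let mut scratch = [0u8; 4096];
        // Ask for at least one byte so peek(0) makes progress until end of input.
        while self.buf.len() < items.max(1) {
            let n = self.inner.read(&mut scratch)?;
            if n == 0 {
                break; // end of input
            }
            self.buf.extend_from_slice(&scratch[..n]);
        }
        if self.buf.len() < items {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        Ok(&self.buf[..])
    }

    fn consume(&mut self, items: usize) -> Result<(), io::Error> {
        // Simplification: only already-buffered items can be consumed.
        if items > self.buf.len() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        self.buf.drain(..items); // unconsumed bytes move to the front
        Ok(())
    }
}

/// One search routine for every backend: counts occurrences of `needle`.
fn count_byte<S: Read2<Data = u8>>(src: &mut S, needle: u8) -> Result<usize, S::Error> {
    let mut count = 0;
    loop {
        let chunk = src.peek(0)?;
        if chunk.is_empty() {
            return Ok(count);
        }
        count += chunk.iter().filter(|&&b| b == needle).count();
        let seen = chunk.len();
        src.consume(seen)?;
    }
}
```

`count_byte` never learns whether its bytes came from memory or from a stream, and the only buffer bookkeeping in the whole sketch lives inside `ReaderSource`.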