Rust beginner notes & questions

My proposal is a lower "denominator" than the current Read trait. In fact, now that I think about it, I was wrong in my earlier statement that it can't be retrofitted into Rust because it's inherently incompatible with what's already there.

The exact opposite is true: It is a strict superset of std::io::Read, allowing it to implement the Read trait for the special case of u8. Meanwhile, the Read trait cannot implement the more elegant zero-copy trait, because:

  • It cannot read without consuming bytes.
  • It cannot read non-Copy types, even if generalised to a generic trait with a type parameter defaulting to u8.
  • It breaks the performance contract of zero copy.

Let's call my proposal Read2:

trait Read2 {
    type Data;  // = u8, once associated type defaults are available.
    type Error; // = (), once associated type defaults are available.

    /// Returns a slice of at least `items` items; pass 0 for best-effort.
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;

    /// Can consume any number of items, acting much like `skip()`.
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

// Ta-da: backwards-compatibility!
// Written against the trait object (`dyn`), because a blanket impl for every
// `T: Read2` would be rejected by the coherence rules outside of std.
impl std::io::Read for dyn Read2<Data = u8, Error = std::io::Error> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, std::io::Error> {
        // With NLL no extra scope block is needed around this borrow.
        let temp = self.peek(0)?;
        // peek() may hand back more than the caller has room for.
        let read_items = temp.len().min(buf.len());
        // THIS is the unavoidable copy inherent in all implementors
        // of std::io::Read.
        buf[..read_items].copy_from_slice(&temp[..read_items]);
        self.consume(read_items)?;
        Ok(read_items)
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), std::io::Error> {
        // Cache the length up front so that buf is not borrowed twice
        // in the same expression. Once again... Ugh!
        let request_items = buf.len();
        // peek() returns *at least* request_items, so trim before copying;
        // copy_from_slice panics on a length mismatch.
        buf.copy_from_slice(&self.peek(request_items)?[..request_items]);
        self.consume(request_items)?;
        Ok(())
    }
}
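To make the zero-copy and non-Copy points concrete, here is a minimal sketch of an implementor for data that is already in memory, which is roughly what a memory-mapped file looks like to the program. SliceReader, SliceError and the choice to report a short peek as an error are my own assumptions for illustration, not part of any existing crate:

```rust
// Repeated from above so that this sketch compiles on its own.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SliceError {
    UnexpectedEof,
}

/// Hypothetical reader over items that already live in memory,
/// e.g. the slice backing a memory-mapped file.
struct SliceReader<'a, T> {
    data: &'a [T],
    pos: usize, // invariant: pos <= data.len()
}

impl<'a, T> SliceReader<'a, T> {
    fn new(data: &'a [T]) -> Self {
        SliceReader { data, pos: 0 }
    }
}

impl<'a, T> Read2 for SliceReader<'a, T> {
    type Data = T;
    type Error = SliceError;

    /// Hands out a view of everything that is left. Nothing is copied
    /// and nothing is consumed.
    fn peek(&mut self, items: usize) -> Result<&[T], SliceError> {
        let rest = &self.data[self.pos..];
        if rest.len() < items {
            // Assumption: "fewer than requested" is reported as an error.
            return Err(SliceError::UnexpectedEof);
        }
        Ok(rest)
    }

    /// Advances the cursor. This is the only operation that moves it.
    fn consume(&mut self, items: usize) -> Result<(), SliceError> {
        if self.data.len() - self.pos < items {
            return Err(SliceError::UnexpectedEof);
        }
        self.pos += items;
        Ok(())
    }
}
```

Because T is unconstrained, the same reader works for a slice of String or any other non-Copy type: peek(1) on a reader over ["zero", "copy"] yields a reference to "zero" without cloning it, and a second peek returns the same item until consume(1) is called.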

Now sit down for a second and picture how elegant it would be to implement memory-mapped files using Read2 compared to Read. For example, in ripgrep, @BurntSushi had to write two complete implementations of the "searcher" struct, because std::io::Read would have been inefficient when using mmap:

The System.IO.Pipelines model of "you get a reference to a buffer with at least 'x' items to peek", instead of "fill a buffer and now it's your problem", is elegant enough that the entire search_buffer.rs file from ripgrep could likely be deleted (another 400 lines, alongside the BOM/UTF-16 code, which could be simplified using the peek API model). On top of that, I suspect that this chunk of rather complex code would also simplify massively, because you no longer have to worry about rolling over partially consumed buffers yourself:

I'm now even more impressed by @BurntSushi's work, but I shouldn't have to be. He's reinventing wheels and is forced to duplicate code, when every implementation ought to reuse the same abstract trait...
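To illustrate the rollover point, here is a rough sketch of a consumer written against the peek model. None of this is ripgrep's code: ChunkedReader, StreamError and count_lines are hypothetical names, and I'm assuming that peek reports end of input as an error when fewer items remain than were requested. The consumer never copies a partial line anywhere; it simply asks for a longer view until it sees a newline:

```rust
// Repeated from above so that this sketch compiles on its own.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

/// Hypothetical error type: the only failure modelled here is end of input.
#[derive(Debug, PartialEq)]
enum StreamError {
    Eof,
}

/// Hypothetical stand-in for a socket or pipe that delivers data in
/// small chunks. All buffer management lives here, in one place.
struct ChunkedReader<'a> {
    source: &'a [u8], // bytes not yet "received"
    chunk: usize,     // bytes delivered per simulated read
    buf: Vec<u8>,     // received but not yet consumed
}

impl<'a> ChunkedReader<'a> {
    fn new(source: &'a [u8], chunk: usize) -> Self {
        // A chunk size of 0 would never make progress, so clamp it to 1.
        ChunkedReader { source, chunk: chunk.max(1), buf: Vec::new() }
    }
}

impl<'a> Read2 for ChunkedReader<'a> {
    type Data = u8;
    type Error = StreamError;

    fn peek(&mut self, items: usize) -> Result<&[u8], StreamError> {
        // Fetch more chunks until the request can be satisfied.
        while self.buf.len() < items {
            if self.source.is_empty() {
                return Err(StreamError::Eof);
            }
            let n = self.chunk.min(self.source.len());
            let (head, tail) = self.source.split_at(n);
            self.buf.extend_from_slice(head);
            self.source = tail;
        }
        Ok(&self.buf)
    }

    fn consume(&mut self, items: usize) -> Result<(), StreamError> {
        self.peek(items)?; // make sure the items exist, fetching if needed
        self.buf.drain(..items);
        Ok(())
    }
}

/// Counts lines without any rollover logic in the consumer.
/// A final line without a trailing newline is counted too.
fn count_lines<R>(reader: &mut R) -> Result<usize, StreamError>
where
    R: Read2<Data = u8, Error = StreamError>,
{
    let mut lines = 0;
    let mut want = 1; // how many bytes the next view must contain
    loop {
        let (newline, available) = match reader.peek(want) {
            Ok(view) => (view.iter().position(|&b| b == b'\n'), view.len()),
            // want > 1 means an unterminated line is still pending.
            Err(StreamError::Eof) => return Ok(lines + usize::from(want > 1)),
        };
        match newline {
            Some(index) => {
                reader.consume(index + 1)?;
                lines += 1;
                want = 1;
            }
            // No newline yet: ask for a strictly longer view next time.
            None => want = available + 1,
        }
    }
}
```

The same count_lines would work unchanged over an in-memory or memory-mapped implementor of Read2, which is the code-sharing I'm arguing for. A real streaming implementor would want a smarter buffer than a Vec with drain, but that complexity stays inside the implementor rather than leaking into every consumer.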