Rust beginner notes & questions

peter_bertok · July 4, 2018, 12:49pm

My proposal is a lower "denominator" than the current Read trait. In fact, now that I think about it, I was wrong in my earlier statement that it can't be retrofitted into Rust because it's inherently incompatible with what's already there.

The exact opposite is true: it is a strict superset of std::io::Read, so the Read trait can be implemented on top of it for the special case of u8. Meanwhile, the Read trait cannot implement the more elegant zero-copy trait, because:

It cannot read without consuming bytes.
It cannot read non-Copy types even if generalised to a generic trait with a default u8 parameter.
It breaks the performance contract of zero copy.

Let's call my proposal Read2:

trait Read2  {
    type Data; //  = u8; // with associated type defaults.
    type Error; // = (); // with associated type defaults.

    /// Returns a slice of at least `items` elements; pass 0 for a best-effort peek.
    fn peek(&mut self, items: usize ) -> Result<&[Self::Data],Self::Error>;

    /// Can consume any number of items, acting much like `skip()`.
    fn consume(&mut self, items: usize ) -> Result<(), Self::Error>;
}

// Ta-da: backwards-compatibility!
impl std::io::Read for dyn Read2<Data = u8, Error = std::io::Error> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, std::io::Error> {
        let read_items : usize;
        // Scope the borrow from peek() so that it ends before consume() is called.
        {
            let temp = self.peek( 0 )?;
            // Never copy more than the caller's buffer can hold.
            read_items = temp.len().min( buf.len() );
            // THIS is the unavoidable copy inherent in all implementors
            // of std::io::Read.
            buf[..read_items].copy_from_slice( &temp[..read_items] );
        }
        self.consume(read_items )?;
        Ok( read_items  )
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> Result<(), std::io::Error> {
        // Directly calling buf.len() twice makes the borrow checker cry,
        // a temp copy of the len is required. Once again... Ugh!
        let request_items: usize = buf.len();
        // peek() may return more than requested, so copy exactly request_items.
        buf.copy_from_slice( &self.peek( request_items )?[..request_items] );
        self.consume( request_items )?;
        Ok( () )
    }
}
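
To make the adapter above a bit more concrete, here is a minimal sketch of Read2 implemented for a plain in-memory byte slice, standing in for a memory-mapped region. SliceSource is a hypothetical name, and using std::io::Error with these particular error kinds is an assumption of the sketch, not part of the proposal. It is meant to show that peek hands out a borrow of the underlying bytes without copying, and that nothing is consumed until consume is called.

```rust
use std::io;

// Same shape as the Read2 trait proposed above.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

// Hypothetical zero-copy source over bytes that are already in memory.
struct SliceSource<'a> {
    data: &'a [u8],
}

impl<'a> Read2 for SliceSource<'a> {
    type Data = u8;
    type Error = io::Error;

    fn peek(&mut self, items: usize) -> Result<&[u8], io::Error> {
        // Assumption: asking for more than is left is reported as UnexpectedEof.
        if self.data.len() < items {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        // No copy: the caller borrows the remaining bytes directly.
        Ok(self.data)
    }

    fn consume(&mut self, items: usize) -> Result<(), io::Error> {
        if items > self.data.len() {
            return Err(io::ErrorKind::InvalidInput.into());
        }
        // Advancing is just re-slicing; no bytes move.
        self.data = &self.data[items..];
        Ok(())
    }
}
```

Peeking twice in a row returns the same bytes, which is exactly what std::io::Read cannot offer, since every read call consumes what it returns.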

Now sit down for a second and picture how elegant it would be to implement memory-mapped files using Read2 compared to Read. For example, in ripgrep, @BurntSushi had to write two complete implementations of the "searcher" struct, because std::io::Read would have been inefficient when using mmap:

github.com/BurntSushi/ripgrep/blob/7120f3225862f6c718a37a8616debaebd8c3d459/src/worker.rs#L275-L279

          if self.opts.mmap {
              self.search_mmap(printer, path, &file)
          } else {
              self.search(printer, path, file)
          }
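
As a rough illustration of how one code path could serve both branches above, here is a sketch of a newline counter written once against Read2. WindowedSource, OutOfData and count_lines are hypothetical names invented for this example and do not reflect ripgrep's actual code. A huge window behaves like a memory-mapped file (everything is visible at once), while a small window behaves like a stream that only exposes a few bytes per best-effort peek.

```rust
// Same shape as the Read2 trait proposed above.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

// Hypothetical error type for this sketch.
#[derive(Debug)]
struct OutOfData;

// Hypothetical source: `window` is how many bytes a best-effort peek exposes.
// It must be at least 1, otherwise peek(0) would look like end of input.
struct WindowedSource<'a> {
    data: &'a [u8],
    window: usize,
}

impl<'a> Read2 for WindowedSource<'a> {
    type Data = u8;
    type Error = OutOfData;

    fn peek(&mut self, items: usize) -> Result<&[u8], OutOfData> {
        if items > self.data.len() {
            return Err(OutOfData);
        }
        // Expose at least `items` bytes, otherwise up to `window` bytes.
        let visible = self.window.max(items).min(self.data.len());
        Ok(&self.data[..visible])
    }

    fn consume(&mut self, items: usize) -> Result<(), OutOfData> {
        if items > self.data.len() {
            return Err(OutOfData);
        }
        self.data = &self.data[items..];
        Ok(())
    }
}

// Written once, generic over any byte source: no mmap/non-mmap split.
fn count_lines<R: Read2<Data = u8>>(src: &mut R) -> Result<usize, R::Error> {
    let mut lines = 0;
    loop {
        let chunk = src.peek(0)?;
        if chunk.is_empty() {
            return Ok(lines); // an empty best-effort peek means end of input
        }
        let seen = chunk.len();
        lines += chunk.iter().filter(|&&b| b == b'\n').count();
        src.consume(seen)?;
    }
}
```

The caller of count_lines never needs to know whether the bytes came from a mapping or from a series of reads; that decision lives entirely in the Read2 implementor.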

The elegance of the System.IO.Pipelines model of "you get a reference to a buffer with at least 'x' items to peek", instead of "fill a buffer and now it's your problem", means that the entire search_buffer.rs file from ripgrep could likely be deleted (another 400 lines, alongside the BOM/UTF-16 code, which could be simplified using the peek API model). On top of that, I suspect that this chunk of rather complex code would also simplify massively, because you no longer have to worry about rolling over partially consumed buffers yourself:

github.com/BurntSushi/ripgrep/blob/7120f3225862f6c718a37a8616debaebd8c3d459/src/search_stream.rs#L625-L677

          fn fill<R: io::Read>(
              &mut self,
              rdr: &mut R,
              keep_from: usize,
          ) -> Result<bool, io::Error> {
              // Rollover bytes from buf[keep_from..end] and update our various
              // pointers. N.B. This could be done with the ptr::copy, but I haven't
              // been able to produce a benchmark that notices a difference in
              // performance. (Invariably, ptr::copy is seems clearer IMO, but it is
              // not safe.)
              self.tmp.clear();
              self.tmp.extend_from_slice(&self.buf[keep_from..self.end]);
              self.buf[0..self.tmp.len()].copy_from_slice(&self.tmp);
              self.pos = self.lastnl - keep_from;
              self.lastnl = 0;
              self.end = self.tmp.len();
              while self.lastnl == 0 {
                  // If our buffer isn't big enough to hold the contents of a full
                  // read, expand it.
                  if self.buf.len() - self.end < self.read_size {

This file has been truncated.
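
For comparison, here is a sketch of where the rollover logic could live under the peek/consume model: inside a single adapter that turns any std::io::Read into a Read2 source. Buffered is a hypothetical name, and the 4096-byte read size, the error kinds, and the rule that consume may only skip bytes that are already buffered are all simplifying assumptions of this sketch.

```rust
use std::io::{self, Read};

// Same shape as the Read2 trait proposed above.
trait Read2 {
    type Data;
    type Error;
    fn peek(&mut self, items: usize) -> Result<&[Self::Data], Self::Error>;
    fn consume(&mut self, items: usize) -> Result<(), Self::Error>;
}

// Hypothetical adapter exposing any io::Read as a Read2 source.
struct Buffered<R> {
    inner: R,
    buf: Vec<u8>,
    start: usize, // bytes at the front of `buf` that were already consumed
}

impl<R: Read> Buffered<R> {
    fn new(inner: R) -> Self {
        Buffered { inner, buf: Vec::new(), start: 0 }
    }
}

impl<R: Read> Read2 for Buffered<R> {
    type Data = u8;
    type Error = io::Error;

    fn peek(&mut self, items: usize) -> Result<&[u8], io::Error> {
        // The rollover happens here, once, instead of in every caller:
        // drop the consumed prefix and keep the unconsumed tail.
        drop(self.buf.drain(..self.start));
        self.start = 0;
        // A best-effort peek (items == 0) still tries to have one byte ready.
        let want = items.max(1);
        let mut chunk = [0u8; 4096];
        while self.buf.len() < want {
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                break; // end of input
            }
            self.buf.extend_from_slice(&chunk[..n]);
        }
        if self.buf.len() < items {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        Ok(&self.buf)
    }

    fn consume(&mut self, items: usize) -> Result<(), io::Error> {
        // Simplification: only bytes that are already buffered can be skipped.
        if items > self.buf.len() - self.start {
            return Err(io::ErrorKind::InvalidInput.into());
        }
        self.start += items;
        Ok(())
    }
}
```

A caller that consumes only part of what it peeked gets the leftover bytes back at the front of the next peek, so it never has to track keep_from-style offsets itself. This adapter does copy, which is the price of starting from Read; a memory-mapped source would implement Read2 directly and skip the buffer entirely.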

I'm more impressed now by @BurntSushi's work, but I shouldn't be. He's reinventing wheels and is forced to duplicate code that ought to be reusing the same abstract trait for all implementations...
