Processing I/O with BufRead

Still on my learning journey, I wanted to see how some bits of Rust’s I/O machinery work. From the documentation, I understand that one should use std::io::BufRead for buffered I/O, so I had a go. From what I understand, you’re supposed to call fill_buf() to ensure that the buffer has data, and you get a slice of an internal buffer from the call. After processing some bytes from this buffer, you’re supposed to call consume() to inform the BufRead that you’re done with those bytes, so you don’t see them again. Both of those are mutating operations.

So I came up with this:

use std::io::{self,BufRead};

pub enum ProtocolError {
    InvalidBytes(Vec<u8>),
    Io(io::Error),
}

pub struct Processor<B: BufRead> {
    reader: B
}

pub enum ProcessResult {
    Done,
    Error(ProtocolError),
    ProtocolValue(u32)
}

impl<B: BufRead> Processor<B> {
    pub fn new(r : B) -> Self {
        Self { reader: r }
    }

    pub fn process(&mut self) -> ProcessResult {
        let mut pos: usize = 0;
        let mut process_result: u32 = 0;

        loop {
            let reader = &mut self.reader;
            let r = reader.fill_buf();
            let length: usize;
            let buf : &[u8];

            match r {
                Ok(buffer) => {
                    buf = buffer;
                    length = buffer.len();
                    if length == 0 {
                        return ProcessResult::Done;
                    }
                },
                Err(e) => {
                    return ProcessResult::Error(ProtocolError::Io(e));
                }
            }
            let mut failed = false;
            while pos < length {
                let byte = buf[pos];

                process_result += (byte as u32) * 2 + 3;
                if process_result < 128 || process_result > 256 {
                    failed = true;
                    break;
                }
                pos += 1;
            }
            reader.consume(pos);
            if failed {
                let v = buf[0..pos].to_vec();
                ProcessResult::Error(ProtocolError::InvalidBytes(v));
            } else {
                ProcessResult::ProtocolValue(process_result);
            }
        }
    }
}


Of course, there are errors:

   Compiling playground v0.0.1 (file:///playground)
error[E0499]: cannot borrow `*reader` as mutable more than once at a time
  --> src/lib.rs:56:13
   |
29 |             let r = reader.fill_buf();
   |                     ------ first mutable borrow occurs here
...
56 |             reader.consume(pos);
   |             ^^^^^^ second mutable borrow occurs here
...
63 |         }
   |         - first borrow ends here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0499`.
error: Could not compile `playground`.

To learn more, run the command again with --verbose.

I want to do two mutating operations in a single block, and the two operations fill_buf() and consume() are supposed to be used together. How, exactly? Is it just something dumb I’ve done with one of the declarations?

        loop {
            let reader = &mut self.reader;
            let r = reader.fill_buf();
            ...
            reader.consume(pos);
        }

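You can’t call consume() while you still hold the slice returned by fill_buf(): the slice borrows the reader mutably for as long as it is alive, and your code reads buf again after consume() (buf[0..pos].to_vec()). There is no way around that. You need to change your logic so that you are finished with the slice, copying out anything you still need, before calling consume().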
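As a rough sketch of that ordering (not the original poster's protocol), the slice can be confined to an inner block so that only plain values escape it, and consume() is called afterwards. The helper name read_leading_digits and the "parse leading ASCII digits" task are made up purely for illustration.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: consumes leading ASCII digits from `reader`
/// and returns them as a number. Anything after the digits stays unread.
pub fn read_leading_digits<R: BufRead>(reader: &mut R) -> io::Result<u64> {
    let mut value: u64 = 0;
    loop {
        // Inner block: `buf` borrows `reader` only until the closing brace.
        let (used, stop) = {
            let buf = reader.fill_buf()?;
            if buf.is_empty() {
                return Ok(value); // end of input
            }
            let mut used = 0;
            for &b in buf {
                if !b.is_ascii_digit() {
                    break;
                }
                value = value.wrapping_mul(10).wrapping_add((b - b'0') as u64);
                used += 1;
            }
            // Only plain values leave the block, never the slice itself.
            (used, used < buf.len())
        };
        // The slice is gone, so this second mutable borrow is accepted.
        reader.consume(used);
        if stop {
            return Ok(value);
        }
    }
}
```

Because only `used` bytes are consumed, whatever follows the digits is still available to the next reader call.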

Note that you are not required to use BufRead directly. BufReader also implements std::io::Read, so you can call the ordinary Read methods on it. This way, you’ll still get the benefits of buffered reading while using a more convenient API.
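For instance, a byte-at-a-time loop over a BufReader could look something like the sketch below. The function sum_bytes is a made-up example, not part of the original code; it relies only on Read::bytes(), which yields io::Result<u8> items.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical example: sums every byte of `source`, reading one byte
/// at a time through the `Read` API while `BufReader` batches the
/// underlying reads behind the scenes.
pub fn sum_bytes<R: Read>(source: R) -> io::Result<u32> {
    let reader = BufReader::new(source);
    let mut total: u32 = 0;
    for byte in reader.bytes() {
        // Each item is an io::Result<u8>; `?` propagates I/O errors.
        total = total.wrapping_add(byte? as u32);
    }
    Ok(total)
}
```

No fill_buf() or consume() calls appear here, so the borrow problem never comes up; the trade-off is that the caller no longer sees the buffer itself.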

OK, that helped. Thanks.

OK, but then I have to make multiple calls to read() if I want a byte at a time, or else I have to have my own buffer to copy into - is that right?

Yes. For code like this where you need direct access to the buffer (so you can return it in the InvalidBytes case), it’s a little simpler to create your own buffer and read into it. Example.

BufRead and BufReader are mainly helpful if you don’t access the buffer directly but instead want buffering to happen transparently, behind the scenes.
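A minimal sketch of the read-into-your-own-buffer idea follows. It is an illustration under assumptions, not a reproduction of any linked code: the function name process_chunk, the 4 KB buffer size, and the rule that any byte >= 0x80 is invalid are all invented here.

```rust
use std::io::{self, Read};

#[derive(Debug)]
pub enum ProtocolError {
    InvalidBytes(Vec<u8>),
    Io(io::Error),
}

/// Hypothetical: reads one chunk (up to 4 KB) into a local buffer and
/// sums its bytes. Returns Ok(None) at end of input.
pub fn process_chunk<R: Read>(reader: &mut R) -> Result<Option<u32>, ProtocolError> {
    let mut buf = [0u8; 4096];
    let n = reader.read(&mut buf).map_err(ProtocolError::Io)?;
    if n == 0 {
        return Ok(None);
    }
    let mut sum: u32 = 0;
    for (i, &b) in buf[..n].iter().enumerate() {
        if b >= 0x80 {
            // We own `buf`, so returning the bytes seen so far is easy.
            // Caveat: anything read past index `i` is lost on return.
            return Err(ProtocolError::InvalidBytes(buf[..=i].to_vec()));
        }
        sum += b as u32;
    }
    Ok(Some(sum))
}
```

Since the buffer is a local array, there is no borrow of the reader to worry about once read() has returned.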


Thanks for the example. It’s fine if we’re doing all the processing in one hit. But if an outer loop calls process() to handle a chunk of data at a time, then potentially all the data (at least, up to 4KB in the example) is swallowed into the local buffer on the first call, and whatever is left once process() hits an error or produces a valid result gets thrown away. That intent wasn’t clear from the snippet I posted, but that’s why I wanted BufRead: somewhere to manage the buffer between calls to process().

Still, the pattern with Read is useful to know.
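For the "keep the buffer between calls" use case, one possible shape is sketched below: process() stays on BufRead, copies out any bytes it needs to report before the slice goes away, and consumes only what it actually handled, so the remainder waits in the reader for the next call. The record rule here (newline-terminated records, value = sum of bytes, bytes >= 0x80 invalid) is an assumption standing in for the real protocol.

```rust
use std::io::{self, BufRead};

#[derive(Debug)]
pub enum ProtocolError {
    InvalidBytes(Vec<u8>),
    Io(io::Error),
}

#[derive(Debug)]
pub enum ProcessResult {
    Done,
    Error(ProtocolError),
    ProtocolValue(u32),
}

pub struct Processor<B: BufRead> {
    reader: B,
}

impl<B: BufRead> Processor<B> {
    pub fn new(reader: B) -> Self {
        Self { reader }
    }

    /// Handles one record per call (hypothetical rule, see lead-in).
    pub fn process(&mut self) -> ProcessResult {
        let mut sum: u32 = 0;
        let mut seen_any = false;
        loop {
            let (used, outcome) = {
                let buf = match self.reader.fill_buf() {
                    Ok(buf) => buf,
                    Err(e) => return ProcessResult::Error(ProtocolError::Io(e)),
                };
                if buf.is_empty() {
                    // End of input: report a trailing unterminated record, if any.
                    return if seen_any {
                        ProcessResult::ProtocolValue(sum)
                    } else {
                        ProcessResult::Done
                    };
                }
                let mut used = 0;
                let mut outcome = None;
                for &b in buf {
                    used += 1;
                    if b == b'\n' {
                        outcome = Some(ProcessResult::ProtocolValue(sum));
                        break;
                    }
                    if b >= 0x80 {
                        // Copy the bytes out now, while the slice is still valid.
                        // Only bytes from the current buffer fill are reported.
                        let bad = buf[..used].to_vec();
                        outcome = Some(ProcessResult::Error(ProtocolError::InvalidBytes(bad)));
                        break;
                    }
                    sum = sum.wrapping_add(b as u32);
                }
                (used, outcome)
            };
            // The slice is out of scope; consume only what was handled.
            self.reader.consume(used);
            seen_any = true;
            if let Some(result) = outcome {
                return result;
            }
        }
    }
}
```

An outer loop can then call process() repeatedly until it returns ProcessResult::Done; nothing beyond the bytes each call handled is removed from the reader.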