Mutable borrow in a loop without retaining any references

Hey everyone!

I'm trying to implement line reader that doesn't do any allocations using a fixed buffer size.

I don't really understand why borrow checker complains since line is not retained after the iteration, but was just copied.

It doesn't even work for simple usage like this for some reason:

    loop {
        if matches!(reader.next_line(), None) {
            break;
        }
    }

I've read all the posts I could find here and stackoverflow on mutable borrow in a loop,
but still can't figure it out. Help is appreciated.

use memchr::memchr;
use std::io::Read;
use std::marker::PhantomData;

/// Reads lines from the stream using pre-allocated buffer.
/// Lines shorter than buffer will be returned as-is, lines longer
/// than buffer will be truncated.
struct LineReader<'a, R: Read> {
    input: R,
    buffer: &'a mut [u8],
    read_start: usize,
    read_end: usize,
    phantom: PhantomData<&'a str>,
}

impl<'a, R: Read> LineReader<'a, R> {
    fn new(input: R, buffer: &'a mut [u8]) -> Self {
        LineReader {
            input,
            buffer,
            read_start: 0,
            read_end: 0,
            phantom: PhantomData,
        }
    }

    fn next_line(&'a mut self) -> Option<&'a [u8]> {
        if self.read_start == self.read_end {
            let Ok(n) = self.input.read(&mut self.buffer) else {
                return None;
            };

            self.read_start = 0;
            self.read_end = n;
        }

        if let Some(index) = memchr(b'\n', &self.buffer[self.read_start..self.read_end]) {
            let slice: &'a [u8] = &self.buffer[self.read_start..index];
            self.read_start = index;
            dbg!(&self.buffer[self.read_start..index]);
            return Some(slice);
        } else {
            todo!(
                "read until \\n from {} {} {:?}",
                self.read_start,
                self.read_end,
                self.buffer
            );
        }
    }
}

#[cfg(test)]
mod tests {

    use crate::LineReader;
    use std::io::Cursor;

    #[test]
    fn test_line_reader() {
        let mut buffer = [0u8; 10];
        let mut reader =
            LineReader::new(Cursor::new("01234\n56789\n01234\n5678\n9\n"), &mut buffer);
        let mut lines = vec![];

        loop {
            if let Some(line) = reader.next_line() {
                lines.push(String::from_utf8_lossy(line).into_owned())
            } else {
                break;
            }
        }

        assert_eq!(&["abc".to_owned()], lines.as_slice());
    }
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
warning: struct `LineReader` is never constructed
 --> src/lib.rs:8:8
  |
8 | struct LineReader<'a, R: Read> {
  |        ^^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: associated items `new` and `next_line` are never used
  --> src/lib.rs:17:8
   |
16 | impl<'a, R: Read> LineReader<'a, R> {
   | ----------------------------------- associated items in this implementation
17 |     fn new(input: R, buffer: &'a mut [u8]) -> Self {
   |        ^^^
...
27 |     fn next_line(&'a mut self) -> Option<&'a [u8]> {
   |        ^^^^^^^^^

warning: `playground` (lib) generated 2 warnings
error[E0499]: cannot borrow `reader` as mutable more than once at a time
  --> src/lib.rs:67:29
   |
67 |         if let Some(line) = reader.next_line() {
   |                             ^^^^^^
   |                             |
   |                             `reader` was mutably borrowed here in the previous iteration of the loop
   |                             first borrow used here, in later iteration of loop

For more information about this error, try `rustc --explain E0499`.
error: could not compile `playground` (lib test) due to 1 previous error

The problem starts here

next_line(&'a mut self) -> Option<&'a [u8]> {

where on one hand, the returned slice has a lifetime tied to a mutable borrow as input, so you can never keep around the result while making any subsequent calls to next_line,
and on the other hand – arguably even worse – the self type here is &'a mut LineReader<'a, R>, featuring a mutable borrow of the same lifetime as the structs lifetime argument; a significant anti-pattern, since functions with such a signature are almost unusable.

As possible solutions, we could probably consider ways to change this signature, along with the implementation - ideally you'd probably want

next_line(&mut self) -> Option<&'a [u8]> {

with a short self-borrow, but a returned slice tied to the lifetime parameter. But this would require some change to the code to make sure the buffer: &'a mut [u8] field of your struct is itself updated to remove the portions handed out in next_line calls, so the compiler can be convinced there's no possibility for aliased mutable access.

From a quick glance at the code it seems the buffer is refilled when it's completly read, so OP will likely have to drop the 'a lifetime from the return type.

3 Likes

Yes, I also just noticed that the buffer is actually re-used by the logic in the code. It seems OP would need to change that, or copy the data somewhere more persistent (like an owned Vec<u8 or so), if the goal is to collect multiple return-values of multiple next_line calls next or each other in a Vec. The borrow checker prevented an actual logic error here.

For the latter, the copying to Vec could be left to the user, while the signature can then just become generic over the self-borrow lifetime, as you mentioned, by removing the explicit 'a in argument and return types (which under elision is equivalent to introducing a new lifetime parameter 'b and using it for &'b mut self and the Option<&'b [u8]> return type).

Edit: Testing goes over theorizing, and @quinedot did the testing (of what needs to change minimally to make everything compile) - I had missed the already extant String::from_utf8_lossy conversion..

This compiles.

The main changes are

-    fn next_line(&'a mut self) -> Option<&'a [u8]> {
+    fn next_line(&mut self) -> Option<&[u8]> {
         if let Some(index) = memchr(b'\n', &self.buffer[self.read_start..self.read_end]) {
-            let slice: &'a [u8] = &self.buffer[self.read_start..index];
-            self.read_start = index;
+            let slice: &[u8] = &self.buffer[self.read_start..self.read_start + index + 1];
+            self.read_start += index + 1;
             return Some(slice);

I.e. removing 'a from the implementation body, and correcting some indexing logic (untested in depth).

2 Likes

Yes, thanks, removing 'a from next_line helps. I kept using it since my buffer has one, and that was a mistake. I initially started with LineReader fully owning the buffer but there were lots of complications there either.

Please disregard off-by-1 problems and unfinished implementation. I was banging my head all day against the borrow checker and didn't have time to get it right.

Thanks one more time.

&'a mut self is a terrible footgun. Unfortunately common enough that I can recycle my answer:

@ekuber Can rustc tell people not to write &'a mut self with a lifetime from a trait?

3 Likes

When 'a is part of Self or similar, at least. :+1:

3 Likes

That should be possible. If we can have a couple of repro cases and the desired output, we can take it from there.

1 Like

Here's some examples and some potential output. You or others are free to use or build on it.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.