"conflicting requirements" what?


#1

So I discovered BufRead.read_line(), but I need an iterator of &str, because I want to avoid the overhead of allocating a new string on every iteration of BufRead.lines().

Thought I could make a “simple” iterator wrapper for BufRead:

struct BufLines<'a> {
    src: &'a mut BufRead,
    buf: String,
    err: Option<io::Error>,
}

impl<'a> Iterator for BufLines<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        self.buf.clear();
        let r = self.src.read_line(&mut self.buf);
        match r { 
            Ok(bytes) => {
                match bytes {
                    0 => None,
                    _ => Some(&self.buf),
                }   
            }   
            Err(e) => {
                self.err = Some(e);
                None
            }
        }
    }   
}

However, the compiler disagrees:

error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
   --> manifest/mod.rs:145:14
    |
145 |         Some(&self.buf)
    |              ^^^^^^^^^
    |
help: consider using an explicit lifetime parameter as shown: fn get(&'a self) -> Option<&'a str>
   --> manifest/mod.rs:144:5
    |
144 |     fn get(&self) -> Option<&'a str> {
    |     ^

I don’t know what the conflicting requirements are here since the compiler won’t tell me, and the advice it provides only leads to more confusion.

Is there another way to create this iterator that I’ve missed? Is there already one in the standard library?

Ideas for a workaround?

Thanks.


#2

There is in fact an iterator in the standard library already: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines

More information about why this doesn’t work below.


You declare (through the 'a lifetime) that the string returned by the iterator lives as long as the borrowed BufRead. But the string lives in self.buf, so a reference to it is only valid for as long as the &mut self borrow taken by that call to next().

The iterator interface allows for things like this:

// first element
let x = iter.next();
// second element
let y = iter.next();

If you were to do that with this iterator, though, it would be a memory error. The second call would clear the buffer and overwrite the data in it while the reference to that buffer from the first call still existed.

So you need to copy the data out of the buffer into a new String, that way you can return it. This is why the lines iterator I linked to above returns a String instead of a &str.


#3

I think what’s needed here is to associate the lifetime of the returned &str with the &mut self borrow taken by next(). So you couldn’t call next() the 2nd time until “x” is gone, because you couldn’t borrow self mutably again while it’s still borrowed. Then you could return slices instead of owned Strings.


#4

You can’t. That requires “streaming iterators”, which can’t be defined safely and completely in Rust without additional language features that (last I knew) aren’t coming any time soon.


#5

Yes, sorry - I should have said what’s “desired”, rather than needed. I think that’s the concept that @binarycrusader wanted to express.


#6

It’s possible to do this if you don’t use the Iterator trait.

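use std::fmt::Write;

struct Iterfaker {
    buffer: String,
    counter: i32,
}

impl Iterfaker {
    // The returned &str borrows from self, so the caller has to let go of it
    // before next() can be called again.
    fn next(&mut self) -> Option<&str> {
        // Reuse the existing allocation instead of building a new String.
        self.buffer.clear();
        write!(self.buffer, "{}", self.counter).unwrap();
        self.counter += 1;
        if self.counter < 10 {
            Some(&self.buffer)
        } else {
            None
        }
    }
}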

fn main() {
    let mut it = Iterfaker { buffer: String::new(), counter: 0 };
    while let Some(s) = it.next() {
        println!("{:?}", s);
    }
}

You’re forced to use an iteration method that doesn’t allow multiple items to exist at once, like that while let loop, so you don’t have access to most iterator adapters. It still works fine for imperative code.
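Applied to the BufRead case from the first post, the same idea might look roughly like the sketch below. The names BufLines::new, next_line and line_lengths are made up for illustration, and stripping the line ending is my own choice rather than something read_line does for you.

```rust
use std::io::{self, BufRead};

/// Hypothetical wrapper that reuses one String for every line.
struct BufLines<R> {
    src: R,
    buf: String,
}

impl<R: BufRead> BufLines<R> {
    fn new(src: R) -> Self {
        BufLines { src, buf: String::new() }
    }

    /// The returned slice borrows from `self`, so the borrow checker rejects
    /// a second call while the previous line is still in use.
    fn next_line(&mut self) -> Option<io::Result<&str>> {
        self.buf.clear();
        match self.src.read_line(&mut self.buf) {
            Ok(0) => None, // end of input
            Ok(_) => Some(Ok(self
                .buf
                .trim_end_matches(|c: char| c == '\n' || c == '\r'))),
            Err(e) => Some(Err(e)),
        }
    }
}

/// Example consumer: collects the length of every line.
fn line_lengths(input: &[u8]) -> io::Result<Vec<usize>> {
    let mut lines = BufLines::new(input);
    let mut lens = Vec::new();
    while let Some(line) = lines.next_line() {
        lens.push(line?.len());
    }
    Ok(lens)
}

fn main() {
    let lens = line_lengths(b"alpha\nbe\n\ngamma").unwrap();
    println!("{:?}", lens);
}
```

Errors are handed back to the caller as io::Result instead of being stashed in an err field, which keeps the loop body in charge of what happens on failure.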


#7

I’m aware of that iterator and had already mentioned it, but it has a serious drawback:

the overhead of allocating a new string on every iteration of BufRead.lines().

Except that’s exactly what I’m trying to avoid, and I don’t understand why I can’t do it since the compiler won’t explain what the conflicting requirements are.


#8

Yes, in so many words, that’s exactly what I was trying to figure out how to accomplish.


#9

Unfortunately, that’s exactly what I was trying to do. I have a function that does some accumulation of string slices that are presumably returned by an iterator like lines().

The str.lines() iterator does exactly what I need – returns slices. But the BufRead.lines() iterator returns Strings, forcing an allocation on every iteration.


#10

So I have a dumb question: why is it that str.lines() is able to return string slices while BufRead.lines() is not?


#11

With str.lines(), the entire string is in memory at once, so you can give out pieces of it. With BufRead.lines(), the file you’re reading is being pulled into and out of memory while you read it, so any piece you give out will be invalidated when you read more.

Just look at your original implementation. You’re returning Some(&self.buf) when you call next. The next time you call next, self.buf is overwritten by self.src.read_line(&mut self.buf), making what you returned earlier invalid. The Iterator trait requires that the thing you return be valid as long as the thing you’re iterating over is, so future calls to next may not affect the things returned by previous calls.
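To make the difference concrete, here is a small sketch. The helper names slices_of and owned_lines are just for this example, and a Cursor stands in for a real file.

```rust
use std::io::{BufRead, Cursor};

/// str::lines(): every item is a slice of `text`, which stays in memory,
/// so all of the items can be alive at the same time.
fn slices_of(text: &str) -> Vec<&str> {
    text.lines().collect()
}

/// BufRead::lines(): the reader's internal buffer is refilled as reading
/// progresses, so each item has to be an owned String.
fn owned_lines(text: &str) -> Vec<String> {
    Cursor::new(text.as_bytes())
        .lines()
        .map(|line| line.unwrap())
        .collect()
}

fn main() {
    let text = String::from("one\ntwo\nthree");
    println!("{:?}", slices_of(&text));
    println!("{:?}", owned_lines(&text));
}
```

Both collect() calls only work because every item stays valid after later calls to next, which is exactly what the Iterator trait promises.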


#12

I get that, but why isn’t there a way to express “the lifetime of the thing I return is only valid until the next iteration or destruction of the Iterator, whichever comes first”?

Seems like I might be able to fool the compiler here by using an Rc or the like.


#13

The lifetime is part of the function signature. The signature of Iterator::next is fn next(&mut self) -> Option<Self::Item>, which draws no connection between the lifetime of self and the lifetime of the return value. Therefore, no implementation of Iterator can rely on Self::Item being required to be dropped before the next call to next.

The only way to “fool the compiler” is to make sure that the buffer isn’t invalidated during iteration. There are a few ways to do that.

  1. You can use an Rc<String> as your buffer, and return an Rc<String> from next. Then, when you call next again, you can check whether or not the consumer is still keeping their reference around with the get_mut method. You can re-use the String if it lets you, or create a new one if it doesn’t.

  2. You can return a String every time, like BufRead::lines() does.

  3. You can read the entire file into a single String, and give out slices of it.

  4. You can use a function that has the desired signature, instead of the one from the Iterator trait.
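A rough sketch of option 1, in case it helps. RcLines and rc_lines are names I made up; the only std pieces used are Rc::get_mut, Rc::clone and BufRead::read_line.

```rust
use std::io::{self, BufRead};
use std::rc::Rc;

/// Hypothetical iterator that hands out Rc<String> and reuses the buffer
/// whenever the consumer has already dropped the previous item.
struct RcLines<R> {
    src: R,
    buf: Rc<String>,
}

fn rc_lines<R: BufRead>(src: R) -> RcLines<R> {
    RcLines { src, buf: Rc::new(String::new()) }
}

impl<R: BufRead> Iterator for RcLines<R> {
    type Item = io::Result<Rc<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        // get_mut returns None while someone else still holds the buffer;
        // in that case fall back to a fresh allocation.
        if Rc::get_mut(&mut self.buf).is_none() {
            self.buf = Rc::new(String::new());
        }
        // The Rc is now uniquely owned, so this cannot fail.
        let buf = Rc::get_mut(&mut self.buf).expect("buffer is uniquely owned");
        buf.clear();
        match self.src.read_line(buf) {
            Ok(0) => None, // end of input
            Ok(_) => Some(Ok(Rc::clone(&self.buf))),
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() {
    // Each item is dropped at the end of its loop iteration,
    // so the same String is reused for every line.
    for line in rc_lines(&b"a\nbb\nccc\n"[..]) {
        print!("{}", line.unwrap());
    }
}
```

This is a real Iterator, so adapters work, and a consumer that holds on to items simply pays for a new allocation instead of getting a dangling reference.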


#14

If you’re reading a file which fits into memory, you can use read_to_string which will allocate one string and then use that string’s lines iterator.
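Something along these lines, as a sketch. The slurp helper is a made-up name, and a byte slice stands in for the file.

```rust
use std::io::{self, Read};

/// Hypothetical helper: pulls the whole input into a single String.
fn slurp<R: Read>(mut src: R) -> io::Result<String> {
    let mut contents = String::new();
    src.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    let contents = slurp(&b"first\nsecond\nthird\n"[..])?;
    // The slices borrow from `contents`, so no per-line allocation happens.
    let lines: Vec<&str> = contents.lines().collect();
    println!("{:?}", lines);
    Ok(())
}
```

The trade-off is memory: the whole input has to fit in RAM for as long as any of the slices are in use.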


#15

That seems like an unfortunate limitation of the current lifetime system; hopefully this will be improved at some point.

Ah, yes, that’s exactly the sort of mechanism I was thinking of, but I didn’t know how to do it; I’ve intentionally avoided using Rc, so didn’t think of it.

There may be another alternative for my case though, so I will ponder this some more.

No, the whole point was to avoid unnecessary allocation.

Also undesirable, since the files I’m processing may be hundreds of megabytes.

I get that, but again, an iterator is what was desired here. This is an existing, quite large project that was originally written in Python, so it uses Python’s generators extensively. That’s requiring me to significantly re-think things as I go along, since Rust’s iterator support is not as expressive yet.

I do appreciate you taking the time to offer alternatives though.


#16

The size is effectively unbounded, so that’s unfortunately not a realistic choice. But thanks for trying to help.


#17

Seems like I’m not the only one to encounter this:


#18

Certainly not – the desire for streaming iterators comes up frequently. Associated Type Constructors in RFC 1598 might be the way forward.
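For anyone reading this later: that RFC eventually landed as generic associated types, stable since Rust 1.65. With those, a lending iterator can be sketched as below. The LendingIterator trait here is defined locally and is not part of std, and LendLines and total_len are illustrative names.

```rust
use std::io::{self, BufRead};

/// Hypothetical trait: the item type may borrow from the iterator itself.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

struct LendLines<R> {
    src: R,
    buf: String,
}

impl<R: BufRead> LendingIterator for LendLines<R> {
    type Item<'a> = io::Result<&'a str> where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        self.buf.clear();
        match self.src.read_line(&mut self.buf) {
            Ok(0) => None, // end of input
            Ok(_) => Some(Ok(self.buf.as_str())),
            Err(e) => Some(Err(e)),
        }
    }
}

/// Example consumer: sums the byte length of every line, newline included.
fn total_len<R: BufRead>(src: R) -> io::Result<usize> {
    let mut lines = LendLines { src, buf: String::new() };
    let mut total = 0;
    while let Some(line) = lines.next() {
        total += line?.len();
    }
    Ok(total)
}

fn main() {
    println!("{}", total_len(&b"ab\ncde\n"[..]).unwrap());
}
```

Since this is not std's Iterator, for loops and the usual adapters still don't apply; the gain is that the signature now ties each item to the borrow of the iterator.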


#19

You can express a streaming iterator (in a somewhat limited form) right now: https://crates.io/crates/streaming-iterator.


#20

In Rust closures are a way to lend memory temporarily. You could model your iterator to be like the map() iterator, and pass the borrowed string fragment to a closure.

e.g.

for item in lines.line_map(|line| line.parse()) {}
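A possible shape for that, as a sketch only. LineMap and the free function line_map are hypothetical, and trimming trailing whitespace before calling the closure is an assumption on my part.

```rust
use std::io::{self, BufRead};

/// Hypothetical adapter: lends each line to a closure and yields whatever
/// the closure returns, so the borrowed &str never escapes.
struct LineMap<R, F> {
    src: R,
    buf: String,
    f: F,
}

fn line_map<R, F, T>(src: R, f: F) -> LineMap<R, F>
where
    R: BufRead,
    F: FnMut(&str) -> T,
{
    LineMap { src, buf: String::new(), f }
}

impl<R, F, T> Iterator for LineMap<R, F>
where
    R: BufRead,
    F: FnMut(&str) -> T,
{
    type Item = io::Result<T>;

    fn next(&mut self) -> Option<Self::Item> {
        self.buf.clear();
        match self.src.read_line(&mut self.buf) {
            Ok(0) => None, // end of input
            // trim_end drops the line ending (and any other trailing whitespace).
            Ok(_) => Some(Ok((self.f)(self.buf.trim_end()))),
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() {
    let input = &b"1\n2\n40\n"[..];
    for item in line_map(input, |line| line.parse::<i32>()) {
        println!("{:?}", item.unwrap());
    }
}
```

Because the closure's result is an owned value, this is an ordinary Iterator and works with for loops and adapters while still reusing a single line buffer.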