I'm trying to build a struct that gradually accumulates lines read from a file and can also hand out a reference to each line through an iterator. The idea is that the caller requests the next line only when it needs it, and only then is that line read. I also want to store all the lines in a single String for later use. I keep getting stuck on the lifetimes, though. Although my struct holds on to the string for 'a and the iterator returns a reference to the string with lifetime 'a, it seems like the reference only borrows from the iterator itself rather than from the string. How can I fix this? This is what I have so far:
use std::io::{self, BufRead, Lines};

pub struct SourceHandler<R>
where
    R: BufRead,
{
    lines: Lines<R>,
    source: String,
}

impl<R> SourceHandler<R>
where
    R: BufRead,
{
    pub fn new(lines: Lines<R>) -> Self {
        Self {
            lines,
            source: String::new(),
        }
    }
}

pub struct SourceHandlerIter<'a, R>
where
    R: BufRead,
{
    source_handler: &'a mut SourceHandler<R>,
}

impl<'a, R> Iterator for SourceHandlerIter<'a, R>
where
    R: BufRead,
{
    type Item = Result<&'a str, io::Error>;

    fn next(&mut self) -> Option<Self::Item> {
        // Remember where the new line will start inside the accumulated buffer.
        let line_start = self.source_handler.source.len();
        let next_line = self.source_handler.lines.next();
        match next_line {
            Some(line) => match line {
                Ok(line) => {
                    self.source_handler.source.push_str(&line);
                    // Does not compile: this reborrow goes through `&mut self`,
                    // so it cannot be promised for the whole of 'a.
                    Some(Ok(&self.source_handler.source[line_start..]))
                }
                Err(err) => Some(Err(err)),
            },
            None => None,
        }
    }
}
An iterator instance should outlive the references it returns for its items during iteration (i.e. every time the next method is called), but you are giving those two concepts the same lifetime 'a:
impl<'a, R> Iterator for SourceHandlerIter<'a, R>
where
    R: BufRead,
{
    type Item = Result<&'a str, io::Error>;
Storing references within structs is a common anti-pattern in Rust. If you don't feel comfortable working with lifetimes, I recommend that you avoid storing reference types and store owned types instead (e.g. String rather than &str).
That's what I don't understand. I'm storing a String, not a reference. I'm only returning a reference to the owned String. Or that's what I'm trying to do.
The pattern is called a lending iterator, and the solution in stable Rust is via GATs (generic associated types), here using the lending_iterator crate:
use ::lending_iterator::prelude::*;
use std::io::{BufRead, Result};

pub struct SourceHandler<R>
where
    R: BufRead,
{
    reader: R,
    source: String,
}

impl<R> SourceHandler<R>
where
    R: BufRead,
{
    pub fn new(reader: R) -> Self {
        Self {
            reader,
            source: String::new(),
        }
    }
}

pub struct SourceHandlerIter<'a, R>
where
    R: BufRead,
{
    source_handler: &'a mut SourceHandler<R>,
}

#[gat]
impl<'a, R> LendingIterator for SourceHandlerIter<'a, R>
where
    R: BufRead,
{
    type Item<'next> = Result<&'next str> where Self: 'next;

    fn next(&mut self) -> Option<Result<&str>> {
        let source = &mut self.source_handler.source;
        // Remember where the new line will start inside the accumulated buffer.
        let line_start = source.len();
        match self.source_handler.reader.read_line(source) {
            // Zero bytes read means end of input.
            Ok(0) => None,
            Err(err) => Some(Err(err)),
            // Lend only the part of the buffer that was just appended.
            _ => Some(Ok(source.split_at(line_start).1)),
        }
    }
}
fn main() {
    let s = b"aaa
 bbb
 ccc";
    let mut source = SourceHandler::new(s.as_slice());
    let iter = SourceHandlerIter {
        source_handler: &mut source,
    };
    iter.for_each(|s| println!("{s:?}"));
    dbg!(source.source);
}
// output:
// Ok("aaa\n")
// Ok(" bbb\n")
// Ok(" ccc")
// [src/main.rs:58] source.source = "aaa\n bbb\n ccc"
Update: Note that with this approach you're using a separate trait instead of the Iterator trait in std, because the lending pattern is incompatible with it. For more on the lending iterator pattern, I'd recommend the blog post written last month by Niko: Giving, lending, and async closures.
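For comparison, here is a minimal sketch of the same lending pattern without an external crate, assuming Rust 1.65 or later, where GATs are stable. The LendingIterator trait below is hand-written for illustration and is not the crate's trait. The source() accessor is likewise a hypothetical addition, and the trait is implemented directly on SourceHandler only to keep the sketch short.

```rust
use std::io::{self, BufRead};

// Hand-written trait for illustration; NOT the trait from the lending_iterator crate.
pub trait LendingIterator {
    type Item<'next>
    where
        Self: 'next;

    // The returned item may borrow from `self`, so the previous item must be
    // dropped before `next` can be called again.
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

pub struct SourceHandler<R: BufRead> {
    reader: R,
    source: String,
}

impl<R: BufRead> SourceHandler<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, source: String::new() }
    }

    // Hypothetical accessor: everything read so far.
    pub fn source(&self) -> &str {
        &self.source
    }
}

impl<R: BufRead> LendingIterator for SourceHandler<R> {
    type Item<'next> = io::Result<&'next str> where Self: 'next;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let line_start = self.source.len();
        match self.reader.read_line(&mut self.source) {
            Ok(0) => None, // end of input
            Ok(_) => Some(Ok(&self.source[line_start..])),
            Err(err) => Some(Err(err)),
        }
    }
}
```

Because this is not std's Iterator, a for loop does not work on it; a `while let Some(line) = handler.next()` loop does, as long as each line is dropped or copied before the next call.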
Since you are unconditionally allocating a new String on each iteration anyway (Lines yields an owned String per line), there's nothing to be gained from returning a reference as the iterator item instead of just returning the freshly read String. Just return line by value after having appended it to the buffer.
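A minimal sketch of that suggestion, assuming the Lines-based SourceHandler from the question. Implementing Iterator directly on the handler and adding a source() accessor are illustrative choices, not part of the original code.

```rust
use std::io::{self, BufRead, Lines};

pub struct SourceHandler<R: BufRead> {
    lines: Lines<R>,
    source: String,
}

impl<R: BufRead> SourceHandler<R> {
    pub fn new(lines: Lines<R>) -> Self {
        Self { lines, source: String::new() }
    }

    // Hypothetical accessor: everything read so far.
    // `Lines` strips line terminators, so none are stored here.
    pub fn source(&self) -> &str {
        &self.source
    }
}

impl<R: BufRead> Iterator for SourceHandler<R> {
    // Owned items: no lifetime ties the item to the iterator.
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        // `?` ends the iteration once the underlying reader is exhausted.
        let line = match self.lines.next()? {
            Ok(line) => line,
            Err(err) => return Some(Err(err)),
        };
        // Copy the line into the buffer, then hand the original String to the caller.
        self.source.push_str(&line);
        Some(Ok(line))
    }
}
```

Calling by_ref() on the handler lets a caller iterate and still inspect source() afterwards. The cost is one extra copy of each line into the buffer.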
From the compiler's point of view, it's as if self gave you access to a &'self mut &'a mut String (where 'self is shorter than 'a), which you can only reborrow to get a &'self String, not a &'a String.
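The reborrowing rule can be seen in isolation with a small sketch. The function name shorter_reborrow is hypothetical, and 's stands in for the 'self lifetime above; the point is only which lifetime the signature is allowed to promise.

```rust
// Accepted: the result borrows through the outer reference, so it lives for 's.
fn shorter_reborrow<'s, 'a>(outer: &'s mut &'a mut String) -> &'s str {
    outer.push_str("!");
    outer.as_str()
}

// Rejected if uncommented ("lifetime may not live long enough"):
// a shared reborrow through `&'s mut` cannot be stretched to the inner 'a.
//
// fn longer_reborrow<'s, 'a>(outer: &'s mut &'a mut String) -> &'a str {
//     outer.as_str()
// }
```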
From a more practical point of view:
The push_str may reallocate the backing buffer of source, invalidating every outstanding reference to it. However, nothing in the contract of Iterator prevents the caller from calling next again while holding a reference to a previous line. Thus, if this were allowed, you would end up with a use-after-free bug.
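To make the contract point concrete, here is a small sketch with a hypothetical helper, first_two, that works with any std Iterator over string slices. It keeps the first item alive across a second call to next, which the Iterator trait always permits.

```rust
// Hypothetical helper: holds the first item while asking for the second.
// Every `Iterator` must tolerate this, which is why items cannot point into
// a buffer that a later `next` call may reallocate.
fn first_two<'a, I>(mut iter: I) -> Option<(&'a str, &'a str)>
where
    I: Iterator<Item = &'a str>,
{
    let first = iter.next()?;
    let second = iter.next()?; // `first` is still alive here
    Some((first, second))
}
```

The same reasoning applies to adapters such as collect, which gather all items before looking at any of them.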