Improve my iterator

Can someone help me improve my simple iterator (this is a learning exercise, I'm sure there's something that exists that already does this). I'd like to get rid of the copying and recursion:

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Iterator for Cells<'a> {
    type Item = String;

    // This reads until the next delimiter (comma) into the internal buffer,
    // then copies the buffer and returns it. It uses recursion instead of
    // a for loop to continue pulling new chars from its source member until
    // a delimiter has been encountered, because my attempt at a for loop was
    // giving me "cannot move out of borrowed content" errors. Similarly,
    // I couldn't figure out how to return the buffer itself instead of a copy.
    // The intent is that the returned string is only valid until the next call
    // to `next()`; perhaps I should be returning a `str` instead?
    fn next(&mut self) -> Option<String> {
        match self.source.next() {
            Some(c) => {
                if c == ',' {
                    let tmp = self.buffer.clone();
                    self.buffer.clear();
                    return Some(tmp);
                }
                self.buffer.push(c);
                self.next()
            }
            None => {
                if self.buffer.len() > 0 {
                    let tmp = self.buffer.clone();
                    self.buffer.clear();
                    return Some(tmp);
                }
                None
            }
        }
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    for cell in cells {
        println!("{}", cell);
    }
}

This is what I want to write, but I'm getting several borrowing errors:

fn next(&mut self) -> Option<String> {
    self.buffer.clear();
    for c in self.source {
        if c == ',' {
            return Some(self.buffer);
        }
        self.buffer.push(c);
    }
    if self.buffer.len() > 0 {
        return Some(self.buffer);
    }
    None
}

Error output:

error[E0507]: cannot move out of borrowed content
  --> main.rs:21:18
   |
21 |         for c in self.source {
   |                  ^^^^ cannot move out of borrowed content

error[E0507]: cannot move out of borrowed content
  --> main.rs:23:29
   |
23 |                 return Some(self.buffer);
   |                             ^^^^ cannot move out of borrowed content

error[E0507]: cannot move out of borrowed content
  --> main.rs:28:25
   |
28 |             return Some(self.buffer);
   |                         ^^^^ cannot move out of borrowed content

error: aborting due to 3 previous errors

You can't accomplish that. The Iterator API allows all values from next() to stay live at the same time. But you declared you're returning a String, which is an owned value, so this would be fine. You just need to clone it from the buffer. Or probably the buffer isn't actually useful, so just build a local String and return that.

And this is trying to move your Chars iterator out of self and into the for loop. Try self.source.by_ref() to access it indirectly.

    fn next(&mut self) -> Option<String> {
        let mut buffer = String::new();
        for c in self.source.by_ref() {
            if c == ',' {
                return Some(buffer);
            }
            buffer.push(c);
        }
        if self.buffer.len() > 0 {
            return Some(buffer);
        }
        None
    }

Hmm, perhaps I don't want to implement the iterator exactly then? I would really like to avoid many allocations by reusing the buffer rather than allocating a new one with each pass. Thanks for tipping me off to by_ref()!

This code doesn't emit the final token because the is self.buffer.len() > 0 should be just buffer.len(). Cells::buffer can be removed altogether. Here's a corrected playground link.

I found out what I'm trying to do is called "streaming iterator", and it's not supported via the Iterator protocol, as you mentioned. Here's a substitute implementation:

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Cells<'a> {
    fn next_cell(&mut self) -> bool {
        self.buffer.clear();
        for c in self.source.by_ref() {
            if c == ',' {
                return true;
            }
            self.buffer.push(c);
        }
        self.buffer.len() > 0
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let mut cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    while cells.next_cell() {
        println!("{}", cells.buffer);
    }
}

Ah, indeed, I was too hasty in my rewrite. :slight_smile:

I would still return the borrowed buffer, so callers don't have to access the internals of your Cells.

playground

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Cells<'a> {
    fn next_cell(&mut self) -> Option<&String> {
        self.buffer.clear();
        for c in self.source.by_ref() {
            if c == ',' {
                return Some(&self.buffer);
            }
            self.buffer.push(c);
        }
        if self.buffer.len() > 0 {
            Some(&self.buffer)
        } else {
            None
        }
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let mut cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    while let Some(buffer) = cells.next_cell() {
        println!("{}", buffer);
    }
}

Note that this next_cell has an implicit lifetime passed from input to output, distinct (and shorter) than the outer Cells<'a>. It could be explicitly written like:

impl<'a> Cells<'a> {
    fn next_cell<'b>(&'b mut self) -> Option<&'b String> {
        /* ... */
    }
}

edit: Actually, it's more idiomatic if this returns Option<&str>. Nothing else needs to change for that.

Cool, those options do look more ergonomic. I also considered making the buffer a mutable pointer argument instead of a struct member; presumably this is less-idiomatic though?

You could do that, if you want the caller to have even greater control over buffer reuse. I would still return the borrowed value though, so it's clear how that gets set.

Actually, if you ditch the Chars and use the original underlying &str, then all you're really doing here is reimplementing str::split. Working from the actual &str buffer lets the Split return direct subslices, with no additional buffer required and no Iterator lifetime issues.

You could do that, if you want the caller to have even greater control over buffer reuse. I would still return the borrowed value though, so it's clear how that gets set.

I ended up taking your advice for a while, but I decided it was better to give the caller the additional flexibility (in particular, I want to allocate several different buffers--one per column in a CSV).

Actually, if you ditch the Chars and use the original underlying &str, then all you're really doing here is reimplementing str::split. Working from the actual &str buffer lets the Split return direct subslices, with no additional buffer required and no Iterator lifetime issues.

I'm actually using an Iterator<char> now; this gives me more flexibility regarding the source of my input (I/O vs memory). The Iterator approach isn't causing me lifetime issues, though I don't understand lifetimes well-enough to know that I'm doing the right thing; however, the compiler is happy and my simple examples are working.