Improve my iterator


#1

Can someone help me improve my simple iterator (this is a learning exercise, I’m sure there’s something that exists that already does this). I’d like to get rid of the copying and recursion:

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Iterator for Cells<'a> {
    type Item = String;

    // This reads until the next delimiter (comma) into the internal buffer,
    // then copies the buffer and returns it. It uses recursion instead of
    // a for loop to continue pulling new chars from its source member until
    // a delimiter has been encountered, because my attempt at a for loop was
    // giving me "cannot move out of borrowed content" errors. Similarly,
    // I couldn't figure out how to return the buffer itself instead of a copy.
    // The intent is that the returned string is only valid until the next call
    // to `next()`; perhaps I should be returning a `str` instead?
    fn next(&mut self) -> Option<String> {
        match self.source.next() {
            Some(c) => {
                if c == ',' {
                    let tmp = self.buffer.clone();
                    self.buffer.clear();
                    return Some(tmp);
                }
                self.buffer.push(c);
                self.next()
            }
            None => {
                if self.buffer.len() > 0 {
                    let tmp = self.buffer.clone();
                    self.buffer.clear();
                    return Some(tmp);
                }
                None
            }
        }
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    for cell in cells {
        println!("{}", cell);
    }
}

This is what I want to write, but I’m getting several borrowing errors:

fn next(&mut self) -> Option<String> {
    self.buffer.clear();
    for c in self.source {
        if c == ',' {
            return Some(self.buffer);
        }
        self.buffer.push(c);
    }
    if self.buffer.len() > 0 {
        return Some(self.buffer);
    }
    None
}

Error output:

error[E0507]: cannot move out of borrowed content
  --> main.rs:21:18
   |
21 |         for c in self.source {
   |                  ^^^^ cannot move out of borrowed content

error[E0507]: cannot move out of borrowed content
  --> main.rs:23:29
   |
23 |                 return Some(self.buffer);
   |                             ^^^^ cannot move out of borrowed content

error[E0507]: cannot move out of borrowed content
  --> main.rs:28:25
   |
28 |             return Some(self.buffer);
   |                         ^^^^ cannot move out of borrowed content

error: aborting due to 3 previous errors

#2

You can’t accomplish that. The Iterator API allows all values from next() to stay live at the same time. But you declared you’re returning a String, which is an owned value, so this would be fine. You just need to clone it from the buffer. Or probably the buffer isn’t actually useful, so just build a local String and return that.

And this is trying to move your Chars iterator out of self and into the for loop. Try self.source.by_ref() to access it indirectly.

    fn next(&mut self) -> Option<String> {
        let mut buffer = String::new();
        for c in self.source.by_ref() {
            if c == ',' {
                return Some(buffer);
            }
            buffer.push(c);
        }
        if self.buffer.len() > 0 {
            return Some(buffer);
        }
        None
    }

#3

Hmm, perhaps I don’t want to implement the iterator exactly then? I would really like to avoid many allocations by reusing the buffer rather than allocating a new one with each pass. Thanks for tipping me off to by_ref()!


#4

This code doesn’t emit the final token because the is self.buffer.len() > 0 should be just buffer.len(). Cells::buffer can be removed altogether. Here’s a corrected playground link.


#5

I found out what I’m trying to do is called “streaming iterator”, and it’s not supported via the Iterator protocol, as you mentioned. Here’s a substitute implementation:

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Cells<'a> {
    fn next_cell(&mut self) -> bool {
        self.buffer.clear();
        for c in self.source.by_ref() {
            if c == ',' {
                return true;
            }
            self.buffer.push(c);
        }
        self.buffer.len() > 0
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let mut cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    while cells.next_cell() {
        println!("{}", cells.buffer);
    }
}

#6

Ah, indeed, I was too hasty in my rewrite. :slight_smile:


#7

I would still return the borrowed buffer, so callers don’t have to access the internals of your Cells.

playground

use std::str::Chars;

struct Cells<'a> {
    source: Chars<'a>,
    buffer: String,
}

impl<'a> Cells<'a> {
    fn next_cell(&mut self) -> Option<&String> {
        self.buffer.clear();
        for c in self.source.by_ref() {
            if c == ',' {
                return Some(&self.buffer);
            }
            self.buffer.push(c);
        }
        if self.buffer.len() > 0 {
            Some(&self.buffer)
        } else {
            None
        }
    }
}

fn main() {
    let data = String::from("foo,bar,baz");
    let mut cells = Cells {
        source: data.chars(),
        buffer: String::new(),
    };
    while let Some(buffer) = cells.next_cell() {
        println!("{}", buffer);
    }
}

Note that this next_cell has an implicit lifetime passed from input to output, distinct (and shorter) than the outer Cells<'a>. It could be explicitly written like:

impl<'a> Cells<'a> {
    fn next_cell<'b>(&'b mut self) -> Option<&'b String> {
        /* ... */
    }
}

edit: Actually, it’s more idiomatic if this returns Option<&str>. Nothing else needs to change for that.


#8

Cool, those options do look more ergonomic. I also considered making the buffer a mutable pointer argument instead of a struct member; presumably this is less-idiomatic though?


#9

You could do that, if you want the caller to have even greater control over buffer reuse. I would still return the borrowed value though, so it’s clear how that gets set.

Actually, if you ditch the Chars and use the original underlying &str, then all you’re really doing here is reimplementing str::split. Working from the actual &str buffer lets the Split return direct subslices, with no additional buffer required and no Iterator lifetime issues.


#10

You could do that, if you want the caller to have even greater control over buffer reuse. I would still return the borrowed value though, so it’s clear how that gets set.

I ended up taking your advice for a while, but I decided it was better to give the caller the additional flexibility (in particular, I want to allocate several different buffers–one per column in a CSV).

Actually, if you ditch the Chars and use the original underlying &str, then all you’re really doing here is reimplementing str::split. Working from the actual &str buffer lets the Split return direct subslices, with no additional buffer required and no Iterator lifetime issues.

I’m actually using an Iterator<char> now; this gives me more flexibility regarding the source of my input (I/O vs memory). The Iterator approach isn’t causing me lifetime issues, though I don’t understand lifetimes well-enough to know that I’m doing the right thing; however, the compiler is happy and my simple examples are working.