Split Iterator into Iterator of Iterators

I am trying to create an iterator of iterators where the outer iterator splits the input sequence and each inner iterator iterates over a split.

The interface should looks something like this:

let vec = vec![1, 2, 3, 10, 11, 12, 50, 120, 121, 122];
let splits = vec.iter().split_by(|&a, &b| a + 1 != b);

for split in splits {
    println!("split = {:?}", split.collect::<Vec<_>>());
}

with an output of:

split = [1, 2, 3]
split = [10, 11, 12]
split = [50]
split = [120, 121, 122]

Since I need to remember the previous element of the iteration (to pass the “a” to the closure) the items need to be Copy. Alternatively, a key-closure provides a Copy value for an item, like so:

let vec = vec!["abc", "run", "tag", "go", "be", "ring", "zip"];
let splits = vec.iter().split_by_key(|x| x.len(), |&a, &b| a != b);

for split in splits {
    println!("split = {:?}", split.collect::<Vec<_>>());
}

with an output of:

split = ["abc", "run", "tag"]
split = ["go", "be"]
split = ["ring"]
split = ["zip"]

The following shows my current work. I am stuck at how to manage the lifetimes required such that the inner iterator Split can advance the parent iterator managed by the outer iterator SplitBy:

struct SplitBy<I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    iter: Peekable<I>,
    make_key: K,
    is_split: F,
}

impl<I, K, C, F> Iterator for SplitBy<I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    type Item = Split<'a, I, K, C, F>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(first) = self.iter.peek() {
            Some(Split {
                parent: self, // <- this is the part I am most unsure about
                prev: (self.make_key)(first),
                is_init: true,
            })
        } else {
            None
        }
    }
}

struct Split<'a, I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    parent: &'a mut SplitBy<I, K, C, F>,
    prev: C,
    is_init: bool,
}

impl<'a, I, K, C, F> Iterator for Split<'a, I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.is_init {
            return self.parent.iter.next();
        } else {
            if let Some(peek) = self.parent.iter.peek() {
                let peek_id = (self.parent.make_key)(peek);

                if (self.parent.is_split)(self.prev, peek_id) {
                    return None;
                } else {
                    return self.parent.iter.next();
                }
            }
        }

        None
    }
}

The code above is largely inspired by GroupBy and Group from the Itertools crate.

The code in Split::next is wrong, but I am able to fix that once Split can communicate with SplitBy. Does someone have a pointer what I am missing?

I'm not sure exactly what the issue you're struggling with now is, but you seem to have overlooked an important aspect of the design of GroupBy:

This type implements IntoIterator (it is not an iterator itself), because the group iterators need to borrow from this value. It should be stored in a local variable or temporary and iterated.

(Actually this is not strictly correct -- GroupBy does not implement IntoIterator, instead &'a GroupBy does because that's how 'a gets into the signature of Group.)

Implementing Iterator for SplitBy is incompatible with the desired iterator behavior.

1 Like

I will try to replicate this technique for SplitBy. Thank you for your help!

I think you can make this work by using Rc<RefCell> to share the “parent” iterator, rather than references. Playground

(I also needed to fix some minor bugs in Split::next.)

However, note that this requires the caller to fully consume each child iterator before getting the next one from the parent iterator. If they do not, their code will compile but they will get incorrect results. itertools::GroupBy detects this situation and buffers items internally to prevent this.

Alternately, instead of implementing Iterator::next, you could have the parent’s next method keep it borrowed while the child iterator is alive. This would prevent out-of-order iteration at compile time, and you could go back to using references instead of Rc, but it would no longer implement the standard Iterator trait.

2 Likes

Thank you very much for the playground! This looks great.

That is fine for my use cases. A general solution should probably incorporate the same trick GroupBy uses. I am still new to implementing Iterators and working with Rc, so I will see if that makes sense for my codebase.

For comparison, here's the implementation using only references, which doesn't implement the standard Iterator::next signature:

Playground

1 Like

For slices (and anything that derefs to a slice, like arrays and vectors) you could take inspiration from the unstable slice::group_by

1 Like

Good to know! While not applicable to my current setup, it will surely come in handy in the future. Also, I found the discussion on the RFC insightful in terms of naming and API-design.