Split Iterator into Iterator of Iterators

florianpircher · January 17, 2021, 2:23pm

I am trying to create an iterator of iterators where the outer iterator splits the input sequence and each inner iterator iterates over a split.

The interface should looks something like this:

let vec = vec![1, 2, 3, 10, 11, 12, 50, 120, 121, 122];
let splits = vec.iter().split_by(|&a, &b| a + 1 != b);

for split in splits {
    println!("split = {:?}", split.collect::<Vec<_>>());
}

with an output of:

split = [1, 2, 3]
split = [10, 11, 12]
split = [50]
split = [120, 121, 122]

Since I need to remember the previous element of the iteration (to pass the “a” to the closure) the items need to be Copy. Alternatively, a key-closure provides a Copy value for an item, like so:

let vec = vec!["abc", "run", "tag", "go", "be", "ring", "zip"];
let splits = vec.iter().split_by_key(|x| x.len(), |&a, &b| a != b);

for split in splits {
    println!("split = {:?}", split.collect::<Vec<_>>());
}

with an output of:

split = ["abc", "run", "tag"]
split = ["go", "be"]
split = ["ring"]
split = ["zip"]

The following shows my current work. I am stuck at how to manage the lifetimes required such that the inner iterator Split can advance the parent iterator managed by the outer iterator SplitBy:

struct SplitBy<I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    iter: Peekable<I>,
    make_key: K,
    is_split: F,
}

impl<I, K, C, F> Iterator for SplitBy<I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    type Item = Split<'a, I, K, C, F>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(first) = self.iter.peek() {
            Some(Split {
                parent: self, // <- this is the part I am most unsure about
                prev: (self.make_key)(first),
                is_init: true,
            })
        } else {
            None
        }
    }
}

struct Split<'a, I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    parent: &'a mut SplitBy<I, K, C, F>,
    prev: C,
    is_init: bool,
}

impl<'a, I, K, C, F> Iterator for Split<'a, I, K, C, F>
where
    I: Iterator,
    K: Fn(&I::Item) -> C,
    C: Copy,
    F: Fn(C, C) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.is_init {
            return self.parent.iter.next();
        } else {
            if let Some(peek) = self.parent.iter.peek() {
                let peek_id = (self.parent.make_key)(peek);

                if (self.parent.is_split)(self.prev, peek_id) {
                    return None;
                } else {
                    return self.parent.iter.next();
                }
            }
        }

        None
    }
}

The code above is largely inspired by GroupBy and Group from the Itertools crate.

The code in Split::next is wrong, but I am able to fix that once Split can communicate with SplitBy. Does someone have a pointer what I am missing?

trentj · January 17, 2021, 4:05pm

I'm not sure exactly what the issue you're struggling with now is, but you seem to have overlooked an important aspect of the design of GroupBy:

This type implements IntoIterator (it is not an iterator itself), because the group iterators need to borrow from this value. It should be stored in a local variable or temporary and iterated.

(Actually this is not strictly correct -- GroupBy does not implement IntoIterator, instead &'a GroupBy does because that's how 'a gets into the signature of Group.)

Implementing Iterator for SplitBy is incompatible with the desired iterator behavior.

florianpircher · January 17, 2021, 4:21pm

I will try to replicate this technique for SplitBy. Thank you for your help!

mbrubeck · January 17, 2021, 4:23pm

I think you can make this work by using Rc<RefCell> to share the “parent” iterator, rather than references. Playground

(I also needed to fix some minor bugs in Split::next.)

However, note that this requires the caller to fully consume each child iterator before getting the next one from the parent iterator. If they do not, their code will compile but they will get incorrect results. itertools::GroupBy detects this situation and buffers items internally to prevent this.

Alternately, instead of implementing Iterator::next, you could have the parent’s next method keep it borrowed while the child iterator is alive. This would prevent out-of-order iteration at compile time, and you could go back to using references instead of Rc, but it would no longer implement the standard Iterator trait.

florianpircher · January 17, 2021, 4:41pm

Thank you very much for the playground! This looks great.

That is fine for my use cases. A general solution should probably incorporate the same trick GroupBy uses. I am still new to implementing Iterators and working with Rc, so I will see if that makes sense for my codebase.

mbrubeck · January 17, 2021, 4:44pm

For comparison, here's the implementation using only references, which doesn't implement the standard Iterator::next signature:

Playground

SkiFire13 · January 17, 2021, 6:03pm

For slices (and anything that derefs to a slice, like arrays and vectors) you could take inspiration from the unstable slice::group_by

florianpircher · January 17, 2021, 7:39pm

Good to know! While not applicable to my current setup, it will surely come in handy in the future. Also, I found the discussion on the RFC insightful in terms of naming and API-design.

system · April 17, 2021, 7:40pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Split mutable iterator over composite structure help	13	538	July 11, 2022
Is there an itertools fn for this?	16	664	January 18, 2023
How to "branch" out an Iterator help	8	879	October 19, 2021
Iterating over custom chunks help	3	303	November 13, 2023
[Solved] Iterator over mutable subslices help	3	1076	January 12, 2023

Split Iterator into Iterator of Iterators

Related topics