Propagation of Iterator Kind


#21

I think I see now; you are trying to make the optimizations of InfiniteIterator transparent, so that the benefits are received even when using the standard iterator API. To be honest, I hadn’t even thought that far ahead!

What I was talking about in the sentence that leonardo quoted was the problem that having InfiniteIterator will end up requiring at least two extremely-similar looking impls for many iterators. For instance, here’s an iterator that takes two sorted iterators of Item=<(usize, _)> and produces pairs from the matching indices:

impl<A,B,T,U> Iterator for SparseIntersection<A,B>
 where A: Iterator<Item=(usize,T)>,
       B: Iterator<Item=(usize,U)>,
{
    type Item = (usize,(T,U));
    fn next(&mut self) -> Option<Self::Item> {
        let &mut SparseIntersection { ref mut a, ref mut b } = self;
        if let (Some((i0,a0)),Some((k0,b0))) = (a.next(),b.next()) {
            let (mut icur,mut acur) = (i0,a0);
            let (mut kcur,mut bcur) = (k0,b0);
            loop {
                // scan to an index present in both vectors
                while icur < kcur {
                    if let Some((inext,anext)) = a.next() { acur = anext; icur = inext; }
                    else { return None } // stop immediately if either iterator ends
                }
                while kcur < icur {
                    if let Some((knext,bnext)) = b.next() { bcur = bnext; kcur = knext; }
                    else { return None }
                }
                // at this point, kcur >= icur
                if icur == kcur {
                    return Some((icur, (acur,bcur)));
                }
            }
        // at least one iterator had no elements remaining
        } else {
            None
        }
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        let (_, aupper) = self.a.size_hint();
        let (_, bupper) = self.b.size_hint();
        (0, ::std::cmp::min(aupper,bupper)) // any number of items may not match, so no lower bound
    }
}

This is infinite when both inputs are infinite. Let’s implement InfiniteIterator for it:

impl<A,B,T,U> InfiniteIterator for SparseIntersection<A,B>
 where A: InfiniteIterator<Item=(usize,T)>,
       B: InfiniteIterator<Item=(usize,U)>,
{
    type Item = (usize,(T,U));
    fn nooxt(&mut self) -> Self::Item {
        let &mut SparseIntersection { ref mut a, ref mut b } = self;
        let ((i0,a0),(k0,b0)) = (a.nooxt(),b.nooxt());
        let (mut icur,mut acur) = (i0,a0);
        let (mut kcur,mut bcur) = (k0,b0);
        loop {
            // scan to an index present in both vectors
            while icur < kcur {
                let (inext,anext) = a.nooxt();
                acur = anext; icur = inext;
            }
            while kcur < icur {
                let (knext,bnext) = b.nooxt();
                bcur = bnext; kcur = knext;
            }
            // at this point, kcur >= icur
            if icur == kcur {
                return (icur, (acur,bcur));
            }
        }
    }
}

Writing the method was easy; I just copied, pasted, and collapsed the if lets. The true burden lies in maintaining it:

  • We have a strikingly similar method to the first one…
  • …which differs in ways that are not simple to factor out without obscuring the logic…
  • …which clearly requires a wide gamut of unit tests if it is not factored…
  • …and it will be nontrivial to derive said unit tests from those that already exist for next.

These are many of the same reasons which make next_back impls rightfully scary, although the differences between next and next_back are often at least easier to factor out (and the payoff is far greater!)

SparseIntersection has a sister function SparseUnion, which is infinite when either input iterator is infinite. Even assuming that coherence weren’t an issue, the algorithm would need to be implemented four times (one Iterator plus three impl InfiniteIterators) to have unambiguous implementations under a lattice rule. A scenario such as that calls for some… creative refactorizations. (which I can very well imagine might involve macros and Always<T>)