Odd technique: Writing an iterator inside-out


#1

So here’s a bit of an odd technique that I have developed over time. Initially it was a sort of beginner’s allergic response to the lifetime checker (at the time, I still could never quite figure out how to get flat_map to check), but in times since I have managed to discover other reasons to use it.

Usually, when we want to return a bunch of cool things, we have to do one of these:

// this...
fn cool_things() -> Vec<Thing> { ... }

// ...or...
fn cool_things() -> Box<Iterator<Item=Thing>> { ... }

// ...or...
use std::iter;
use std::slice;
type CoolThings = iter::ALongTypeName<f64, iter::ThatYou<f64>,
                    iter::HadToCopyAndPaste<
                        slice::FromYourLast<TypeError>>>;
fn cool_things() -> CoolThings {
    &vec[..].from_your_last()
        .had_to_copy_and_paste()
        .a_long_type_name(0.0, iter::that_you)
}

// ...or...
struct CoolThings { state: Thing }
impl Iterator for CoolThings {
   fn next(&mut self) -> Self::Item {
       // ... hrrrrrngh ...
   }

   fn size_hint(&self) -> (size, Option<usize>) { (41, (Some(41)) }
}
fn cool_things() -> CoolThings { CoolThings { state: Thing::new() } }

Presenting: The inside-out iterator!

Instead of the above, you simply write a function that takes a callback:

fn each_cool_thing<F: FnMut(Thing)>(mut cb: F) { ... }

Then, if your objective was to e.g. build a list, you could write this in place of collect/extend:

let mut out = Vec::with_capacity(n);
each_cool_thing(|x| out.push(x));

Yeeeuch!, you say, how imperative!

And yes, I’ll admit, “iterators” written in this fashion are difficult/impossible to compose (an internet banana to anyone who can write a combinator that “zips” two functions like these, taking an FnMut(T,U) callback)… so why would one do it?

Complex item generating logic!!

Iterators have a single entry point each iteration, making it annoying to express some things that are trivial to do with callbacks. It’s like a co-routine!

fn each_fibonacci<F: FnMut(i64)>(x0: i64, x1: i64, n: usize, mut yield: F) {
    yield(x0); // I sincerely doubt there is any natural way to fit this
               // first yield into an `unfold` without resorting to e.g.
               // std::iter::once(x0).chain(...)
    yield(x1);
    for _ in 2..n {
        // do heavy computation work to compute next value
        yield(x);
    }
}

(note: an unfortunate thing you can see here is the inability to write .take(n), so a size must be supplied for iterators which would have been infinite)

It’s lazy!

The above example (and any “coroutine-like” example) could of course easily have been written to build and return a vector:

fn fibonaccis(x0: i64, x1: i64, n: usize) -> Vec<i64> {
    let mut out = vec![x0, x1];
    // ...
    out
}

but this would have produced the entire vector all in one go, giving you no place to hook in a stylish progress bar display. :wink:

It still succeeds at separating concerns!

Despite its limitations in comparison to iterators, it does still successfully separate how the data are produced and how the data are used!

It can be fast in a pinch!

I did a benchmark and I regret to inform that for x in src { x.each_thing(|e| out.push(e)); } outperformed out.extend(src.flat_map(things)) in a test where things produces iterables of length 1 to 2; even when the return type uses such stack-based wizardry as iter::Chain<iter::Once<T>, option::IntoIter<T>>:

test bench_flat_map_stack      ... bench:  11,493,682 ns/iter (+/- 116,480)
test bench_flat_map_vec        ... bench:  24,880,065 ns/iter (+/- 2,267,359)
test bench_imperative_iterator ... bench:   7,500,583 ns/iter (+/- 105,947)

What do others think?

Is this a pattern you’ve used? Comments and cluebats welcome.


#2

D did this; this was how iterators worked (see opApply).

They were incomposable in a lot of cases and a huge PITA, so the language was moving away from those to Rust-style iterators when I was last using it.

I’d only use them as an absolute last resort.


#3

Oh, and as an addendum: the “real” solution to this issue is to add generators to the language so that writing a regular Iterator state machine, and writing an inversion-of-control iterator are of equivalent difficulty: invoking the callback becomes yield and you’re more or less done… questions of flow control aside.


#4

Rust did this; :smile: before the introduction of external iteration in 2013!


#5

Heh, now there’s a history lesson I didn’t expect! To think closures existed before the iterator protocol… and that at least one other language has gone that route!

Hmm, come to think of it, I also recall Ruby having a kind of inverted flow of control as well (and this is another language with closures)… I guess this is not altogether an uncommon design! Perhaps closures are not as difficult to implement as I have assumed they are? Either that, or they are super difficult, to the point where a language has them either from the start, or never at all!


#6

I have more to say about internal vs external iteration in Rust, but I’m not sure I’m going to get that blog post done anytime soon. I can give you a few hints here:

Good, and the prize in crosshairs is worth even more than this example, see the VecDeque improvement or corresponding ndarray ones for .fold(). Segmented/nested iteration is seldom as optimizable when Iterator::next becomes a state machine; in the fold method we have the opportunity to write plain nested loops instead. – bluss, stack overflow

See also PR 37315