Unboxing Iterator

Hi,

I've got this snippet of working code:

use std::path::PathBuf;
type WalkItem = (PathBuf, Vec<PathBuf>);
type WalkIterator = Box<dyn Iterator<Item = WalkItem>>;

fn dir_walk(dir: &PathBuf, step_down: Vec<PathBuf>) -> WalkIterator {
    let (dirs, files) = dir_offspring(dir);
    let step = step_down.clone();

    let walk = move |d: PathBuf| {
        let mut step = step_down.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        dir_walk(&d, step)
    };

    let item = move |f: PathBuf| (f, step.clone());

    Box::new(
        dirs.into_iter()
            .flat_map(walk)
            .chain(files.into_iter().map(item)),
    )
}

One has to box the iterator because it is not Sized; there is no other reason, or so I gather. I've heard about the impl Trait thing; it's good for the return type, they say. Experiments with

...-> impl Iterator<Item = WalkItem>

would show compiler errors only. Is the current implementation of dir_walk the only possible solution?

You're going to need some sort of indirection, since you're defining a recursive type. You could maybe make some concrete iterator that involves the indirection yourself, but IMO, you might as well use what you've already got.

In other words, the recursive definition is another reason for boxing in this case, beyond the un-Sized nature of dyn Iterator.

1 Like

Suppose I'm too curious :slight_smile: : who's doing the linking, and how? Of course, most of the linking happens inside

        dirs.into_iter()
            .flat_map(walk)
            .chain(files.into_iter().map(item))

but I have nothing to do with it. I can't yet see how my boxed iterator serves the purpose. Why it is not concrete and where is its linking point.

walk is a closure that returns the result of dir_walk so its return type is the same type that dir_walk returns. Hence, that same return value appears in dirs.into_iter().flat_map(walk).

1 Like

By concrete I just mean, try to create some structs to implement these iterators yourself, without type erasure (dyn Iterator).

To try to highlight the link, here's a rough approximation of the unboxed WalkIterator iterator type:

Chain<
  FlatMap<
    IntoIter<PathBuf>,
    WalkIterator,
    Closure#1
  >, 
  Map<
     IntoIter<PathBuf>,
     Closure#2
  >
>

Remembering that iterators are lazy, not eager, what happens when you call next()? [1] First, Chain calls the FlatMap. FlatMap calls Closure#1 on the IntoIter, which calls dir_walk and gives back a WalkIterator -- let's call this WalkIterator#0. The FlatMap calls WalkIterator#0 and, assuming the result is not None, it returns the result to you. In the meanwhile, it holds on to WalkIterator#0 because it's going to need to call it again when you call next() a second time.

So at any given point, you're holding onto WalkIterator#0. But what does WalkIterator#0 consist of? The same thing as the type above -- it contains an iterator with a bunch of nested types, one of which is WalkIterator itself. WalkIterator#0 contains a WalkIterator#1 which contains a .... etc. This is how your recursive iterator is the same situation as the Cons type in that chapter of the book I linked to.


You could probably manage your own stack in a Vec instead of recursing, with the Vec would acting like a "fancy Box" to supply your indirection. Or you could keep the recursion and do your own boxing. But again, I really don't think it's worth it; you'll still have most of the indirection. Additionally, you're doing so much cloning of data here for each Item that even if recursive types wasn't a concern, I doubt you'd notice any advantage from removing the indirection at the top.

I guess it could be interesting to implement from a learning standpoint.


  1. Disclaimer: I'm not going to fact check the order of operations here, so some details not pertinent to this discussion may be a little off. ↩ī¸Ž

1 Like