Flat_map and chaining iterators

Hi,

I've got a problem using chain() method with flat_map(). Here is a working Elixir prototype (you may skip it for now):

def traverse_flat_dst(src_dir, dst_root, dst_step \\ []) do
  {dirs, files} = list_dir_groom(src_dir) # Getting immediate offspring.

  traverse = fn d ->
    step = dst_step ++ [Path.basename(d)]
    traverse_flat_dst(d, dst_root, step)
  end

  handle = fn f ->
    dst_path =
      Path.join(
        dst_root,
        dst_step
      )

    {f, dst_path}
  end

  Stream.flat_map(dirs, traverse)
  |> Stream.concat(Stream.map(files, handle))
end

This is s stripped down Rust version:

fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);

    let traverse = |d: &PathBuf| {
        traverse_flat_dst_iter(d)
    };
//// Three commented out lines below are good and working Rust, they just do nothing useful.
//    dirs.into_iter()
//        .map(|d| (d, PathBuf::new()))
//        .chain(files.into_iter().map(|f| (f, PathBuf::new())))
    dirs.into_iter()
        .flat_map(traverse)
        .chain(files.into_iter().map(|f| (f, PathBuf::new())))
}

The flat_map() thing won't compile:

rror[E0631]: type mismatch in closure arguments
   --> src/main.rs:275:10
    |
267 |     let traverse = |d: &PathBuf| {
    |                    ------------- found signature of `for<'r> fn(&'r std::path::PathBuf) -> _`
...
275 |         .flat_map(traverse)
    |          ^^^^^^^^ expected signature of `fn(std::path::PathBuf) -> _`

error[E0599]: no method named `chain` found for type `std::iter::FlatMap<std::vec::IntoIter<std::path::PathBuf>, _, [closure@src/main.rs:267:20: 269:6]>` in the current scope
   --> src/main.rs:276:10
    |
276 |         .chain(files.into_iter().map(|f| (f, PathBuf::new())))
    |          ^^^^^
    |
    = note: the method `chain` exists but the following trait bounds were not satisfied:
            `&mut std::iter::FlatMap<std::vec::IntoIter<std::path::PathBuf>, _, [closure@src/main.rs:267:20: 269:6]> : std::iter::Iterator`
            `std::iter::FlatMap<std::vec::IntoIter<std::path::PathBuf>, _, [closure@src/main.rs:267:20: 269:6]> : std::iter::Iterator`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0599`.

I hope, Rust type annotations explain the idea: a recursive function, which is walking a file tree, returns an iterator serving (PathBuf, PathBuf) tuple. The iterator serves all the files of the tree. With the commented out stub it even kind of works.

What am I still missing?

dirs.into_iter() is providing Item = PathBuf, so that's what your flat_map function needs to take as an argument, but yours takes it by reference -- d: &PathBuf.

Even if you fix that though, I think the concrete type behind your impl Iterator will be a recursively nested type, which can't work. Something like:

Chain<
    FlatMap<IntoIter<_>, Fn(PathBuf) -> impl Iterator<...>>,
    Map<IntoIter, Fn(PathBuf) -> (...)>,
>

... where that inner impl Iterator is the same overall type. The FlatMap struct contains a member to the mapped iterator type, which is where you'll get improper type recursion. I think you'll need to Box that for indirection so the type can be resolved.

Thanks! Will you please suggest something like a snippet of code? For a newbie like me it's too concise, I'm afraid. Even the code you see cost me dearly :slight_smile: .

I changed it to this:

    let traverse = |d: PathBuf| {
        traverse_flat_dst_iter(&d)
    };

error[E0720]: opaque type expands to a recursive type
   --> src/main.rs:264:6
    |
264 | ) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expands to self-referential type
    |
    = note: expanded type is `std::iter::Chain<std::iter::FlatMap<std::vec::IntoIter<std::path::PathBuf>, impl std::iter::Iterator, [closure@src/main.rs:267:20: 269:6]>, std::iter::Map<std::vec::IntoIter<std::path::PathBuf>, [closure@src/main.rs:276:38: 276:61]>>`

Sounds like you are right, but I can't make much of it.

I can't test your incomplete example, but it might work to just box it:

    let traverse = |d: PathBuf| {
        Box::new(traverse_flat_dst_iter(&d))
    };

You might also have to coerce that to a dynamic trait object, not sure...

    let traverse = |d: PathBuf| -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
        Box::new(traverse_flat_dst_iter(&d))
    };

Thanks, your second suggestion works!

If this deserves a separate topic, I'll do it. I put the stripped parts back again, and got an error which I did not see while traverse was not in use:

fn traverse_flat_dst_iter(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    let (dirs, files) = list_dir_groom(src_dir);

    let traverse = |d: PathBuf| -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        Box::new(traverse_flat_dst_iter(&d, step))
    };
    dirs.into_iter()
        .flat_map(traverse)
        .chain(files.into_iter().map(|f| (f, PathBuf::new())))
}

The error:

error[E0597]: `dst_step` does not live long enough
   --> src/main.rs:269:24
    |
265 | ) -> impl Iterator<Item = (PathBuf, PathBuf)> {
    |      ---------------------------------------- opaque type requires that `dst_step` is borrowed for `'static`
...
268 |     let traverse = |d: PathBuf| -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
    |                    ------------------------------------------------------------ value captured here
269 |         let mut step = dst_step.clone();
    |                        ^^^^^^^^ borrowed value does not live long enough
...
276 | }
    | - `dst_step` dropped here while still borrowed

I don't understand whoever may need dst_step after cloning. Probably, cloning does not make what I think it should...

The clone is happening every time traverse is called, and you're returning the iterator that will be calling traverse lazily. Therefore dst_step has to continue existing as long as that iterator may use it. But you can let traverse take ownership of its captured values with the move keyword:

    let traverse = move |d: PathBuf| -> ...

It should still clone internally to pass a new step value to the recursive call.

Another solution. Thanks! It starts dawning on me, that Rust programming can be easy and even relaxing... :blush:

2 Likes

Yet another trouble. The Elixir prototype optionally offers reverse chaining:

    if v.o.flags.reverse do
      Stream.map(files, handle)
      |> Stream.concat(Stream.flat_map(dirs, traverse))
    else
      Stream.flat_map(dirs, traverse)
      |> Stream.concat(Stream.map(files, handle))
    end

Somehow, in Rust I can't manage it, total type mismatch whatever I try.

    if flag("r") {
        // This won't work.
        files
            .into_iter()
            .map(|f| (f, PathBuf::new()))
            .chain(dirs.into_iter().flat_map(traverse))
    } else {
        dirs.into_iter()
            .flat_map(traverse)
            .chain(files.into_iter().map(|f| (f, PathBuf::new())))
    }

Your impl Iterator has to resolve internally to a single type, but the two branches of your if-else have two different types, Chain<Map, FlatMap> and Chain<FlatMap, Map>.

Since we've already introduced a boxed dyn object in your solution, we can just move that up a level, and write traverse_flat_dst_iter(...) -> Box<dyn Iterator<Item = ...>>. Your if-else should now have a Box::new on each branch, and you won't need any boxing in traverse anymore.

Another option in general is to use enums for the different types. For example, Either implements Iterator if both of its parts do, so you'd wrap one side in Either::Left and the other in Either::Right. Or to create a bespoke enum type, possibly with more than two variants, the auto_enums crate can automatically derive this. Either way, you'd have a concrete type satisfying impl Iterator.

You mean, something like this, for starters:

    Box::new(dirs.into_iter()
        .flat_map(traverse)
        .chain(files.into_iter().map(handle)))

I can't satisfy the compiler.

Yes. Is it possible for you to share something on the playground? You can stub out unrelated details with unimplemented!() function bodies.

I'll try. It will take some time since I am not familiar with the playground.

This is GitHub. Single file project in a buildable state, not much more implemented than the trouble discussed :slight_smile: .

OK, looks like your GitHub didn't commit the (broken) reversal, but I made that change and "lifted" the boxing like I suggested before, like so:

fn traverse_flat_dst(
    src_dir: &PathBuf,
    dst_step: Vec<PathBuf>,
) -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
    let (dirs, files) = groom(src_dir);

    let traverse = move |d: PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst(&d, step)
    };
    let handle = |f: PathBuf| {
        let dst_path: PathBuf = [&DST.as_os_str(), f.file_name().unwrap()].iter().collect();
        (f, dst_path)
    };
    if flag("r") {
        Box::new(files.into_iter()
            .map(handle)
            .chain(dirs.into_iter().flat_map(traverse)))
    } else {
        Box::new(dirs.into_iter()
            .flat_map(traverse)
            .chain(files.into_iter().map(handle)))
    }
}
diff --git a/src/main.rs b/src/main.rs
index a46f709877d2..b304b8841ea7 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -247,26 +247,26 @@ fn groom(dir: &Path) -> (Vec<PathBuf>, Vec<PathBuf>) {
 fn traverse_flat_dst(
     src_dir: &PathBuf,
     dst_step: Vec<PathBuf>,
-) -> impl Iterator<Item = (PathBuf, PathBuf)> {
+) -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
     let (dirs, files) = groom(src_dir);
 
-    let traverse = move |d: PathBuf| -> Box<dyn Iterator<Item = (PathBuf, PathBuf)>> {
+    let traverse = move |d: PathBuf| {
         let mut step = dst_step.clone();
         step.push(PathBuf::from(d.file_name().unwrap()));
-        Box::new(traverse_flat_dst(&d, step))
+        traverse_flat_dst(&d, step)
     };
     let handle = |f: PathBuf| {
         let dst_path: PathBuf = [&DST.as_os_str(), f.file_name().unwrap()].iter().collect();
         (f, dst_path)
     };
     if flag("r") {
-        dirs.into_iter()
-            .flat_map(traverse)
-            .chain(files.into_iter().map(handle))
+        Box::new(files.into_iter()
+            .map(handle)
+            .chain(dirs.into_iter().flat_map(traverse)))
     } else {
-        dirs.into_iter()
+        Box::new(dirs.into_iter()
             .flat_map(traverse)
-            .chain(files.into_iter().map(handle))
+            .chain(files.into_iter().map(handle)))
     }
 }
 

Than you again! Your help is absolutely essential. The Rust Book does not prepare you to this kind of quite legitimate endeavor. Now I can, at least, read the docs and see the light...

Why this seemingly trivial reconfiguration makes the compiler happy? The type annotations are the same, they just moved. The lambda is now free of them; is it the key?

The main difference was changing the outer impl Iterator to a Box<dyn Iterator> -- from an opaque concrete type to a boxed dynamic type (which can be different types within).

Actually, you could keep the opaque outer impl Iterator if you like, hiding the fact that it's boxed, but then it needs some other method of type coercion on the if-else branch. Like:

    let iter: Box<dyn Iterator<Item = _>> = if flag("r") {
        Box::new(files.into_iter()
            .map(handle)
            .chain(dirs.into_iter().flat_map(traverse)))
    } else {
        Box::new(dirs.into_iter()
            .flat_map(traverse)
            .chain(files.into_iter().map(handle)))
    };
    iter // returns as the `impl Iterator`

or explicitly cast each branch:

    if flag("r") {
        Box::new(files.into_iter()
            .map(handle)
            .chain(dirs.into_iter().flat_map(traverse)))
        as Box<dyn Iterator<Item = _>>
    } else {
        Box::new(dirs.into_iter()
            .flat_map(traverse)
            .chain(files.into_iter().map(handle)))
        as Box<dyn Iterator<Item = _>>
    }

Either way, what we're doing is creating a Box::new of each specific iterator type, then casting to a Box<dyn Iterator> where the specific types are erased.

I'm more than a bit dumb, I'm afraid. This one works (nothing new about it):

    let handle = |f: PathBuf| {
        let dst_path: PathBuf = [&DST.as_os_str(), f.file_name().unwrap()].iter().collect();
        (f, dst_path)
    };

An attempt at capturing experimentally dst_step fails no matter what.

UPD:

I've found the way! In the world of ownership capture means capture. If a lambda captures a variable, another lambda will have to be happy with its copy; no way to share. Is this way of reasoning correct?

    ...
    let d_step = dst_step.clone();

    let traverse = move |d: PathBuf| {
        let mut step = dst_step.clone();
        step.push(PathBuf::from(d.file_name().unwrap()));
        traverse_flat_dst(&d, step)
    };
    let handle = move |f: PathBuf| {
        let dst_path: PathBuf = d_step.iter().collect();
        (f, dst_path)
    };
    ...

Have I got it right, for a change?

It looks like you don't know how closures are desugared, you can read my blog post on that topic to see that closures are just fancy structs,

Thanks! I'm going to take a look, most certainly.