Understanding ? in Map/Filter Closures

I am writing a small bit of code to simply list out all the files in a directory, and as far as I can tell, this seems like the correct way to do that:

fn get_files(p: &Path) -> Result<Vec<PathBuf>, io::Error> {
    read_dir(p)?
        .into_iter()
        .map(|f| f.map(|f| f.path()))
        .filter(|f| f.as_ref().is_ok_and(|f| f.is_file()))
        .collect()
}

However, my naive rust-beginner brain would have preferred to write:

fn get_files(p: &Path) -> Result<Vec<PathBuf>, io::Error> {
    read_dir(p)?
        .into_iter()
        .map(|f| f?.path())
        .filter(|f| f?.is_file())
        .collect()
}

If I understand the compiler messages, this is not allowed because then the closures are then returning PathBuff and bool rather than Results.

Given that, there a better/more ergonomic way to approach this that I'm missing (especially since I'm collecting the Results anyway), or is this the preferred approach?

Thank you for any help!

I think this is the case where a simple for loop make more sense than doing type gymnastics with iterators.

fn get_files(p: &Path) -> Result<Vec<PathBuf>, io::Error> {
    let mut paths = vec![];
    for entry in read_dir(p)? {
        let path = entry?.path();
        if path.is_file() {
            paths.push(path);
        }
    }
    Ok(paths)
}
6 Likes

I tried another approach using try_fold. I think @nerditation's solution is better because for loops are easier to understand than folds. But they're very similar.

fn get_files(p: &Path) -> Result<Vec<PathBuf>, io::Error> {
    read_dir(p)?.try_fold(vec![], |mut paths, entry| {
        let path = entry?.path();
        if path.is_file() {
            paths.push(path);
        }
        Ok(paths)
    })
}
2 Likes

Note that the filter call above will cause errors to be silently discarded. I assume that's not what you wanted. To propagate the error upward you'd need something like this:

   .filter(|f| f.is_err() || f.as_ref().is_ok_and(|f| f.is_file()))

that's no coincident.

sometimes fold is called a "universal" operator in functional programming. [1].

the universality comes from the fact that fold is an abstraction of the most fundamental functional construct: recursion. note, lfold is naturally tail recursive, while rfold is not, except for some special cases.

back to the problem in OP, let's workout the types backwards.

the return type is Result<Vec<_>, _>, there's typically two way to propogate the inner error using only iterator combinators: a) use the special collect() implementation for Result type, or b) use one of the try_xxx() combinators.

  • use the special collect for Result<Container<T>, E>

    • this is how the original working code in OP is done;
    • we need an iterator which yields item of type Result<T, E>,
      • Iterator<Item = Result<PathBuf, io::Error>> in this example
    • read_dir() returns ReadDir: Iterator<Item = Result<DirEntry, io::Error>>
    • so we need map Result<DirEntry, _> to Result<PathBuf, _>
      • this is the inner map in the orignal code
    • we also want to filter only the paths that is a file
      • as @jumpnbrownweasel pointed out, the original code silently discarded any error during the filtering
      • to propogate the error to the final collect(), the filter callback should pass through the errors instead
  • use the try_xxx() combinators. the candidates are:

    • try_collect(), applicable, but:
      • it is unstable
      • it is no different from the first approach using regular collect() for Result<Vec<T>, E>
    • try_reduce(): not applicable:
      • the problem doesn't have a monoidal structure
      • plus, it's unstable
    • try_find(): not applicable, plus unstable;
    • try_fold(): can be used,
    • try_for_each():
      • not really a "combinator",
      • no different from an imperative loop, by definition.

so, my conclusion is, the structure of this problem inherently does not suit a functional style solution very well. the original code is already as close as you can get (minus the error propogation bug in filtering logic), all the alternatives is just "emulating" an imperative loop.


except, I that's not entirely true.

the bonus solution

the above conclusion only holds for the iterator combinators/adapters in the standard library. as long you are ok with custom iterator adapters, the code can indeed be simplified a bit.

this example uses two hypothetical map_ok() and filter_ok() adapters, which should be very obvious how to implement:

read_dir(p)?
    .into_iter()
    .map_ok(|f| f.path())
    .filter_ok(|f| f.is_file())
    .collect()

btw, map_ok() and filter_ok() are available from the itertools crate if you don't want to implement them yourself. itertools also has a filter_map_ok(), which can also be used for this problem (is it more readable than map_ok() + filter_ok()? I don't know):

read_dir(p)?
    .into_iter()
    .filter_map_ok(|f| {
        let path = f.path();
        path.is_file().then_some(path)
    })
    .collect()

  1. I think it is called catamorphism by math people, although I don't know what that word means â†Šī¸Ž

5 Likes

Thank you so much everyone - and special thank you to @nerditation - I really appreciate the detailed answer.

I do have some questions about how I'd go about implementing my own map_ok (just for learning purposes), but I suspect that should be a separate post.

Also, just as an extra reference, I found a nice blog post on fallible iteration in rust here, that might be helpful for future people looking at this thread.

Here's another version using filter_map:

fn get_files(p: &Path) -> Result<Vec<PathBuf>, io::Error> {
    read_dir(p)?
    .into_iter()
    .filter_map(|entry| entry.map(|entry| {
      let path = entry.path();
      path.is_file().then_some(path)
    }).transpose())
    .collect()
}
2 Likes