How to use rayon's par_iter() over a Vec<PathBuf>

I have used walkdir to gather a Vec<PathBuf> and want to process each value using rayon's par_iter(). When I iterate sequentially, the code compiles and runs fine, but when I use .par_iter() it won't compile.

use rayon::prelude::*;
use std::path::*;
fn main() {
    let names = vec![
        PathBuf::from("one.txt"),
        PathBuf::from("two/three.txt"),
        PathBuf::from("two/four.txt"),
    ];
    for name in names.par_iter() { // for name in names { // works fine
        println!("{name:?}");
    }
}

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0277]: `rayon::slice::Iter<'_, std::path::PathBuf>` is not an iterator
 --> src/main.rs:9:17
  |
9 |     for name in names.par_iter() {
  |                 ^^^^^^^^^^^^^^^^ `rayon::slice::Iter<'_, std::path::PathBuf>` is not an iterator
  |
  = help: the trait `Iterator` is not implemented for `rayon::slice::Iter<'_, std::path::PathBuf>`
  = note: required because of the requirements on the impl of `IntoIterator` for `rayon::slice::Iter<'_, std::path::PathBuf>`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to previous error

How can I fix this?

For loops are language built-ins and they know nothing about parallelism. Thus you can't use them for iterating in parallel. Use the for_each() method on the returned iterator instead.

I'm not sure how to do that.
To clarify, I'm trying to transform a list of filenames into a list of data items:

use rayon::prelude::*;
use std::path::*;
fn main() {
    let names = vec![
        PathBuf::from("one.txt"),
        PathBuf::from("two/three.txt"),
        PathBuf::from("two/four.txt"),
    ];
    let mut results = vec![];
    for name in names { // .par_iter() doesn't work
        results.push(name.to_string_lossy().to_string());
    }
    println!("{results:?}");
}

In reality I'm not just converting to names but doing some other stuff, but the point is that inside the loop I'm updating a vec.
If I use for_each() then I get lots of parallel accesses to the results vec which naturally I don't want.

I've used .par_iter() with normal for loops in other code without problems but can't see why this is different.

Like this:

    // Needs `use std::sync::Mutex;` at the top of the file.
    let results = Mutex::new(Vec::new());
    
    names.par_iter().for_each(|name| {
        results.lock().unwrap().push(name.to_string_lossy().to_string());
    });

Well how else do you want it to work, then? The point of a parallel iterator is that it iterates in parallel. If you don't want to iterate in parallel, then don't use a parallel iterator.

Please show an example of such code. The Rayon parallel iterator types don't generally implement IntoIterator or Iterator, so you are probably mistaken about that.

That defeats the purpose of using par_iter(). By using for x in y.par_iter() { process(x) } I don't need locking since par_iter() does its work concurrently but delivers its results serially.

If your operation is simply mapping one thing to another, just do list.par_iter().map(...).collect(). And, again, you can't use a for-loop with parallel iterators. I don't know where you got that idea from, but you can't.

NB: if your operation is as simple as the one you showed, parallel iteration will surely be less efficient and slower than sequential iteration. If it is more expensive, then locking a mutex would be a small cost.

But you can't do that because, again, parallel iterators don't implement the regular Iterator interface, and the reason is that they can't. You can't (usefully) write a lazy, sequential iterator that somehow magically computes only the next element while also doing work for other elements in parallel; you could cache the elements and results implicitly (cf. .par_iter().collect()), but that would completely defeat the purpose of the sequential, lazy iterator interface, and it would be misleading performance-wise.

You might be mistaken as to what "delivering results" means. Methods on ParallelIterator definitely do cause code to be run in parallel; in particular, the documentation of for_each() (which is the parallel analogue of a serial for loop) explicitly says that it executes the operation on each item produced by the iterator, in parallel.

Therefore, the closure may (and generally will) be called from several threads at the same time. Thus, you absolutely do need locking if you want to mutate state that is not local to the worker closure.

If you are thinking of simply mapping some computation over a parallel iterator and then collecting the results into a collection, that is possible via the example @erelde showed you. The mapping itself is still parallel, though: it is Rayon's own machinery that makes sure that either the retrieval of the data is synchronized or the results are written without a race on shared state. That synchronization is written by hand, inside the appropriate methods, and it's absolutely not something you can do in the worker itself.

OK, I finally think I understand! Thank you.

This works for me (it is at the end of a function):

    let filenames = get_filenames(config);
    filenames
        .par_iter()
        .filter_map(|filename| process_one(&filename, config).ok())
        .collect()

Just to let you know: with Python and using a process pool the runtime is 7-8 secs; with Rust and plain .iter() it is 0.8 secs, and with .par_iter() 0.6 secs.