Rayon and join for multiple results


I'm using rayon and often find myself wanting to "divide" the work into more than two pieces. I know that there is Scope and spawn() for that. It has, however, two downsides:

  • It's cumbersome to return values, since an outer variable has to be defined beforehand and then updated, which makes the code longer.
  • It's documented as slower than join

I have found myself using a macro (see Rust Playground) that, to my understanding, lets me split the work into multiple parts efficiently and with short code:

fn my_func() -> (i32, i64, f32, f64) {
    join!(|| calc1(), || calc2(), || calc3(), || calc4())
}

Is there any downside I'm not seeing to using such a macro/structure? I would almost assume so, since I'm surprised there is nothing similar available in core rayon.

(And as a bonus - is it possible to write that macro recursively to allow any number of arguments? I have tried in vain and resorted to spelling out each of the cases from 2 to 8 arguments.)
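One possible shape for a recursive version, as a minimal sketch rather than the macro from the playground link: each step peels off the first closure and joins it with "everything else". To keep the sketch dependency-free, `join2` is a hypothetical stand-in for `rayon::join` built on `std::thread::scope`, and `join_nested!` and `four_results` are illustrative names. Note that this simple recursion yields nested tuples, so the caller still has to destructure them.

```rust
use std::thread;

/// Hypothetical stand-in for `rayon::join`: same call shape, but it spawns a
/// scoped OS thread instead of using rayon's work-stealing pool.
fn join2<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        let hb = s.spawn(b); // run `b` on another thread
        let ra = a(); // run `a` on the current thread
        (ra, hb.join().expect("second closure panicked"))
    })
}

/// Recursive join for two or more closures.
/// The result is nested to the right: (a, (b, (c, d))).
macro_rules! join_nested {
    // Base case: exactly two closures.
    ($a:expr, $b:expr $(,)?) => {
        join2($a, $b)
    };
    // Recursive case: join the first closure with the join of the rest.
    ($a:expr, $($rest:expr),+ $(,)?) => {
        join2($a, || join_nested!($($rest),+))
    };
}

fn four_results() -> (i32, i64, f32, f64) {
    // Flatten the nested tuple by hand.
    let (a, (b, (c, d))) = join_nested!(|| 1_i32, || 2_i64, || 3.0_f32, || 4.0_f64);
    (a, b, c, d)
}
```

With rayon in scope, `join2` could presumably be swapped for `rayon::join` without touching the macro. Producing a flat tuple directly from the macro takes more machinery than shown here.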


Why don't you just write an array of your arguments and call into_par_iter?

Your example would be:

Firstly, because I honestly did not consider the possibility. But now that I think about it, I'm struggling to see how to do that nicely, briefly and efficiently. Do you mean something like this: Rust Playground?

I'm struggling to get that to work without adding type annotations (which can be lengthy for many closures).

(To clarify - the sum is just a silly example. The types and calculations involved are way more complex, which is why the join! macro solution becomes nicer.)

Furthermore - and most critically when I think about it - the array solution requires all the closures to return the same type - they typically might not...

You've got some computations you want to run in parallel. They take a bunch of arguments and return different things, possibly all of different types. If so, write a function that takes an enum of those different arguments, dispatches each one to the intended function, and returns the result in another enum. Meanwhile your "main" function only consists of:

    .map(|args| dispatcher(args))

(Maybe you don't want to collect into anything, that's only an example)
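A minimal sketch of that enum-and-dispatcher idea, under stated assumptions: `Task`, `Output`, `dispatcher` and `run_all` are hypothetical names, the two computations are placeholders, and `std::thread::scope` stands in for rayon so the block has no dependencies. With rayon, the body of `run_all` would presumably shrink to `tasks.into_par_iter().map(dispatcher).collect()`.

```rust
use std::thread;

/// Hypothetical inputs: one variant per kind of computation.
enum Task {
    Sum(Vec<i32>),
    Len(String),
}

/// Hypothetical outputs: one variant per result type.
#[derive(Debug, PartialEq)]
enum Output {
    Sum(i32),
    Len(usize),
}

/// Routes each task to its computation and wraps the result.
fn dispatcher(task: Task) -> Output {
    match task {
        Task::Sum(xs) => Output::Sum(xs.iter().sum()),
        Task::Len(s) => Output::Len(s.len()),
    }
}

/// Runs every task on its own scoped thread and collects results in order.
fn run_all(tasks: Vec<Task>) -> Vec<Output> {
    thread::scope(|s| {
        // Spawn everything first so the tasks can overlap.
        let handles: Vec<_> = tasks
            .into_iter()
            .map(|t| s.spawn(move || dispatcher(t)))
            .collect();
        // Then wait for each result, preserving the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("task panicked"))
            .collect()
    })
}
```

The caller gets back `Output` values and has to match on them again to reach the concrete results, which is the extra ceremony this approach trades for a uniformly typed source collection.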

Thanks - agreed that it would work / is a different way to do it.

However - this way would force an allocation of a Vec (and most likely one for each of the enum values as well), and would force me to write a type and a dispatcher at each place in the code where I want to run multiple different things in parallel. I would also argue that it's less clear in the code.

Again, maybe I am missing something - but what do you see as the advantage of this approach? Or why is the join! macro a bad idea?

(I do appreciate the input though - I hope I do not come across as unwilling to try something else - I just want to find the best and clearest way.)


Really - my point with the join! macro is that it's really easy to "spray" all over the code whenever something can be split up, without adding any complexity or making the code more difficult to read.

I.e. just the way the join function from rayon allows - but letting me split into more than 2 pieces without having to type out recursive / multilayer closure code each time.

In general when you have a bunch of parallel computations to do they should be pretty "homomorphic" in respect to their types. That's why I prefer to lean on the type system rather than macros.

In your particular case, if you really only have like 6-10 things, chained joins may be your best solution. Above that number I would reframe the initial problem to get a nice typed source iterator.
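For comparison, chaining joins by hand for four closures could look like the sketch below. It nests the calls in a balanced way (two pairs) rather than as one long chain; which shape performs better under rayon is not something this sketch settles. `join2` is a hypothetical stand-in for `rayon::join` built on `std::thread::scope`, and the closures are placeholders.

```rust
use std::thread;

/// Hypothetical stand-in for `rayon::join` (spawns a scoped OS thread
/// instead of using rayon's thread pool).
fn join2<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        let hb = s.spawn(b);
        let ra = a();
        (ra, hb.join().expect("second closure panicked"))
    })
}

/// Four differently typed results from three hand-written joins.
fn four_results_balanced() -> (i32, i64, f32, f64) {
    let ((a, b), (c, d)) = join2(
        || join2(|| 1_i32, || 2_i64),
        || join2(|| 3.0_f32, || 4.0_f64),
    );
    (a, b, c, d)
}
```

No enum, Vec or type annotation is needed, at the cost of writing the nesting and the destructuring pattern by hand each time.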

Unsurprisingly, there’s others that have already been wondering the same.

Yes, that should definitely be possible. By the way, I am not sure what kind of nesting for multiple join calls is the best / most efficient. It might make sense to get familiar with the actual implementation of the thing to judge that best, and possibly compare it to the design decisions for parallel iterators, I guess…

A join would, as far as I can tell, also add a layer of dynamicness (in its interaction with the thread pool, work queue and such), so adding another layer and producing &mut dyn FnMut() callbacks could be feasible. I’d be curious whether such a (yet to be implemented, I’m trying it at the moment…) approach, using parallel iterators, would perform better or worse than the “use lots of rayon::join” approach.


Here we go: Rust Playground

Thank you (I think :) ) - but I'll need a while to even start to understand what is going on here.

The most difficult part w.r.t. the macro stuff is understanding the dance of creating a bunch of identifiers recursively, distinguished only by hygiene. The implementation then merely packs up all the closures into nice dyn FnMut() + Sends, which involves some Options to still allow FnOnce on the one hand, and pass back the return value on the other hand; the reason why FnMut is used in the first place is to avoid Boxing.
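The Option trick described above can be sketched without any macro for a fixed arity of three. This is not the playground code: `join3` and `run_all` are hypothetical names, and `run_all` uses `std::thread::scope` as a stand-in for handing the callbacks to rayon's parallel iterators. Each FnOnce lives in an Option so that an FnMut wrapper can `take()` it exactly once, and a second Option per closure carries the return value back out. No Box is involved, because the wrappers live on the stack and are passed as `&mut dyn FnMut() + Send`.

```rust
use std::thread;

/// Hypothetical runner: calls every type-erased task once, each on its own
/// scoped thread.
fn run_all(tasks: &mut [&mut (dyn FnMut() + Send)]) {
    thread::scope(|s| {
        for task in tasks.iter_mut() {
            s.spawn(move || (*task)());
        }
    });
}

fn join3<A, B, C, RA, RB, RC>(a: A, b: B, c: C) -> (RA, RB, RC)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    C: FnOnce() -> RC + Send,
    RA: Send,
    RB: Send,
    RC: Send,
{
    // Wrap each FnOnce in an Option so an FnMut wrapper can `take()` it.
    let (mut fa, mut fb, mut fc) = (Some(a), Some(b), Some(c));
    // One result slot per closure, filled in by its wrapper.
    let mut ra: Option<RA> = None;
    let mut rb: Option<RB> = None;
    let mut rc: Option<RC> = None;

    // FnMut wrappers: callable through `&mut`, but each runs its FnOnce once.
    let mut ta = || { ra = Some((fa.take().expect("task a ran twice"))()); };
    let mut tb = || { rb = Some((fb.take().expect("task b ran twice"))()); };
    let mut tc = || { rc = Some((fc.take().expect("task c ran twice"))()); };

    // Coerce the three distinct closure types to one trait-object type.
    let mut tasks: [&mut (dyn FnMut() + Send); 3] = [&mut ta, &mut tb, &mut tc];
    run_all(&mut tasks);

    (
        ra.expect("task a did not run"),
        rb.expect("task b did not run"),
        rc.expect("task c did not run"),
    )
}
```

A macro along the lines described above would generate the `fa`/`ra`/`ta` triples for any number of closures, which is where the hygiene-distinguished identifiers come in.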

As always with understanding macros, feel free to add trace_macros!(true); to the beginning of the file, and follow a trace of macro evaluation, though note that the output printed there will have all those identifiers distinguished by hygiene looking indistinguishable. (Lots of “f” and “r”.)

When you are just wanting to apply the same operation to all the items in a collection or iterator - kinda like the normal Iterator methods, just in parallel - into_par_iter() is your friend.

For something where the operations all do different things and return different types, I'd reach for the "thread pool"-like approach you get with Scope and spawn().

If that feels cumbersome, then I'd normally interpret that as my code saying there's some sort of "impedance mismatch" that I could resolve by taking a different approach. I normally shy away from using macros to resolve awkward code because they tend to just sweep the problem under the rug.

To bastardize some well-known Go Proverbs,

Clear is better than clever.

Reflection Macros are never clear.