Idiomatic Rust Concurrency

Let's say I'm going to implement a map-reduce type pattern in Rust: I'm going to parallelize some computation, then merge/total the results from each task/thread (these may or may not be the same thing).

I could do this one of two ways, broadly:

  • The more typical M-R style of having each thread (which is actually typically a process in actual map-reduce, thus no shared memory) produce its own results in the reduce, then merge them all at the end

  • Use some concurrent data structure - say, a HashMap behind an Arc & Mutex - to make in-place updates

I've done a fair amount of reading on Rust concurrency practices and it seems the second approach is probably the more favored/common/idiomatic in Rust.

Anyone disagree with that?

Also, are there ways to merge collections efficiently that the first solution could generally be as performant as the second?

(Immutability makes things cleaner in general. But if it means a big hit in performance in Rust - don't do it, I'd say.)

1 Like

Found some related thoughts here:

It seems building & merging isn't generally frowned upon (and can be more performant due to lack of contention).

This is certainly quite a broad discussion you’re opening here, touching on map-reduce-like patterns in general and in the abstract, mentioning multi-process (even multi-machine?) parallelism without shared memory, but also shared-memory, shared-mutability with Mutex…

…well I suppose you aren’t really trying to discuss the non-shared-memory scenario that deeply…

anyways. First up, your focus here seems to be more on parallelism than on concurrency. Though some people don’t like to make the distinction. Either way, I’d call map-reduce a form of ”parallelism”, as the goal is to speed thing up with multiple cores. (Rust does even support “concurrency” that isn’t parallel via single-thread executors of async Rust, but that’s indeed a very different topic.)

As for idiomatic parallelism in Rust, and a good and important thing to study, there’s the rayon library, that offers a variety of parallelism abstractions, including map-reduce style ones, and should be highly idiomatic, as well as a good demonstration what neat and safe parallelism APIs Rust as a language can handle.

8 Likes

I don't think Rust will force you to go in one direction in particular, you can equally approach the problem by spawning a bunch of threads and joining them back once they have finished the computation, or you can use a shared data structure with synchronization primitives.

That being said, I think you are trying too hard to find problems where there are none. Yes, map-reduce is a good use case for monoids and immutability because the problem itself is highly parallelizable, but no, it's not the "I knew it, Rust sucks and Scala rules" thing that you might be looking for.

I'm entirely aware that Rust doesn't force any direction here - thus the reason the question is phrased as what (if anything) is idiomatic.

Your second paragraph is entirely uncalled for, and will be ignored (save this comment).

Let’s look at a few of the things rayon brings to the table.

The most simple API it comes with it its join method. You call join(|| one_operation(…), || other_operation(…)) and then the two things can run in parallel without much overhead. It’s more flexible than simple thread::spawn API in that either operation can freely refer to local variables in scope – thus offering a way of sharing data between threads without the need of any Arc – though the std::thread API has caught up to this possibility of sharing local variables, via the thread::scope API.

But the join method of rayon also works in a much more lightweight setting, foregoing the requirement to spawn a “physical” (managed by the operating system) thread, and instead running in a single eternal thread-pool – also it’s only signalling potential parallelism, so that if the thread-pool is already sufficiently busy, it will just end up running both sides of the join on the same thread.

And then there’s Rayon’s “parallel iterators” API, which is a relatively huge thing, and has some complex internals, but it’s really just a lot of (very convenient!) convenience APIs to make a useful choice of calls to the join method under the hood. Starting to use it is often as simple as importing the right trait, and changing an iter or into_iter into the corresponding par_iter or into_par_iter call, and suddenly your map or sum, filter, for_each, reduce (well… reduce doesn’t quite have the same signature) or fold (well… fold doesn’t really have the same signature at all, actually), etc… operations will be done in parallel.

7 Likes

Of course, another possibility here, kind of orthogonal, is to use channels: Have one thread own the master data structure and the workers send results to it.

That's really no better than merging on completion for a map-reduce type problem, though.

I have coded this little experiment both ways, and, for the data set I'm using, anyway, merging is winning. This makes sense, since a HashMap behind a Mutex can only be updated by one thread at a time, resulting in lockstep concurrency and little benefit, because the heart of the algo (word count) is relatively cheap.

Of course these results will vary greatly depending on the particulars...

I'm using rayon in one of the instances of my wordcount experiment. But I had to hide the HashMap behind an Arc & Mutex to be able to move it into the closure (for mapping a parallel iterator) which resulted, I think, in the poor performance I'm seeing, compared to merging.

But this is because, in part, the work algo isn't complex enough - updating the map is a large enough share of the work that concurrency contention for it brings down the overall throughput. This would not be the case in a more realistic scenario.

Still, I do not like this issue of lockstep concurrency. I like building individual maps in each worker and merging them better, personally, subjectively. HashMap.extend came in handy for this.

Indeed, if you use Mutex with rayon (beyond just occasional use), it’s going to lead to suboptimal performance, because the model of rayon is that it’s for parallelising computation tasks, not concurrently running potentially-blocking operations; so in case threads of the thread-pool get blocked for significant times, it quickly no longer utilizes every core, even if there’s other parallel work left, waiting to be executed.

It’s always a question of nuance, of course. If the main thing something does is compute something, and then just quickly it needs to access something behind a Mutex every so often, it may not be too bad, but if the main operation is access to the data structure behind a mutex, then it can quickly kill the parallelism again.

For more in-depth discussion, it would probably be useful to address how exactly you used a hashmap together with rayon’s parallelism. It sounds like it was part of producing some result … for that, glancing e.g. at the approach rayon offers by-default for .collect&.extend operations, one can see it does come with special logic for Vec, but most other types, like for a HashMap, it goes back to a simple approach of collecting items first, separately, and merging them in the end; more concretely, it doesn’t even merge hash maps but merges up several LinkedLists[1] and traverses the whole thing in the end to create the HashMap in one go.


  1. one of the rare practical use-cases of linked lists… when you have some unpredictably nested/bracketed merging pattern; note that rayon uses linked lists of Vecs here, so every individual “chunk” of data that isn’t further splittable parallel workload would avoid the overhead of liked-list items ↩︎

4 Likes

This is still cool because I hadn't noticed the lib had anything at all built-in for collection merging.

The more I read, the more I glean that non-shared-state concurrency is idiomatic (or, at least, preferred), in Rust, which makes me happy.

Rust encourages us to share memory by communicating (instead of communicating by sharing memory ).

1 Like

Yes, you can relax if you were worried that the non-shared state approaches are not idiomatic. But I believe both approaches (shared state and non-shared state) are idiomatic in Rust -- they are both very common, very well supported by the language and ecosystem, and very natural.

1 Like

There's also plenty of libraries like dashmap if you did want shared mutable state.

Check out the entire concurrency category on lib.rs!

The big thing about rayon is that it implements work-stealing. Maybe not that big of a deal if you know your work units always take a roughly equal time to process, but if not, work-stealing will really help keep all your cores busy.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.