To start with a minor point, looking at your current code, I do not think you need to collect scoped thread handles in a Vec if you do not intend to use them. By itself, crossbeam will ensure that the threads are joined before the host scope is exited, so you can simplify the chunks loop into...
    // loop over next chunks
    for chunk in next.chunks_mut(chunk_size) {
        scope.spawn(move || {
            iteration(&current, chunk, size_x, global_index);
        });
        global_index += chunk_size;
    }
...but again, this is only a nitpick with minor performance impact. Now, onto your real question: is it possible to avoid spawning and destroying threads on every iteration, while otherwise retaining the benefits of the scoped thread approach (no explicit unsafe code bypassing the borrow checker, and no synchronization of individual accesses)?
The key to answering this question in a satisfactory way is to realize that crossbeam's scoped thread API does two things, which are only related by an API design choice:
- It spawns threads on which work can be executed.
- It allows for scoped work execution on these threads.
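To make the coupling of these two features concrete, here is a minimal sketch of the per-iteration approach. It is written against the standard library's std::thread::scope (stable since Rust 1.63) rather than crossbeam, purely so that it runs without dependencies. The update_chunk kernel, the run driver and the input data are made up for illustration and are not taken from your code.

```rust
use std::thread;

/// Hypothetical kernel: each output cell is the sum of the matching input
/// cell and its left neighbour (wrapping around at the start).
fn update_chunk(current: &[u32], chunk: &mut [u32], offset: usize) {
    let n = current.len();
    for (i, cell) in chunk.iter_mut().enumerate() {
        let idx = offset + i;
        *cell = current[idx] + current[(idx + n - 1) % n];
    }
}

/// Hypothetical driver: runs `iterations` steps over a double buffer.
fn run(mut current: Vec<u32>, iterations: usize, chunk_size: usize) -> Vec<u32> {
    let mut next = vec![0; current.len()];
    for _ in 0..iterations {
        // Feature 1: fresh OS threads are spawned on every iteration.
        // Feature 2: the scope waits for all of them before returning,
        // so no join handle has to be stored.
        thread::scope(|scope| {
            let current = &current;
            for (k, chunk) in next.chunks_mut(chunk_size).enumerate() {
                scope.spawn(move || update_chunk(current, chunk, k * chunk_size));
            }
        });
        std::mem::swap(&mut current, &mut next);
    }
    current
}

fn main() {
    let result = run(vec![1, 0, 0, 0], 2, 2);
    println!("{:?}", result); // prints [1, 2, 1, 0]
}
```

The borrow checker is satisfied because every borrow ends when the scope ends, but the price is one round of thread creation and destruction per iteration.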
To avoid spawning and destroying threads all the time in an iterative algorithm, while otherwise retaining the usability and performance benefits of scoped threads, we will need to separate these two features by distinguishing...
- One "outer" scope, possibly global, in which a pool of OS threads is spawned
- One "inner" scope in which work is sent to this outer thread pool, and subjected to the usual work-scoping guarantees (wait for the end of the work before exiting the inner scope).
In this scheme, the "inner" scope sends tasks to the "outer" scope via a communication channel, and awaits a message signalling panic or task completion at the end of the scope. As with any scoped thread API, ensuring memory safety in this situation requires carefully crafted unsafe code, which is why I would not recommend that you write this code yourself when very nice people have already written it for you.
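The message flow described above can be sketched with the standard library alone. Be aware that this is deliberately not a real scoped API: jobs here must be 'static (they share data through Arc), which is exactly the restriction that rayon and scoped_threadpool lift with their internal unsafe code. All names (Pool, run_batch, demo) are hypothetical.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Work = Box<dyn FnOnce() + Send + 'static>;
/// A job is the work itself plus a channel on which to report its outcome.
type Job = (Work, Sender<bool>);

/// "Outer" scope: worker threads are spawned once and then reused.
struct Pool {
    job_tx: Sender<Job>,
}

impl Pool {
    fn new(threads: usize) -> Pool {
        let (job_tx, job_rx) = channel::<Job>();
        let job_rx = Arc::new(Mutex::new(job_rx));
        for _ in 0..threads {
            let job_rx = Arc::clone(&job_rx);
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // before the job runs.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((work, done_tx)) => {
                        // Report completion (true) or panic (false).
                        let ok = catch_unwind(AssertUnwindSafe(work)).is_ok();
                        let _ = done_tx.send(ok);
                    }
                    // The pool was dropped: shut this worker down.
                    Err(_) => break,
                }
            });
        }
        Pool { job_tx }
    }

    /// "Inner" scope: send a batch of jobs, then block until each one has
    /// reported back. Returns false if any job panicked.
    fn run_batch(&self, jobs: Vec<Work>) -> bool {
        let (done_tx, done_rx) = channel();
        let count = jobs.len();
        for work in jobs {
            self.job_tx.send((work, done_tx.clone())).expect("workers are gone");
        }
        (0..count).fold(true, |ok, _| done_rx.recv().unwrap_or(false) && ok)
    }
}

/// Three iterations reuse the same two threads.
fn demo() -> usize {
    let pool = Pool::new(2);
    let total = Arc::new(AtomicUsize::new(0));
    for _ in 0..3 {
        let jobs: Vec<Work> = (0..4usize)
            .map(|i| {
                let total = Arc::clone(&total);
                Box::new(move || {
                    total.fetch_add(i, Ordering::Relaxed);
                }) as Work
            })
            .collect();
        assert!(pool.run_batch(jobs));
    }
    total.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", demo()); // prints 18
}
```

Letting jobs borrow from the caller's stack, as scoped threads do, means convincing the compiler that run_batch never returns while a job is still running. That step is where the unsafe code comes in.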
I know of two libraries which allow such scoped execution inside of an outer thread pool: rayon, via the scope() and join() APIs, and scoped_threadpool, via the Pool::scoped() API.
For learning purposes, you may want to start with scoped_threadpool, as it is a minimal implementation of the separation of thread pool and scoped execution that I discussed earlier. For production use in a larger application, I would probably favor rayon, as it offers stronger API stability guarantees, has a broader feature set (e.g. it can slice input work automatically if you let it do so), and has received more performance optimizations (e.g. a scalable work distribution system based on work-stealing lock-free queues).