What are the benefits of a work stealing scheduler?

From coming across work-stealing schedulers (WSS) a lot, I understand that this is a popular scheduler choice, for example in tokio. I just can't figure out why this is used rather than a model with just one central queue, from which every thread takes a task when it is ready to do some work.

It seems to me that letting every thread have its own queue introduces the need for load balancing and work stealing, which seems like a lot of hassle when you could just take one task at a time from a central queue.

What am I missing?

Assume this situation:

  • 16 threads
  • a massive number of very small tasks, most with very similar execution times
  • one execution queue, prefilled with a fair number (> 16) of tasks (maybe not all of them)

In this situation, even in the first iteration all threads will try to take the first item from the queue, but only one will succeed. The other 15 will fight for the second task, then 14 will fight for the third one, and so on. Even if the first task is preassigned to a thread, the same thing happens on the second task (because all threads finish their work at almost the same moment). If tasks are very small, it is even possible that while some threads are still trying to find something to work on (say, fighting over the 13th item in the queue), the first two threads have already finished and joined the fight! In practice, instead of doing actual work, some threads spend their time fighting over which task to take. If tasks are small and there are many threads, this can happen often.
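A minimal sketch of what that fight looks like, assuming a simple Mutex-protected queue (real runtimes use lock-free queues, but every worker still hits the same shared structure for every task):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // One central queue shared by all workers; every pop contends on the same lock.
    let queue: Arc<Mutex<VecDeque<u64>>> = Arc::new(Mutex::new((0..1_000_000).collect()));

    let handles: Vec<_> = (0..16)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut sum = 0u64;
                loop {
                    // All 16 threads fight for this lock on every single task.
                    let task = queue.lock().unwrap().pop_front();
                    match task {
                        Some(n) => sum += n, // a "very small" task
                        None => break,
                    }
                }
                sum
            })
        })
        .collect();

    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("total = {total}");
}
```

The smaller the tasks, the larger the share of time spent in that lock/pop step relative to the actual work.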

Obviously it is also possible to create a queue per thread and not steal any jobs. It will work pretty well, but there is a problem if one thread happens to get jobs that are slightly smaller. Each thread still pays the queue-access cost (remember that taking an item from a non-blocking queue still takes time), but on top of that, once the imbalance accumulates, there comes a point when not all the work is finished yet while one thread is doing nothing, because it has already emptied its own queue. This is a starvation problem.
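For example, a sketch of static per-thread partitioning with no stealing (the task costs here are made up to force the imbalance):

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // Statically split the work: each worker gets its own queue up front, nobody steals.
    // Worker 0 happens to get slightly cheaper tasks.
    let queues: Vec<Vec<Duration>> = (0..4)
        .map(|worker| {
            let cost = if worker == 0 { 1 } else { 2 }; // milliseconds per task
            vec![Duration::from_millis(cost); 100]
        })
        .collect();

    let handles: Vec<_> = queues
        .into_iter()
        .enumerate()
        .map(|(worker, tasks)| {
            thread::spawn(move || {
                for task in tasks {
                    thread::sleep(task); // stand-in for real work
                }
                // Worker 0 reaches this point long before the others and then idles,
                // even though unfinished work is still sitting in the other queues.
                println!("worker {worker} done");
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```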

A work-stealing scheduler is a trade-off between those two approaches.

I'm no expert here, but I'm pretty sure the problem is:

  1. Contention when accessing that central queue becomes a bottleneck
  2. To avoid that, each thread maintains its own local work queue, which it uses most of the time
  3. But if that local queue runs out of work, it needs to steal work from other queues (see the sketch after this list)
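Sketching points 2 and 3 with the crossbeam-deque crate (just a toy illustration; real schedulers such as tokio's add a global injector queue, parking, randomized steal order, and so on):

```rust
// Cargo.toml: crossbeam-deque = "0.8"
use crossbeam_deque::{Steal, Worker};
use std::thread;

fn main() {
    // One local deque per worker, plus a Stealer handle onto every deque.
    let workers: Vec<Worker<u64>> = (0..4).map(|_| Worker::new_fifo()).collect();
    let stealers: Vec<_> = workers.iter().map(|w| w.stealer()).collect();

    // Pre-fill the local queues (deliberately unbalanced).
    for (i, w) in workers.iter().enumerate() {
        for n in 0..((i as u64 + 1) * 1_000) {
            w.push(n);
        }
    }

    thread::scope(|s| {
        for (id, local) in workers.into_iter().enumerate() {
            let stealers = &stealers;
            s.spawn(move || {
                let mut sum = 0u64;
                loop {
                    // Fast path: take from our own queue, no fighting with other workers.
                    if let Some(n) = local.pop() {
                        sum += n;
                        continue;
                    }
                    // Slow path: our queue is empty, try to steal from someone else.
                    let stolen = stealers
                        .iter()
                        .enumerate()
                        .filter(|(other, _)| *other != id)
                        .find_map(|(_, st)| match st.steal() {
                            Steal::Success(n) => Some(n),
                            _ => None,
                        });
                    match stolen {
                        Some(n) => sum += n,
                        None => break, // nothing left anywhere (good enough for a sketch)
                    }
                }
                println!("worker {id} finished, partial sum {sum}");
            });
        }
    });
}
```

The point is that the synchronization-heavy path (stealing) only runs when a worker's own queue is empty; the common case is a cheap local pop.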

Ok, so yes contention needs to be dealt with, but I would imagine something like:

  • The threadpool has a sink for each thread in the pool.
  • A channel for each thread, over which tasks are sent.
  • Each thread has a stream for incoming tasks.

When starting, the threadpool takes the tasks and sends one to each thread. Now each sink is NotReady. When a thread sends back the result of its computation, the sink gets woken up and takes the next task from the queue.

In this design, yes, there might be some waiting time for the threads while the threadpool is dispatching tasks, but at least access to the queue happens on a single thread, so it's lock-free.
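Roughly, in std-channel terms (this is just a sketch of the idea, not the actual futures Sink/Stream code; the channel names are made up):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let tasks: Vec<u64> = (0..1_000).collect();
    let n_workers: usize = 4;

    // One channel per worker for incoming tasks, one shared channel for results.
    let (result_tx, result_rx) = mpsc::channel::<(usize, u64)>();
    let mut task_txs = Vec::new();

    for id in 0..n_workers {
        let (task_tx, task_rx) = mpsc::channel::<u64>();
        task_txs.push(task_tx);
        let result_tx = result_tx.clone();
        thread::spawn(move || {
            // Worker: receive a task, do the work, report back, repeat.
            for n in task_rx {
                let output = n * n; // stand-in for real work
                if result_tx.send((id, output)).is_err() {
                    break;
                }
            }
        });
    }
    drop(result_tx); // the dispatcher keeps only the receiving end

    // Dispatcher: the task queue is touched only on this thread, so no lock is
    // needed, but every single task dispatch funnels through here one by one.
    let mut queue = tasks.into_iter();
    let mut in_flight = 0;
    for tx in &task_txs {
        if let Some(n) = queue.next() {
            tx.send(n).unwrap();
            in_flight += 1;
        }
    }
    let mut total = 0u64;
    while in_flight > 0 {
        let (worker, output) = result_rx.recv().unwrap();
        total += output;
        in_flight -= 1;
        // Hand that worker its next task, if any are left.
        if let Some(n) = queue.next() {
            task_txs[worker].send(n).unwrap();
            in_flight += 1;
        }
    }
    drop(task_txs); // closing the task channels lets the workers exit their loops
    println!("total = {total}");
}
```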

Would this be worse performance-wise than organizing the work stealing with thread synchronisation? It seems a much simpler design than work stealing...

I suppose the limitation here is that tasks are dispatched one by one?

Work stealing has some significant cache benefits:

  • It keeps all the child tasks of a subtask on the same core, on the assumption that if task A spawns task B, they are more likely to share memory (or at least be close) than task A and some distant task Z
  • It generally operates on a stack model, which is usually much friendlier to the cache and to child tasks than a queue. Imagine recursively splitting an array with work stealing: there is a lot of locality benefit in using a stack model for spawned tasks and in keeping tasks on the same core on a best-effort basis (see the sketch below).
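As a concrete illustration, here is a small sketch with rayon (whose thread pool is work-stealing); the threshold and the summing task are arbitrary:

```rust
// Cargo.toml: rayon = "1"
fn sum(slice: &[u64]) -> u64 {
    if slice.len() <= 1024 {
        // Small enough: just do it sequentially on the current core.
        return slice.iter().sum();
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    // rayon::join runs one half inline and makes the other half available on
    // this worker's own deque; if nobody steals it, the same worker pops it
    // next (stack/LIFO order), so the freshly split data stays warm in cache.
    let (l, r) = rayon::join(|| sum(left), || sum(right));
    l + r
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    println!("sum = {}", sum(&data));
}
```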

@schets Thanks, that makes sense...

.. and all this applies "same, but more so" for NUMA and cluster cases where you may have a whole fleet of worker machines.

This is possible for a small number of workers and unbalanced task sizes; otherwise you have all the disadvantages of a single centralized queue (most workers are doing nothing but waiting for tasks, no matter whether they are actively fighting for them or waiting on some all-time-busy dispatching task).

A work-stealing scheduler is developed and optimized for the traditional fork-join concurrency model, where each task can spawn zero or more subtasks, which can possibly run in parallel. Its main benefit is that a WSS pays the synchronization cost only when needed, while a central queue has to pay it for every task.
