- 16 threads
- massive amount of very small tasks, most with very similar execution time
- one execution queue, prefilled with some fair (> 16) amount of tasks (maybe not all)
In this situation even in first iteration all threads will try to take first item from queue, but only one will succeed. The other 15 will fight for second task, then 14 will try to take third one, and so on. Even if fitst task is preasigned to thread, this will happen on second task (because all threads will finish its job in almost same moment). If tasks are very small its even possible, that when some threads are still trying to find something to work on (they are fighting for work on 13th item in queue), first two threads already done, and joined the fight! In practice instad of doing actual work, some threads are fighting for job to do. If tasks are small, and there are many threads, this may happen often.
Obviously it is also possible to create queue per thread, and don’t steal any job. It will work pretty well, but the problem is if one thread will get jobs which are little smaller. If jobs are only a little smaller the thread will still include one queue problem (remember, that taking item for non-blocking queue still takes time), but when cumulated, there is some time, when not all calculations are yet finished, but one thread is doing nothing, because it done its job. This is starvation problem.
Stealing scheduler is trade off between those two aproach.