We have 50_000 async tasks and a machine with 50+ threads.
Each async_task is of the form:
read all available udp packets, up to 20 packets
do some processing
send some udp packets
sleep for 200 milliseconds
Informally, we define 'fairness' as: every 'epoch', every one of the 50_000 async task runs once. No task is run more times than other tasks; no task is starved.
we want to define a 'simulation tick' as:
for each of the 50_000 async tasks, run them until they hit their 'yield point'
do the above as efficiently as possible, using the 50 threads
This isn't the same as being absolutely fair.
Being fair, i'd reckon, can be done by selecting a task via round robin and assigning it to the first available thread.
This is somewhat difficult to do with Tokio because round-robin is just not the way Tokio works. Tokio works by having tasks be notified when they become ready, at which point they go in a queue of things to run.
Yeah, that's more or less what I was suggesting, though I'm not sure I'd use the async/await stuff at all if I really wanted to do it like that. And yes, there isn't really any easy way to get work stealing with this approach.
Alternatively, maybe you should reconsider round robin polling?
I don't know how well this would work, but you could try spawning a timer task that basically looks like this:
loop {
sleep(200 ms).await;
notify_all();
}
Then all of your UDP users have their own task that wait for a notification from the timer task, and once they receive the notification, they all send out the UDP packets.
The notifications could be sent using tokio::sync::Notify or tokio::sync::broadcast or similar. Not sure which one would work best.
Note that here I am using a single timer rather than a timer in each task to "synchronize" the timings, since it sounds like you want to send out all the packets at once.
To tell you the truth, Tokio's work stealing runtime might not be very well optimized for this particular use-case. The thread-local queue has a fixed size of 256 tasks (the work stealing algorithm requires it to be a fixed size), and any tasks beyond that spill into the global queue, which I don't think is what you want performance-wise in your case.
If you space them out, then it is much less likely that you will run into issues with spilling into the global queue, since the 256 limit is on the number of notified tasks on a single thread. The idle threads don't count.