Tokio: round robin 50_000 async tasks fairly

We have 50_000 async tasks and a machine with 50+ threads.
Each async_task is of the form:

  1. read all available udp packets, up to 20 packets
  2. do some processing
  3. send some udp packets
  4. sleep for 200 milliseconds

Informally, we define 'fairness' as: every 'epoch', every one of the 50_000 async task runs once. No task is run more times than other tasks; no task is starved.

What is the best way to achieve 'fairness' here ?

rephrased:

we have 50_000 async tasks
we have 50 threads

we want to define a 'simulation tick' as:
for each of the 50_000 async tasks, run them until they hit their 'yield point'
do the above as efficiently as possible, using the 50 threads

This isn't the same as being absolutely fair.
Being fair, i'd reckon, can be done by selecting a task via round robin and assigning it to the first available thread.

This is somewhat difficult to do with Tokio because round-robin is just not the way Tokio works. Tokio works by having tasks be notified when they become ready, at which point they go in a queue of things to run.

Let S be the set of all fair solutions.

Then, from S, pick the most efficient one.

I have no idea what point you are driving at.

Is there a different async runtime that better matches this?

I'm not aware of any runtime that polls tasks in a round robin fashion as opposed to polling them when they notify their readiness.

You might want to try using mio directly?

Yeah, that's more or less what I was suggesting, though I'm not sure I'd use the async/await stuff at all if I really wanted to do it like that. And yes, there isn't really any easy way to get work stealing with this approach.

Alternatively, maybe you should reconsider round robin polling?

The XY problem is:

I want to be able to handle 50_000 concurrent udp users, and blast out updates to them at 200 millisecond intervals.

Round Robin is not a strict requirement, but I can't think of anything that satisfies the above without also being Round Robin.

1 Like

I don't know how well this would work, but you could try spawning a timer task that basically looks like this:

loop {
    sleep(200 ms).await;
    notify_all();
}

Then all of your UDP users have their own task that wait for a notification from the timer task, and once they receive the notification, they all send out the UDP packets.

The notifications could be sent using tokio::sync::Notify or tokio::sync::broadcast or similar. Not sure which one would work best.

Note that here I am using a single timer rather than a timer in each task to "synchronize" the timings, since it sounds like you want to send out all the packets at once.

To tell you the truth, Tokio's work stealing runtime might not be very well optimized for this particular use-case. The thread-local queue has a fixed size of 256 tasks (the work stealing algorithm requires it to be a fixed size), and any tasks beyond that spill into the global queue, which I don't think is what you want performance-wise in your case.

1 Like

I did not explain the problem well. It is fine if

task A sends out updates at k * 200
task B sends out updates at 100 + k * 200

bandwidth limits may actually require this

Thanks for the discussion. I'll just try the 'manual threading' approach.

If you space them out, then it is much less likely that you will run into issues with spilling into the global queue, since the 256 limit is on the number of notified tasks on a single thread. The idle threads don't count.