How many (tokio) UDP sockets to use?

Suppose we have a highly asynchronous system that sends and receives a huge number of datagrams and should be optimized for speed. The system is based on tokio (actually on the actix actor model).

Now the (maybe) naive way to implement this is to use a single tokio UDP socket for both sending and receiving. For every incoming datagram we could spawn a new actor that processes it, and outgoing datagrams from the actors are also sent through the same socket.
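For concreteness, here is a minimal tokio-level sketch of that single-socket setup (plain tasks standing in for actix actors; the address and buffer size are arbitrary, and it assumes tokio 1.x, where `UdpSocket` methods take `&self` so the socket can be shared via `Arc`):

```rust
use std::sync::Arc;
use tokio::net::UdpSocket;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // One socket for both directions, shared via Arc
    // (tokio's UdpSocket methods take &self, so no lock is needed).
    let socket = Arc::new(UdpSocket::bind("0.0.0.0:9000").await?);

    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let (len, peer) = socket.recv_from(&mut buf).await?;
        let datagram = buf[..len].to_vec();
        let sender = Arc::clone(&socket);

        // Stand-in for "spawn an actor per datagram": one task per datagram.
        tokio::spawn(async move {
            // ... process `datagram` ...
            // Outgoing datagrams go back out through the same socket.
            let _ = sender.send_to(&datagram, peer).await;
        });
    }
}
```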

However, I'm not sure whether this is the most efficient approach.

Maybe it's better to have two sockets (say, bound to different ports): one that handles only incoming datagrams and another used solely for outgoing ones.
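Sketched under the same assumptions, the split might look like this (note that replies would then originate from the send socket's port, which can matter for NAT or for peers that expect answers from the port they sent to):

```rust
use std::sync::Arc;
use tokio::net::UdpSocket;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // One socket dedicated to receiving, one (on another port) to sending.
    let rx = UdpSocket::bind("0.0.0.0:9000").await?;
    let tx = Arc::new(UdpSocket::bind("0.0.0.0:9001").await?);

    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let (len, peer) = rx.recv_from(&mut buf).await?;
        let datagram = buf[..len].to_vec();
        let tx = Arc::clone(&tx);
        tokio::spawn(async move {
            // All outgoing traffic leaves through the dedicated send socket.
            let _ = tx.send_to(&datagram, peer).await;
        });
    }
}
```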

The reasoning behind the split is that transmission might be faster this way, but I don't know whether that really makes sense. Is there even a rational answer to this? By the same reasoning one could argue that $n$ sockets handling outgoing datagrams are better than one, maybe even one per actor. However, actors come and go in and out of existence, and creating a new socket each time might be slow.

I think the problem I have here is that I don't know how to answer this question purely theoretically. Maybe the only way to find out is to implement all the variants and benchmark them.

Have you tried running a benchmark to see the effect of using more sockets? I've found the criterion library to be super useful for comparing the same test run with different inputs.

If you're using actix then it should be quite trivial to spin up a different number of actors and test different algorithms.
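A minimal criterion harness along those lines could look like the following; the `send_burst` workload, target address, burst size, and socket counts are all made up for illustration, and measuring the receive path as well would need a real echo peer:

```rust
// benches/udp_sockets.rs (criterion bench, with `harness = false` in Cargo.toml)
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use tokio::net::UdpSocket;

// Hypothetical workload: spread a fixed burst of datagrams over
// `n_sockets` sender sockets, all aimed at an arbitrary local port.
async fn send_burst(n_sockets: usize) -> std::io::Result<()> {
    let target = "127.0.0.1:9999";
    let payload = [0u8; 512];

    let mut sockets = Vec::with_capacity(n_sockets);
    for _ in 0..n_sockets {
        sockets.push(UdpSocket::bind("127.0.0.1:0").await?);
    }
    for i in 0..1_000 {
        // Errors are ignored here; nobody needs to be listening on `target`.
        let _ = sockets[i % n_sockets].send_to(&payload, target).await;
    }
    Ok(())
}

fn bench_sockets(c: &mut Criterion) {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let mut group = c.benchmark_group("udp_send_burst");
    // Compare the same burst pushed through 1, 2, 4 or 8 sockets.
    for n in [1usize, 2, 4, 8] {
        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
            b.iter(|| rt.block_on(send_burst(n)).unwrap());
        });
    }
    group.finish();
}

criterion_group!(benches, bench_sockets);
criterion_main!(benches);
```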

No, I haven't yet. I thought I'd first look for a theoretical argument so I could skip that step, since testing all the possibilities takes additional time. Hence the question.

From a purely theoretical / hypothetical perspective, the lowest-overhead option is likely a socket per CPU core, trying to keep all the work for each socket on that core - including in the kernel / IP stack - to minimise lock contention.

It's also the case that you're not likely to notice the difference between this and other options (that might fit more readily into Actix' scheduling model) until you hit fairly extreme levels of throughput and latency-sensitivity.
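One common way to approximate that per-core layout (not actix-specific, and assuming Linux/BSD plus the socket2 crate with its `all` feature for `set_reuse_port`) is to bind several sockets to the same port with SO_REUSEPORT and let the kernel spread incoming datagrams across them; actually pinning each task to a core would be a further step not shown here:

```rust
use socket2::{Domain, Protocol, Socket, Type};
use std::net::SocketAddr;
use tokio::net::UdpSocket;

// Build a non-blocking UDP socket with SO_REUSEPORT set, then hand it to tokio.
fn reuseport_socket(addr: SocketAddr) -> std::io::Result<UdpSocket> {
    let sock = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    sock.set_reuse_port(true)?;   // Unix-only; needs socket2's "all" feature
    sock.set_nonblocking(true)?;  // required before handing the fd to tokio
    sock.bind(&addr.into())?;
    UdpSocket::from_std(sock.into())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let addr: SocketAddr = "0.0.0.0:9000".parse().unwrap();
    // Roughly one socket per core; the kernel distributes incoming
    // datagrams across the SO_REUSEPORT group.
    let cores = std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1);

    for _ in 0..cores {
        let socket = reuseport_socket(addr)?;
        tokio::spawn(async move {
            let mut buf = vec![0u8; 64 * 1024];
            loop {
                if let Ok((len, peer)) = socket.recv_from(&mut buf).await {
                    // Reply on the same socket so the flow stays on one socket.
                    let _ = socket.send_to(&buf[..len], peer).await;
                }
            }
        });
    }

    // Park the main task; a real program would manage shutdown properly.
    std::future::pending::<()>().await;
    Ok(())
}
```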
