Program Structure for Real Time / Parallel Data Processing

Hey everyone!

I've been working on a project that uses threads heavily, which is starting to become problematic. I'm wondering if anyone has any insight on better ways to structure the program, perhaps with an async executor. I know the conventional wisdom is to use async for IO-bound tasks, but I'm not quite sure if that's the case here.

The short summary is that this program processes real-time data from a radio telescope. I'm capturing packets at 10 Gbps using libpcap, decoding them into raw complex spectrum data, reordering to fix out-of-order packets, averaging in time, and doing other middleware-esque processing before sending it out to an ML classifier via IPC, eventually at a nominal 2 Gbps data rate on the other side. All these tasks are currently threads connected with crossbeam channels. In addition, I'm collecting statistics on channel backpressure, raw data distribution, timing, and holding on to a large ring buffer that is dumped to disk on a UDP trigger. The statistics are shipped to a prometheus database with a hyper webserver. Lots going on!

To be more concrete, I'm receiving 8kb of data every 8us, and the total runtime of all the processing is more than 8us, implying something has to be pipelined and processed in parallel. Things would be easier to put into something like Rayon if these tasks weren't stateful. However, tasks like downsampling must have state in the sense that I only produce data on the outgoing channel for every N input chunks. I have some tasks that are pure functions, some that are downsampling, and some that produce more data than comes in. To make things even more complicated, I have to control thread affinity and pin tasks to cores that share a NUMA node, otherwise I drop packets.

This is all well and good, and honestly is running fine. However, I'm underutilizing the cores and using too many. I look at htop and each thread is only at ~70% load. When benchmarking, my tasks are, on average, 2-3us. I would put multiple together, but that makes reasoning about the program structure harder. As in, if I want to add an additional processing step in the middle of a merged task, that's harder than if they were two separate tasks with channels connecting them.

So, my question is: is it possible to build out this kind of stateful-pipelined program with async rust? I thought that work-stealing helps in the sense that I can utilize the cores better, perhaps fewer of them, while still maintaining this chained-together model. I have difficulty finding references to something like this with infinite-runtime processing loops chained together in an async context.

If you want to see what I have so far, I'm working on it here.

Thanks for your help!


Very detailed description and very cool project! I quickly skimmed over the code and your description, so here is what I could deduce:

  1. The simplest way to reduce the number of threads in your app is to replace `#[tokio::main]` with `#[tokio::main(flavor = "current_thread")]`. This will make your web server single-threaded and execute only on the main thread, which seems to be what you actually want. See the documentation here.
  2. Tasks as threads seem wasteful to me, and I never encountered such a pattern in HPC. If I were you, I'd probably try to use a thread-pool implementation such as Rayon, even if this requires sharing state across threads and memory regions. You seem to worry very much about NUMA regions, but I think (I hope, rather. I have no way of knowing. But I'd give it a good old try :slightly_smiling_face:) utilizing your threads more will mitigate the longer access times to potentially non-local memory.
  3. Why do you split your computations into different tasks on different threads in the first place? I'd assume each packet is processed the same and in order, like (I) receive input from telescope, (II) decode, (III) reorder, (IV) average, (V) send to ML classifier. If this is the case I'm sure there are better ways for parallelization than your approach (also one that requires less message passing between threads).
  4. IO in your case seems tricky indeed. Where are your FPGAs used in the pipeline? Do you need to wait for a response from your ML model? How often are your metrics dumped to disk? I'd first try and find the right way to parallelize processing, before I'd try and make it asynchronous.
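For point 1, concretely, the change is just the attribute (assuming tokio 1.x with the `rt` and `macros` features enabled in Cargo.toml):

```rust
// Cargo.toml: tokio = { version = "1", features = ["rt", "macros"] }

// All tasks spawned on this runtime (the hyper metrics server included)
// are multiplexed onto the main thread; no worker-thread pool is spawned.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // serve /metrics here
}
```

The equivalent without the macro is `tokio::runtime::Builder::new_current_thread().enable_all().build()`, if you want to construct the runtime by hand.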

Thanks for your response!

  1. Awesome note about running tokio single-threaded on the main thread. That's exactly what I want!

  2. I'm using tasks like this because, as I mentioned, some aren't functions in the strict sense: for a given input, they may or may not directly produce a result. For example, the reordering task has to wait until we encounter the next payload we expect (because we're getting input data slightly out of order); otherwise, we store the input in a buffer and produce nothing on the output. I benchmarked the message passing itself, and the throughput of the crossbeam channels seems quite good: I can send my 8k payload at upwards of 30 Gbps.

> I'm sure there are better ways for parallelization than your approach

Indeed! That's what I'm trying to ascertain with this post :slight_smile:

  4. The FPGA is sitting in the radio telescope and producing UDP payloads on a 10 GbE line. Standard tokio sockets and std sockets were not fast enough to keep up, which is why I'm using libpcap. I tried to work directly with AF_PACKET and MMAP regions, but got in way over my head - and my understanding is that that's what pcap is doing anyway. I do need to wait for a response from the classifier; that's what's used to trigger a dump of the unprocessed data. As such, I have a ~30s ringbuffer that is populated in a task. The IO for dumping that kinda sucks too: I'm writing the data to an HDF5 file, which takes quite a while to serialize on our non-SSDs (the total file size is around 30 GB). The metrics are populated every 10s or so, and that whole side of things seems to be working fine.

I should also mention that I almost require message passing with bounded channels to deal with uneven latency. As each thread blocks, I need some way to store the backlog while the thread is yielded.
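For concreteness, the reordering logic is essentially this kind of state machine (a simplified sketch; the sequence numbers and `&str` payloads are placeholders for the real packet headers and buffers):

```rust
use std::collections::BTreeMap;

// Stateful reorderer: payloads arrive tagged with a sequence number,
// possibly out of order; emit them strictly in order, buffering gaps.
struct Reorder<T> {
    next: u64,
    pending: BTreeMap<u64, T>,
}

impl<T> Reorder<T> {
    fn new(start: u64) -> Self {
        Reorder { next: start, pending: BTreeMap::new() }
    }

    // Feed one packet; return every payload that is now ready, in order.
    // May return nothing (gap still open) or several (gap just closed).
    fn push(&mut self, seq: u64, payload: T) -> Vec<T> {
        self.pending.insert(seq, payload);
        let mut ready = Vec::new();
        while let Some(p) = self.pending.remove(&self.next) {
            ready.push(p);
            self.next += 1;
        }
        ready
    }
}

fn main() {
    let mut r = Reorder::new(0);
    assert!(r.push(1, "b").is_empty()); // gap: still waiting for seq 0
    assert!(r.push(2, "c").is_empty());
    assert_eq!(r.push(0, "a"), vec!["a", "b", "c"]); // gap closed, flush
}
```

This is why a stage can run for a while without sending anything downstream, and then suddenly send several payloads at once.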

Just looking at the flow of data through your tasks, it looks like you could pretty easily split the incoming payload stream into chunks and assign them to a set of pooled worker threads, where each worker would run the rest of the processing for the chunk they receive internally rather than using a dedicated thread for each task.

The reordering might be fine to just do on the thread you're receiving the packets on?

The downsampling appears to happen right after that, and looks like it's always a fixed number of payloads, which is why I think you might be able to just take a fixed chunk of the incoming payloads and send them off to a worker thread for the rest of the processing.

I suspect the NUMA issue had more to do with "ping pong"-ing your very large Payload struct across NUMA nodes than anything else[1]. If you ensure the larger payloads don't get sent across multiple channels, that might not be an issue anymore. Condensing those tasks so they aren't all connected with channels would help with that.

  1. e.g. your capture task might be on node A, reorder task on node B, split task on node A, etc., so you could end up bouncing the payload data back and forth across the node boundary multiple times for every single payload and saturating the available bandwidth.
