I'm currently trying to rustify my synthesizer, whose current implementation is single-threaded. My targeted design requires a bunch of heterogeneous long-lived threads to synchronously start processing a common input chunk of data. Their results may be collected asynchronously. The next chunk is formed based on the collected results and the loop starts again.
You can think of the whole mechanism as a bunch of workers waiting for a global repetitive event or clock signal to start their next iteration of work.
Each round will have to complete within about 5 µs (one sample period at a sampling rate of 192000 Hz is 1/192000 s ≈ 5.2 µs). Each input chunk depends on the results of all threads, so there's no possibility to buffer a bunch of results in order to reduce the number of synchronization calls.
The threads' computations by themselves are pretty cheap, but they may all differ between the threads, so work stealing or thread pools are not an option (afaik).
I'm just digging through `std::sync` to fill my toolbox, but I have no idea how these synchronization primitives might be typically backed by CPU features to guess their impact on latency and thus on the possibility to use them for near real-time programming.
So far I've chosen to use `std::sync::mpsc` to asynchronously collect the results after each round, which I expect to be fast enough.
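To make the collection side concrete, here is a minimal sketch of what I have in mind. The function name `collect_round` and the placeholder computation are made up for illustration, and the threads are spawned per call only to keep the example short (in the real design they would be long-lived). Each worker sends one `(worker_id, result)` pair and the control thread takes exactly one result per worker, in whatever order they arrive.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs one round with `worker_count` workers and
/// returns their results sorted by worker id.
fn collect_round(worker_count: usize, input: u64) -> Vec<(usize, u64)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::with_capacity(worker_count);

    for id in 0..worker_count {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Placeholder computation; the real workers would all differ.
            let result = input * (id as u64 + 1);
            tx.send((id, result)).expect("control thread hung up");
        }));
    }
    // Drop the original sender so the receiver ends if every worker is gone.
    drop(tx);

    // Results arrive in arbitrary order; take exactly one per worker.
    let mut results: Vec<(usize, u64)> = rx.iter().take(worker_count).collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    results.sort_by_key(|&(id, _)| id);
    results
}

fn main() {
    println!("{:?}", collect_round(4, 10));
}
```

I have not measured how this behaves against the per-round time budget; the sketch only shows the shape of the collection step.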
For synchronously starting/waking/notifying the threads to start their computation I don't really know what to use. I had the following ideas so far:
- Using a `std::sync::Barrier` with a capacity of `thread_count + 1`. The control thread will invoke `wait()` to unlock all computation threads. This sounds easy but I have no idea how to dynamically add or remove further threads during run-time without having to re-initialize the entire thread collection.
- I don't know if I understood `std::sync::Condvar` right. I thought about `wait()`ing for a condition in all threads and to call `notify_all()` from the control thread.
- A spinlock would achieve the desired latency, but there might exist some dozens of threads at a time which would overstress the CPU. Is there some form of a slow spinlock that gives the CPU some hundred cycles each round to gasp for air?
- One `std::sync::mpsc::channel` or `std::sync::mpsc::sync_channel` per thread in a loop that wakes up all threads sequentially (quite ugly, wasteful and possibly slow)
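For reference, the Barrier idea could look roughly like the sketch below. Everything named here (`run_rounds`, the shared `input` value, the `stop` flag, the toy computation `x + id`) is a hypothetical stand-in, not my actual synthesizer code. Long-lived workers block on a `Barrier` of size `worker_count + 1`, the control thread releases them with its own `wait()`, results come back over an `mpsc` channel, and their sum becomes the next input.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{mpsc, Arc, Barrier};
use std::thread;

/// Hypothetical driver: returns the input value used in each round.
fn run_rounds(worker_count: usize, rounds: usize) -> Vec<u64> {
    let barrier = Arc::new(Barrier::new(worker_count + 1));
    let input = Arc::new(AtomicU64::new(1)); // stand-in for the input chunk
    let stop = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<u64>();

    let handles: Vec<_> = (0..worker_count)
        .map(|id| {
            let barrier = Arc::clone(&barrier);
            let input = Arc::clone(&input);
            let stop = Arc::clone(&stop);
            let tx = tx.clone();
            thread::spawn(move || loop {
                barrier.wait(); // sleep until the control thread starts a round
                if stop.load(Ordering::Acquire) {
                    break;
                }
                let x = input.load(Ordering::Acquire);
                // Placeholder computation; real workers would all differ.
                tx.send(x + id as u64).expect("control thread hung up");
            })
        })
        .collect();

    let mut inputs = Vec::with_capacity(rounds);
    for _ in 0..rounds {
        inputs.push(input.load(Ordering::Relaxed));
        barrier.wait(); // release all workers at once
        // Collect one result per worker, in arrival order.
        let sum: u64 = rx.iter().take(worker_count).sum();
        input.store(sum, Ordering::Release); // next chunk depends on all results
    }

    // One more release so the workers can observe the stop flag and exit.
    stop.store(true, Ordering::Release);
    barrier.wait();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    inputs
}

fn main() {
    println!("{:?}", run_rounds(3, 3));
}
```

The fixed barrier size is exactly the limitation mentioned above: adding or removing a worker would mean building a new `Barrier`. I also don't know whether the wake-up latency of `Barrier::wait()` fits into the per-round budget.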
I might try to group several computations to form fewer but bigger threads. But that would drastically increase the code complexity, and a reconfiguration at run-time would introduce hard-to-grasp timing behavior, while the benefits are questionable.
I'd like to hear from your experience.