Extracted yring [0] from omq.rs into its own crate: a bounded SPSC (single-producer, single-consumer) ring buffer that uses the ypipe batching trick from ZeroMQ.
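The key idea: instead of two atomic operations per item (one on push, one on pop), the producer batches its writes and publishes them with one Release store on flush, and the consumer picks up a whole batch with one Acquire load on prefetch. Under load this amortizes the per-item synchronization cost to near zero.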
Three pointers: head (consumer), tail (producer, non-atomic), flush (the only shared state). Push and pop are zero-atomic operations. One atomic per batch on each side.
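
To make the three-pointer layout concrete, here is a rough sketch of how such a ring could be put together. It is not yring's actual code: the names `Shared`, `ring`, `released`, `free_from`, `avail` and `run_demo` are made up for illustration, the element type is fixed to u64 to keep it short, and `prefetch` returning a count is my choice. One loud assumption: the description above names flush as the only shared state, but a bounded ring also has to tell the producer which slots are free again, so this sketch adds a second counter (`released`) that the consumer updates once per prefetch. How yring itself handles that is not covered here.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical sketch, NOT yring's implementation.
struct Shared {
    slots: Box<[UnsafeCell<u64>]>,
    flush: AtomicUsize,    // number of items published by the producer
    released: AtomicUsize, // assumption: number of items the consumer handed back
}
// Safety: a slot is only ever touched by one side at a time; ownership moves
// through the Release/Acquire pairs on `flush` and `released`.
unsafe impl Sync for Shared {}

pub struct Producer { shared: Arc<Shared>, tail: usize, free_from: usize }
pub struct Consumer { shared: Arc<Shared>, head: usize, avail: usize }

pub fn ring(capacity: usize) -> (Producer, Consumer) {
    assert!(capacity > 0);
    let shared = Arc::new(Shared {
        slots: (0..capacity).map(|_| UnsafeCell::new(0)).collect(),
        flush: AtomicUsize::new(0),
        released: AtomicUsize::new(0),
    });
    (
        Producer { shared: shared.clone(), tail: 0, free_from: 0 },
        Consumer { shared, head: 0, avail: 0 },
    )
}

impl Producer {
    // No atomics on the fast path; one Acquire load only when the ring looks full.
    pub fn push(&mut self, v: u64) -> Result<(), u64> {
        let cap = self.shared.slots.len();
        if self.tail - self.free_from == cap {
            self.free_from = self.shared.released.load(Ordering::Acquire);
            if self.tail - self.free_from == cap {
                return Err(v); // still full
            }
        }
        // The consumer cannot read this slot until it is published by flush().
        unsafe { *self.shared.slots[self.tail % cap].get() = v };
        self.tail += 1;
        Ok(())
    }
    // One Release store publishes every item pushed since the last flush.
    pub fn flush(&mut self) {
        self.shared.flush.store(self.tail, Ordering::Release);
    }
}

impl Consumer {
    // Hands consumed slots back, then picks up the next batch with one
    // Acquire load. Returns how many items are ready to pop.
    pub fn prefetch(&mut self) -> usize {
        self.shared.released.store(self.head, Ordering::Release);
        self.avail = self.shared.flush.load(Ordering::Acquire);
        self.avail - self.head
    }
    // No atomics: only compares and reads consumer-private state.
    pub fn pop(&mut self) -> Option<u64> {
        if self.head == self.avail {
            return None;
        }
        let cap = self.shared.slots.len();
        let v = unsafe { *self.shared.slots[self.head % cap].get() };
        self.head += 1;
        Some(v)
    }
}

// Sends 0..1000 through a tiny ring across two threads and sums what arrives.
pub fn run_demo() -> u64 {
    let (mut tx, mut rx) = ring(8);
    let producer = thread::spawn(move || {
        for i in 0..1000u64 {
            while tx.push(i).is_err() {
                tx.flush(); // publish what we have, then wait for free slots
                thread::yield_now();
            }
        }
        tx.flush();
    });
    let (mut sum, mut seen) = (0u64, 0);
    while seen < 1000 {
        if rx.prefetch() == 0 {
            thread::yield_now();
            continue;
        }
        while let Some(v) = rx.pop() {
            sum += v;
            seen += 1;
        }
    }
    producer.join().unwrap();
    sum
}
```

Calling `run_demo()` from a `main` exercises both sides. The point of the layout is that `tail` and `head` stay thread-private, so the hot path of `push` and `pop` is plain loads and stores, and the cache line holding `flush` only bounces between cores once per batch.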
Perf: 760M items/s between threads (u64, batch of 256, capacity 1024), measured in a Linux VM on an old Mac mini from 2018.
In omq.rs, using yring for cross-thread inproc delivery pushed throughput from ~3M msg/s to ~16M msg/s.
let (mut producer, mut consumer) = yring::spsc(1024);
for i in 0..100 {
    producer.push(i).unwrap();
}
producer.flush(); // one Release store
consumer.prefetch(); // one Acquire load
while let Some(val) = consumer.pop() {
    // process val
}