Yring: bounded SPSC ring with ypipe-style batched flush

Extracted [0] from omq.rs into its own crate: a bounded SPSC ring buffer that uses the ypipe batching trick from ZeroMQ.

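The key idea: instead of paying for two atomic operations per item (one on push, one on pop), the producer batches its writes and publishes them with a single Release store on flush, and the consumer picks them up with a single Acquire load on prefetch. Under load this amortizes the synchronization cost over the whole batch, so the per-item cost approaches zero.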

Three pointers: head (owned by the consumer), tail (owned by the producer, non-atomic), and flush (the only shared state). Push and pop themselves perform no atomic operations; each side does one atomic operation per batch.
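To make the three-pointer scheme concrete, here is a minimal std-only sketch of the idea. It is not yring's actual code. The names `Shared`, `flushed`, `released`, and `demo_sum` are made up for illustration. One assumption goes beyond the description above: to keep the ring bounded, this sketch has the consumer publish its progress in a second atomic (`released`), and the producer reads it only when the ring looks full.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Illustrative layout, not yring's internals.
struct Shared<T> {
    slots: Box<[UnsafeCell<Option<T>>]>,
    flushed: AtomicUsize,  // count of items published by the producer
    released: AtomicUsize, // count of items the consumer is done with (assumption)
}

// SAFETY: the producer only writes slots the consumer has released, and the
// consumer only reads slots the producer has flushed, so no slot is shared.
unsafe impl<T: Send> Sync for Shared<T> {}

pub struct Producer<T> { shared: Arc<Shared<T>>, tail: usize, released_cache: usize }
pub struct Consumer<T> { shared: Arc<Shared<T>>, head: usize, visible: usize }

pub fn spsc<T: Send>(capacity: usize) -> (Producer<T>, Consumer<T>) {
    assert!(capacity > 0);
    let shared = Arc::new(Shared {
        slots: (0..capacity).map(|_| UnsafeCell::new(None)).collect(),
        flushed: AtomicUsize::new(0),
        released: AtomicUsize::new(0),
    });
    (
        Producer { shared: shared.clone(), tail: 0, released_cache: 0 },
        Consumer { shared, head: 0, visible: 0 },
    )
}

impl<T: Send> Producer<T> {
    // No atomics on the fast path; one Acquire load only when the ring looks full.
    // Counters grow monotonically and are assumed never to overflow usize.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        let cap = self.shared.slots.len();
        if self.tail - self.released_cache == cap {
            self.released_cache = self.shared.released.load(Ordering::Acquire);
            if self.tail - self.released_cache == cap {
                return Err(item); // still full: hand the item back
            }
        }
        unsafe { *self.shared.slots[self.tail % cap].get() = Some(item) };
        self.tail += 1;
        Ok(())
    }

    // One Release store publishes every item pushed since the last flush.
    pub fn flush(&mut self) {
        self.shared.flushed.store(self.tail, Ordering::Release);
    }
}

impl<T: Send> Consumer<T> {
    // One Acquire load makes a whole batch visible; the Release store hands
    // already-popped slots back to the producer.
    pub fn prefetch(&mut self) {
        self.shared.released.store(self.head, Ordering::Release);
        self.visible = self.shared.flushed.load(Ordering::Acquire);
    }

    // No atomics: pops only from the batch seen by the last prefetch.
    pub fn pop(&mut self) -> Option<T> {
        if self.head == self.visible {
            return None;
        }
        let cap = self.shared.slots.len();
        let item = unsafe { (*self.shared.slots[self.head % cap].get()).take() };
        self.head += 1;
        item
    }
}

// Cross-thread demo: sums 0..n sent through a 64-slot ring.
pub fn demo_sum(n: u64) -> u64 {
    let (mut tx, mut rx) = spsc::<u64>(64);
    let producer = thread::spawn(move || {
        for i in 0..n {
            let mut item = i;
            while let Err(back) = tx.push(item) {
                tx.flush(); // ring full: publish what we have and retry
                item = back;
                thread::yield_now();
            }
        }
        tx.flush();
    });
    let (mut sum, mut seen) = (0u64, 0u64);
    while seen < n {
        rx.prefetch();
        while let Some(v) = rx.pop() {
            sum += v;
            seen += 1;
        }
    }
    producer.join().unwrap();
    sum
}
```

Items pushed but not yet flushed stay invisible to the consumer, which is the trade-off the batching makes: latency for an unflushed item in exchange for fewer synchronizing operations.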

Perf: 760M items/s between threads (u64 items, batches of 256, capacity 1024), measured in a Linux VM on a 2018 Mac mini.

In omq.rs, using yring for cross-thread inproc delivery pushed throughput from ~3M msg/s to ~16M msg/s.

let (mut producer, mut consumer) = yring::spsc(1024);

for i in 0..100 {
    producer.push(i).unwrap();
}
producer.flush(); // one Release store

consumer.prefetch(); // one Acquire load
while let Some(val) = consumer.pop() {
    // process val
}

[0] crates.io: Rust Package Registry

This is an interesting alternative to a bip buffer. I've used bbqueue, a Rust bip buffer implementation, for a proxy-like program. I'll take a look at yring later.

Guessing from the posted snippet, I suppose yring is message-based only and doesn't support unstructured data, i.e. arbitrary byte streams, right? (I could use u8 as the message type, but I'm talking about a buffer API that lets the producer run in-place algorithms on the buffer.) Still, this is a good library to have in the toolbox.

It's generic over any item type T: Send, so it isn't message-specific. It is slot-based, though, not a byte-stream buffer.
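If you do want to move a byte stream through a slot-based queue, one option is to use fixed-size chunks as the item type. A rough sketch of that idea follows. `Chunk`, `CHUNK`, and the two helper functions are hypothetical, and a `VecDeque` stands in for the ring, so none of this is yring's API. It also copies bytes into each slot, so it does not give you the in-place access a bip buffer does.

```rust
use std::collections::VecDeque;

// Hypothetical chunk size; a real program would pick something larger.
const CHUNK: usize = 8;

// A fixed-size, Copy item that carries up to CHUNK bytes of the stream.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Chunk {
    len: usize,
    bytes: [u8; CHUNK],
}

// Producer side: cut the byte stream into slot-sized items.
pub fn split_into_chunks(data: &[u8]) -> VecDeque<Chunk> {
    data.chunks(CHUNK)
        .map(|part| {
            let mut bytes = [0u8; CHUNK];
            bytes[..part.len()].copy_from_slice(part);
            Chunk { len: part.len(), bytes }
        })
        .collect()
}

// Consumer side: drain the queue and rebuild the original byte stream.
pub fn reassemble(queue: &mut VecDeque<Chunk>) -> Vec<u8> {
    let mut out = Vec::new();
    while let Some(chunk) = queue.pop_front() {
        out.extend_from_slice(&chunk.bytes[..chunk.len]);
    }
    out
}
```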