I've been working on nexus-slab, part of the Nexus project - a collection of special-purpose, high-performance primitives for performance-critical systems.
The problem: Standard Vec-backed slabs have bimodal p999 latency. When capacity is exceeded, Vec reallocates and copies all existing data. At scale, this copy dominates - you'll see p999 jump from ~30 cycles to 2400+ cycles depending on whether a realloc lands in your measurement window.
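To make the bimodality concrete, here's a small self-contained demo (plain std Vec, not nexus-slab code) that counts how many of a million pushes trigger a reallocation:

```rust
// Count how many pushes change the Vec's capacity, i.e. trigger a
// reallocation that copies every existing element to new storage.
fn count_reallocs(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut reallocs = 0;
    let mut last_cap = v.capacity();
    for i in 0..n as u64 {
        v.push(i);
        if v.capacity() != last_cap {
            reallocs += 1; // this push paid for copying all prior elements
            last_cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    // Amortized O(1): only ~log2(n) of a million pushes reallocate,
    // but each of those copies the entire Vec - those are the p999 outliers.
    println!("reallocations: {}", count_reallocs(1_000_000));
}
```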
The tradeoff:

| Metric | nexus-slab | slab crate |
|--------|-----------|-----------|
| p50 | 24 cycles | 22 cycles |
| p99 | 26-28 cycles | 24-42 cycles |
| p999 | 38-46 cycles | 32-3700 cycles (bimodal) |
| max | ~500-800K cycles | ~1.2-2M cycles |
You pay ~2 cycles at the median to eliminate reallocation spikes.
How it works: Independent page-aligned slabs that grow without copying existing data. Two-level freelist (slab → slot) with LIFO reuse for cache locality.
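A minimal sketch of the two-level freelist idea (all names are mine, not nexus-slab's actual API; real pages would be page-aligned and larger than the toy PAGE_SIZE here):

```rust
// Level 1: a LIFO stack of page indices that have at least one free slot.
// Level 2: within each page, an intrusive LIFO freelist of slot indices.
// Pages are boxed and never move, so growth never copies existing values.
const PAGE_SIZE: usize = 4; // toy size; real pages would be much larger

enum Slot<T> {
    Occupied(T),
    Free { next: Option<usize> }, // next free slot in this page
}

struct Page<T> {
    slots: Vec<Slot<T>>,      // fixed at PAGE_SIZE, never reallocated
    free_head: Option<usize>, // head of this page's slot freelist
}

struct TwoLevelSlab<T> {
    pages: Vec<Box<Page<T>>>, // only handles here; values live in the boxes
    free_pages: Vec<usize>,   // LIFO stack of pages with free slots
}

impl<T> TwoLevelSlab<T> {
    fn new() -> Self { Self { pages: Vec::new(), free_pages: Vec::new() } }

    fn insert(&mut self, value: T) -> (usize, usize) {
        let page_idx = match self.free_pages.last() {
            Some(&p) => p,
            None => {
                // Grow by adding a fresh page; existing pages are untouched.
                let mut slots = Vec::with_capacity(PAGE_SIZE);
                for i in 0..PAGE_SIZE {
                    let next = if i + 1 < PAGE_SIZE { Some(i + 1) } else { None };
                    slots.push(Slot::Free { next });
                }
                self.pages.push(Box::new(Page { slots, free_head: Some(0) }));
                let p = self.pages.len() - 1;
                self.free_pages.push(p);
                p
            }
        };
        let page = &mut self.pages[page_idx];
        let slot_idx = page.free_head.unwrap();
        page.free_head = match page.slots[slot_idx] {
            Slot::Free { next } => next,
            _ => unreachable!(),
        };
        page.slots[slot_idx] = Slot::Occupied(value);
        if page.free_head.is_none() {
            self.free_pages.pop(); // page is now full
        }
        (page_idx, slot_idx)
    }

    fn remove(&mut self, (page_idx, slot_idx): (usize, usize)) -> T {
        let page = &mut self.pages[page_idx];
        let was_full = page.free_head.is_none();
        let old = std::mem::replace(
            &mut page.slots[slot_idx],
            Slot::Free { next: page.free_head },
        );
        page.free_head = Some(slot_idx); // LIFO: last freed is first reused
        if was_full {
            self.free_pages.push(page_idx);
        }
        match old {
            Slot::Occupied(v) => v,
            _ => panic!("double free"),
        }
    }
}
```

The LIFO discipline at both levels keeps recently-touched pages and slots hot in cache.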
This is probably NOT for you if:
- Your slab stays "small"
- Tail latency isn't a huge concern
- The slab crate works fine for your use case
This might be for you if:
- You're building order books, matching engines, session managers, or game servers
- You've profiled and see reallocation spikes in your p99/p999
I think that since you never copy the values, you should lean into that and see if you can change the API to hand out (Key, Pin<&T>) or something similar. Especially since your Key looks 64-bit anyway: if the value never needs to move, you could (probably, somehow) offer a version that hands out &'slab Ts, which would save an indirection and act as a unique value without any extra size cost.
I also wonder if you could structure the chunks differently in some smart way to avoid ever needing to copy even the small Vec<ChunkHandle>. Like imagine a DynamicSlab<T> is internally a [Option<Box<FixedSlab<256>>>, Option<Box<FixedSlab<512>>>, Option<Box<FixedSlab<1024>>>, ...] or similar, and you lean on a fast lzcnt for indexing.
From the description of the implementation, I also feel like the "I don't like the allocator" part ought to be separated out into a "nexus-allocator" of some sort. It's not obvious to me that the things you're describing should be the slab's responsibility, rather than handled by just using a good allocator in the first place - one that takes advantage of sized deallocation, for example, which Rust's allocator API permits (the Layout is passed back on deallocation), so it wouldn't have to track the extra metadata you're mentioning.
Handle type with (Key, Pin<&T>) that auto-removes on drop - would need separate shared (refcounted) and exclusive (mutable) handle types. This could be nice, since I believe we could safely split mutable borrows across distinct slots here.
Arena mode that disables removal, then &'slab T is safe
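A rough sketch of the exclusive auto-remove-on-drop handle, against a stand-in slab (not nexus-slab's API; Pin is elided for simplicity, and splitting mutable borrows across multiple slots at once would need raw pointers, not shown here):

```rust
use std::ops::{Deref, DerefMut};

// Stand-in slab, not the real crate: just enough surface for the handle demo.
struct Slab<T> { items: Vec<Option<T>> }

impl<T> Slab<T> {
    fn new() -> Self { Self { items: Vec::new() } }
    fn insert(&mut self, v: T) -> usize { self.items.push(Some(v)); self.items.len() - 1 }
    fn remove(&mut self, key: usize) -> T { self.items[key].take().unwrap() }
    fn contains(&self, key: usize) -> bool {
        self.items.get(key).map_or(false, |s| s.is_some())
    }
    // Insert and hand back an exclusive handle tied to the slab's lifetime.
    fn insert_entry(&mut self, v: T) -> ExclusiveEntry<'_, T> {
        let key = self.insert(v);
        ExclusiveEntry { slab: self, key }
    }
}

// Exclusive handle: unique access to one slot, removed when dropped.
struct ExclusiveEntry<'s, T> {
    slab: &'s mut Slab<T>,
    key: usize,
}

impl<'s, T> Deref for ExclusiveEntry<'s, T> {
    type Target = T;
    fn deref(&self) -> &T { self.slab.items[self.key].as_ref().unwrap() }
}

impl<'s, T> DerefMut for ExclusiveEntry<'s, T> {
    fn deref_mut(&mut self) -> &mut T { self.slab.items[self.key].as_mut().unwrap() }
}

impl<'s, T> Drop for ExclusiveEntry<'s, T> {
    fn drop(&mut self) { self.slab.remove(self.key); } // auto-remove on drop
}
```

Usage would look like `let e = slab.insert_entry(session); /* ... */ drop(e);`, after which the slot is free again.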
Curious what use case you had in mind?
Just want to confirm I'm following. You're suggesting tiered storage for the slab metadata, not the slots themselves?
- Key stays (slab_idx: u32, slot_idx: u32)
- slab_idx → lzcnt → tier → index within tier
- Per-slab freelists unchanged, same LIFO behavior
- Just changes how we store SlabMeta - zero copies ever
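For concreteness, a hypothetical sketch of that mapping (tier t holds 2^t entries in a fixed array of tiers, so nothing ever reallocates; `TieredMeta` and `tier_and_offset` are names I made up):

```rust
// slab_idx → (tier, offset) with a single leading_zeros (lzcnt/bsr on x86-64).
fn tier_and_offset(slab_idx: u32) -> (u32, u32) {
    let x = slab_idx + 1;              // shift so tier boundaries are powers of two
    let tier = 31 - x.leading_zeros(); // tier 0: idx 0; tier 1: idx 1-2; tier 2: idx 3-6; ...
    let offset = x - (1u32 << tier);   // position within that tier
    (tier, offset)
}

struct TieredMeta<T> {
    // 32 tiers cover every u32 slab index; each tier is allocated exactly once
    // and never moves, so entries are never copied.
    tiers: [Option<Box<[Option<T>]>>; 32],
}

impl<T> TieredMeta<T> {
    fn new() -> Self { Self { tiers: std::array::from_fn(|_| None) } }

    fn set(&mut self, slab_idx: u32, meta: T) {
        let (tier, offset) = tier_and_offset(slab_idx);
        let t = tier as usize;
        // Allocate the tier lazily on first touch; existing tiers are untouched.
        let slots = self.tiers[t]
            .get_or_insert_with(|| (0..1usize << t).map(|_| None).collect());
        slots[offset as usize] = Some(meta);
    }

    fn get(&self, slab_idx: u32) -> Option<&T> {
        let (tier, offset) = tier_and_offset(slab_idx);
        self.tiers[tier as usize].as_ref()?[offset as usize].as_ref()
    }
}
```

The lookup is one lzcnt, one shift, and two dependent loads, so the extra indirection over a flat Vec should be cheap.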
If so, that's neat. Our metadata copies should be small already, but "never copy anything" is a clean invariant to offer.
This is fair. The mmap stuff is really just for mlock and explicit huge pages (THP compaction can spike latency, so some shops disable it). I could default to std::alloc and feature-gate the raw mmap path for power users who need those guarantees if that would be more consistent with ecosystem standards.