Idea for an append-only vec that is always sync

I've just sketched out a data structure I've been thinking about and am curious to hear thoughts (and have any errors pointed out). It's a kind of Vec that can only be appended to, with elements that are never moved or modified, so the push method can take a &self parameter (with appropriate use of atomics). I think it could be very helpful for cases where you want to implement an arena that can be indexed extremely rapidly while simultaneously allowing values to be pushed.

It's only 107 lines of code (counting 42 lines of doc comments, and 63 lines of initializing an array with null pointers, but excluding tests).

Known issue: it will fail to compile on non-64-bit architectures.

Thoughts?

I just realized I also forgot to implement Drop, but that should be pretty easy.

Comments, from most important to least important:

  • This looks wrong:
    unsafe {
        *(ptr.add(offset)) = val;
    }
    
    The previous value pointed to by the pointer is not initialized, so it looks like this calls drop() on an uninitialized value. You can verify this by running the code under Miri with an element type that implements Drop (e.g. String). Shouldn't you be using ptr::write() instead? (See the first sketch after this list.)
  • There is no explanation for the memory orderings applied. In particular, how do you justify Relaxed at the places you use it?
    • Relatedly, I don't understand (and it's not explained in comments) how three separate atomic variables can ensure that the entire data structure is atomic. In the spin loop at the end of push(), you check that the value is no longer idx, but you never verify what its value actually is. It seems to me this has a data race in the presence of at least 3 threads, since in that case self.count > idx can come about in more than one way.
  • There is room for improvement in the iterator code in terms of performance:
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        (0..self.len()).map(|i| &self[i])
    }
    
    This performs an atomic load and re-computes the index/offset on each iteration. It would be better expressed as an iterator over iterators over self.data, flattened (see the second sketch after this list).
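
To make the first point concrete, here is a minimal, standalone illustration of the difference (this is not the original code, just the pattern in isolation): assigning through a pointer to uninitialized memory first drops whatever "old value" the destination supposedly contains, which is undefined behaviour, whereas ptr::write copies the value in without touching the old contents.

    use std::mem::MaybeUninit;
    use std::ptr;

    fn main() {
        let mut storage: MaybeUninit<String> = MaybeUninit::uninit();
        let ptr = storage.as_mut_ptr();
        unsafe {
            // UB: `*ptr = String::from("oops");` would first drop the
            // uninitialized "old" String sitting behind `ptr`.
            // Correct: write the value without reading or dropping the old contents.
            ptr::write(ptr, String::from("ok"));
            // Clean up so the String isn't leaked in this demo.
            ptr::drop_in_place(ptr);
        }
    }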
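
And here is a rough sketch of the "iterator over iterators, flattened" shape, using a simplified stand-in layout (a Vec of bins rather than the real pointer array; the field names are mine): the atomic length is loaded once, and the per-element work is plain slice iteration.

    use std::sync::atomic::{AtomicUsize, Ordering};

    struct Bins<T> {
        bins: Vec<Vec<T>>, // stand-in for the real bin storage
        len: AtomicUsize,  // number of initialized elements
    }

    impl<T> Bins<T> {
        fn iter(&self) -> impl Iterator<Item = &T> {
            // One atomic load for the whole iteration instead of one per element.
            let len = self.len.load(Ordering::Acquire);
            self.bins.iter().flat_map(|bin| bin.iter()).take(len)
        }
    }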
3 Likes

That spin loop is the only place that modifies self.count, and it's a compare-and-swap from idx to idx+1. This effectively serializes all of the pushes, as any thread wanting to set count to 𝓃 will wait for some other thread to set it to 𝓃-1 first. Whatever order the threads happen to get reservations in is also the order in which they will change count.
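
For readers following along, a minimal sketch of that publish step (names are mine, not from the original code): the pusher has already reserved index idx and written its element, and now waits for count to reach idx before bumping it to idx + 1, which publishes the pushes in reservation order.

    use std::sync::atomic::{AtomicUsize, Ordering};

    fn publish(count: &AtomicUsize, idx: usize) {
        // Succeeds only when `count == idx`, i.e. when every earlier
        // reservation has already been published.
        while count
            .compare_exchange_weak(idx, idx + 1, Ordering::Release, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }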

1 Like

The pushes are serialized anyway by your spin_lock loop. The only thing that can happen "in parallel" is the actual copying of the value, which, for small size_of::<T>, is not necessarily worth all that much.

So I would think that using some Mutex<()> to guard write access might be better. (You wouldn't need the reserved value anymore; just increment count at the end of the push before releasing the Mutex, and only read count in a push operation after the mutex has been acquired.)

This would also avoid the problem that, as things stand, there could be many redundant allocations if allocating a new array takes a while and other threads come along to do the same thing.
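
A rough sketch of that Mutex<()> variant, with hypothetical field and method names: the lock serializes writers, while count stays atomic so readers can still load it without taking the lock.

    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Mutex;

    struct PushState {
        write_lock: Mutex<()>,
        count: AtomicUsize,
    }

    impl PushState {
        /// Returns the index that was filled.
        fn push_slot(&self) -> usize {
            let _guard = self.write_lock.lock().unwrap();
            // Read count only after the mutex has been acquired.
            let idx = self.count.load(Ordering::Relaxed);
            // ... allocate a new array here if needed, then write the element
            // into slot `idx` ...
            // Publish at the very end, before the guard is dropped; Release
            // pairs with the Acquire loads done by readers.
            self.count.store(idx + 1, Ordering::Release);
            idx
        }
    }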

If you don't want a mutex around the whole push operation, there are other approaches: e.g. only the first operation (the one with offset 0) should allocate, and the other ones would wait for the pointer to become non-null. Or you could use a Once per bin for doing its allocation. The spin_lock at the end could probably be replaced by a Condvar. In either case, note that I'm not 100% certain how mutexes or Once interact with atomics, especially if those aren't SeqCst. Use primitives from parking_lot if you still want it to have a few cycles of spinning before actual blocking mechanisms are involved.
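
A sketch of the "Once per bin" idea using std::sync::OnceLock (names and layout are hypothetical): the first thread that needs a bin allocates it inside get_or_init, and any other thread arriving at the same time blocks until that allocation is visible, so there are no redundant allocations.

    use std::cell::UnsafeCell;
    use std::mem::MaybeUninit;
    use std::sync::OnceLock;

    struct LazyBin<T> {
        // One lazily-allocated bin; the real structure would hold one of
        // these per size class.
        slots: OnceLock<Box<[UnsafeCell<MaybeUninit<T>>]>>,
    }

    impl<T> LazyBin<T> {
        fn slots(&self, bin_size: usize) -> &[UnsafeCell<MaybeUninit<T>>] {
            // Only the first caller runs the closure; everyone else waits.
            self.slots.get_or_init(|| {
                (0..bin_size)
                    .map(|_| UnsafeCell::new(MaybeUninit::uninit()))
                    .collect()
            })
        }
    }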

2 Likes

An alternative would be to move the sync point into the indexing operation: Store an AtomicBool for each item to say whether or not it's ready, and have index walk the count forward, waiting if necessary.
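
A rough sketch of that per-item flag, with hypothetical names: the writer fills the slot and then sets the flag with Release, and the reader spins on the flag with Acquire before handing out a reference.

    use std::cell::UnsafeCell;
    use std::mem::MaybeUninit;
    use std::sync::atomic::{AtomicBool, Ordering};

    struct Slot<T> {
        ready: AtomicBool,
        value: UnsafeCell<MaybeUninit<T>>,
    }

    impl<T> Slot<T> {
        fn set(&self, value: T) {
            // SAFETY: the caller must guarantee it has exclusive write access
            // to this slot (e.g. via the reservation scheme).
            unsafe { (*self.value.get()).write(value) };
            // Publish: pairs with the Acquire load in `get`.
            self.ready.store(true, Ordering::Release);
        }

        fn get(&self) -> &T {
            // Wait until the writer has published this slot.
            while !self.ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // SAFETY: `ready == true` means the value was fully written, and
            // the Acquire load synchronizes with the writer's Release store.
            unsafe { (*self.value.get()).assume_init_ref() }
        }
    }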

Tokio has a very similar data structure that it uses to implement work stealing. You can find it here: src/runtime/queue.rs

There's also another similar data structure in its mpsc channel. You can find it here: src/sync/mpsc/block.rs

4 Likes

Thanks so much for the feedback everyone! I've implemented the bugfixes suggested by @H2CO3 (hopefully correctly), including adding comments explaining the memory orderings, some of which I changed along the way. So I'd love to hear if I've messed up any of those.

I haven't changed the overall locking scheme. My feeling is that, as @steffahn pointed out, there isn't much parallel work happening during parallel pushing, but by the same token there isn't likely to be much contention except when a new array needs to be allocated. That case is slow, but it also happens a very limited number of times, so I'm not too worried about optimizing for it. I was most concerned with optimizing the read case.

I'll definitely ponder other locking schemes for push.

I haven't optimized the iterator (or the Drop, which is similarly sloppy). In my currently imagined use case those aren't critical for speed, and because the iterator is hidden behind an impl Iterator return value, I don't think there is anything preventing future optimization when I have time.

Thanks again, everyone!

There's already Aovec on lib.rs, which is sadly unmaintained. If you can make something similar but without its problems, that would be great!

Thanks for the link! It seems even its GitHub repository has vanished, but it's clearly got the same goal and a very similar implementation. It does look to be unsound, since it implements Sync regardless of whether the type T is Sync.

I think I'll publish, then! 🙂

1 Like

Couldn't you make a MyOtherAtomicContainer<u32>?

I'm not sure what you're suggesting. Could you explain in more detail what you mean?

Forget about that. I was just wondering why you had to have T: Sync if you just provide a shared ref to T?

Edit: what was I thinking?

Because that's exactly what Sync means: "this value is safe to access from multiple threads". It's especially important in the presence of unsafe and/or interior mutability. If you were allowed to access a !Sync type from multiple threads, then you could just as well put a RefCell in it and call borrow_mut() from multiple threads without synchronization. That would be unsound.

Of course, my bad.

Here’s a soundness issue:

Since push is a &self method, you'll need a T: Send + Sync bound on the implementation of Sync for AppendOnlyVec<T>. (The pushed values will be dropped on the thread that owns the AppendOnlyVec<T>, while another thread can push them, so you do send data between threads when calling push.)
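
In code, the bound being described looks roughly like this (the struct here is a stripped-down, hypothetical stand-in; only the auto-trait reasoning matters):

    use std::marker::PhantomData;

    // Hypothetical stand-in for the real type: the raw pointer field makes it
    // neither Send nor Sync by default, so both impls must be written manually.
    struct AppendOnlyVec<T> {
        data: *mut T,
        _marker: PhantomData<T>,
    }

    // &AppendOnlyVec<T> lets another thread push a T (values cross threads,
    // hence T: Send) and read existing elements as &T (hence T: Sync).
    unsafe impl<T: Send + Sync> Sync for AppendOnlyVec<T> {}
    unsafe impl<T: Send> Send for AppendOnlyVec<T> {}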

4 Likes

Thanks @steffahn! Somehow I was thinking that Sync implied Send.

The possible source of confusion is the fact that T: Sync iff &T: Send, i.e. these traits are indeed connected, just not so directly.

I've now published a fix to this soundness issue. Thanks again for identifying it!
