Faster than `RefCell<Box<T>>` for 'concurrent ptr swap'?

I want something that:

  1. is fast to deref (or pointer chase) and
  2. fast to swap (in a different thread)

Is there something better than RefCell<Box<T>> ?

Well the fastest is of course AtomicPtr in std::sync::atomic - Rust

Doing this with ownership can be a challenge, though, especially if shared. (This is one place where GC languages have a major advantage, though it does come at the cost of needing fencing around most operations in even good implementations, AFAIK.) Any chance you're making a binary and can get away with just statics for the things you need to swap between?

BTW, RefCell is !Sync, so it's definitely not what you want. Perhaps you just meant RwLock in std::sync - Rust?

1 Like

arc_swap - Rust , so far, sounds like close to what I want (but not read it in detail).

I'm missing something important. what is the alternative to 'making a binary' (making a library)?

XY problem: I have a large geo spatial structure (stored in an oct-tree like manner) and sometimes I need to update a leaf node, then propagate up the tree.

99.9% of the access is read; writes as infrequent. I am okay with slightly out of date reads. However, I want this to be multi threaded and low latency.

Sounds like you could go for a data-structure similar to the strategy that the im crate uses. You can build your entire tree using Arc pointers, having an arc-swap only at the root. Then to update a leaf, you rebuild the tree, reusing any parts of the tree unrelated to the leaf by cloning the Arc. This operation results in a new root node, which you swap out using arc-swap.

With this approach, every change requires creating a new Arc for every level above the leaf node, so its O(log n). Additionally, unlike approaches where you use arc-swap for every child node, this approach guarantees that you never see changes partially applied - you either see the full change, or no change at all.

Of course another option is to just rebuild the entire tree every time, storing each version of the tree in an Arc<Vec<Node>> with indexes in the vector instead of references/arcs. This would lead to much more expensive writes, but might be better for memory fragmentation reasons.

For both options, if you have multiple writers, you should protect the writers with a mutex. Otherwise you can lose a write if they both try to change the same old version.

3 Likes

@alice: Do you understand arc-swap internals? I don't. My current understanding goes something like this:

Mutex<T> : we can't even have concurrent readers
RwLock<T>: readers blocks writer
RwLocK<Arc<T>> : each reader locks, clones arc, then unlocks -- problem: contention over the ref count in the Arc

arc-swap claims to solve this, but it is not obvious to me how they solve it. Ideas ?

It is based on AtomicPtr that lets you atomically swap pointers, however the details of how exactly they do that is more complicated, since the naive approach runs into issues with cleanup.

I mean, an RwLock<Arc<T>> (or even Mutex<Arc<T>>) would probably serve most applications just fine.

This is what I am currently confused on:

  1. If arc-swap increments on every read, how is it any faster than RwLock<Arc<T>>

  2. If arc-swap does not increment on every read, when do we know when we can free the old value ?

Every time you get a new Arc<T> from the arc-swap, it has incremented the ref-count. And as for RwLock, well, arc-swap doesn't do any locking when writing the new value.

How is this faster than a RcLock<Arc<T>> where we read by

lock, clone, unlock, read using possibly out dated Arc

The lock, clone, unlock part might have to wait for a writer. This could take a while if the OS happens to deschedule the writer thread while it has locked the lock.

Additionally, locking and unlocking the rwlock probably involves a few more atomic integer operations than the arc-swap load, so you might save a few instructions there even if there's no writer. (I'm not completely sure about this one.)

The difference is probably larger for writers, which may have to wait for a while if there are many readers. If the lock is not fair, the writer could wait forever if the number of readers never drops to zero due to a constant stream of new readers. If the lock is fair, then the readers will wait a lot longer when a writer arrives, since they are in a queue and have to wait not only for the writer, but also for all of the readers in front of the writer.

1 Like

It doesn't. From my understanding, it's doing something similar to storing a pre-cloned Arc for each thread, so that reads don't usually touch the refcount.

Edit:

It actually has two APIs – load for short-lived accesses and load_full which returns a proper arc (this one obviously does touch the refcount). I was talking just about load.

Also, decided to push some benchmarks I've made for rwlock vs arc-swap some time ago. Maybe you'll find these useful. I'm only measuring read overhead btw.

1 Like