Arc downgrade - why is alloc_ref_count spin loop needed? (Atomics and Locks - Chapter 6)

Hello! I am looking just for some clarification on part of the downgrade method on the Arc type.

Towards the end of Chapter 6 of Mara Bos' Brilliant 'Rust Atomics and Locks' book, the final optimization on the handling of Arc and Weak pointer types includes adding a spin loop on the downgrade method. This involves checking whether the alloc_ref_count has been 'locked' i.e. set usize::MAX:

    pub fn downgrade(arc: &Self) -> Weak<T> {
        let mut n = arc.data().alloc_ref_count.load(Relaxed);
        loop {
            if n == usize::MAX {
                std::hint::spin_loop();
                n = arc.data().alloc_ref_count.load(Relaxed);
                continue;
            }
            assert!(n <= usize::MAX / 2);
            // Acquire synchronises with get_mut's release-store.
            if let Err(e) =
                arc.data()
                    .alloc_ref_count
                    .compare_exchange_weak(n, n + 1, Acquire, Relaxed)
            {
                n = e;
                continue;
            }
            return Weak { ptr: arc.ptr };
        }
    }

So far as I can see, alloc_ref_count is only set to usize::MAX in get_mut.

I'm a little unclear on why this check in downgrade is needed, as my understanding of get_mut is that if arc.data().alloc_ref_count.compare_exchange(1, usize::MAX, Acquire, Relaxed) succeeds, then we are guaranteed to have the only Arc around, so no other threads would be able to call downgrade I wouldn't have thought?

    pub fn get_mut(arc: &mut Self) -> Option<&mut T> {
        // Acquire matches Weak::drop's Release decrement, to make sure any
        // upgraded pointers are visible in the next data_ref_count.load.
        if arc.data().alloc_ref_count.compare_exchange(
            1, usize::MAX, Acquire, Relaxed
        ).is_err() {
            return None;
        }
        let is_unique = arc.data().data_ref_count.load(Relaxed) == 1;
        // Release matches Acquire increment in `downgrade`, to make sure any
        // changes to the data_ref_count that come after `downgrade` don't
        // change the is_unique result above.
        arc.data().alloc_ref_count.store(1, Release);
        if !is_unique {
            return None;
        }
        // Acquire to match Arc::drop's Release decrement, to make sure nothing
        // else is accessing the data.
        fence(Acquire);
        unsafe { Some(&mut *arc.data().data.get()) }
    }

Any idea what I'm missing in my understanding?

Two threads could be running get_mut() and downgrade() concurrently.

Let's say Thread 1 runs get_mut() and confirms that there is only one Arc by loading the value from alloc_ref_count. It doesn't use any locking mechanism (i.e. it doesn't set alloc_ref_count to usize::MAX).

Now Thread 2 starts running and doesn't spin on a usize::MAX lock. It goes straight on to increment alloc_ref_count by one, which it can do because there's no lock on it. Now we have a data race because Thread 1 may think there is only one Arc and no Weak pointers left, when in fact we have a new Weak pointer created by the downgrade().

As Chapter 6 of the book says, this is unlikely to happen but we do need to prevent it defensively:

the downgrade method could just spin until it’s unlocked, in the unlikely situation it runs at the exact same moment as get_mut.

In the optimized implementation alloc_ref_count is basically n_weak + if n_arc != 0 { 1 } else { 0 }. Since you're holding an Arc you know that the latter term is 1, so if the whole alloc_ref_count is 1 you know there are no Weaks, but there could still be as many Arcs as you want.

Imagine then that there are two Arcs and no Weaks. This check passes because there are no Weaks. Then before you perform the check that you're the only Arc, the second Arc gets downgraded to a Weak (and then it needs to be dropped, because downgrade doesn't consume the Arc). So now you have one Arc and one Weak, and get_mut is checking whether it is the only Arc, which is true, so it succeeds. However there is still a Weak that could be upgraded to an Arc and thus gain access to the data you just gave out a mutable reference to.

Since you can't atomically check both counters at the same time you need to somehow prevent one of the operations that could skrew your observations. The most reasonable is downgrade, so alloc_ref_count is "locked" to prevent creating new Weaks in downgrade until data_ref_count is acquired.

6 Likes

Ah of course that makes sense - thank you!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.