Why does the `store` operation in `Arc::make_mut` need `Release` memory order?

In `Arc::make_mut`, the store operation uses the Release memory order; see the sync.rs source. However, why does it need Release?

pub fn make_mut(this: &mut Self) -> &mut T {
    if this.inner().strong.compare_exchange(1, 0, Acquire, Relaxed).is_err() {  // #1
        // ...
    } else if this.inner().weak.load(Relaxed) != 1 {
        // ...
    } else {
        this.inner().strong.store(1, Release);  // #2
    }
    unsafe { Self::get_mut_unchecked(this) }
}
// Drop
unsafe impl<#[may_dangle] T: ?Sized, A: Allocator> Drop for Arc<T, A> {
    fn drop(&mut self) {
        if self.inner().strong.fetch_sub(1, Release) != 1 {
            return;
        }
        acquire!(self.inner().strong);
        // deallocation
    }
}
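
(For context, the acquire! used in drop above is, in the normal non-ThreadSanitizer configuration, just an acquire fence. A rough sketch of the macro, from memory rather than the exact source:)

// Rough sketch of the acquire! macro in sync.rs; under ThreadSanitizer it
// performs `$x.load(Acquire)` instead, because TSan does not support fences.
macro_rules! acquire {
    ($x:expr) => {
        std::sync::atomic::fence(std::sync::atomic::Ordering::Acquire)
    };
}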

The first condition of the if at least guarantees that this Arc is the unique strong reference when the store at #2 is invoked. Either the deallocation of the Arc object is sequenced after make_mut, or the deallocation occurs in another thread. In the latter case, the Arc object (or a clone) must be sent to another thread via synchronization facilities.

Moreover, the CAS at #1 uses the Acquire memory order, which guarantees that the other drops synchronize with #1. Then #1 is either sequenced before the drop that performs the deallocation in the current thread, or happens before the drop that performs the deallocation in another thread (via the send). Either way, all uses of the Arc happen before the deallocation. The picture could be:

//thread 1
use(a0);
drop(a0); // no deallocation, fetch_sub_release_0
//thread 2
use(a1);
drop(a1); // no deallocation, fetch_sub_release_1
// ...

// thread N:
make_mut(a_n);  // CAS and store
// 1. either
// drop(a_n); deallocation here, fetch_sub_release_n
// or 2.
// send_to_other_thread(a_n/a_n.clone()); 

// another thread if send: 
let a = receive();
drop(a); // deallocation occurs here, fetch_sub_release_final

If the deallocation occurs in thread N, then the modification order of strong should be

fetch_sub_release_0 < fetch_sub_release_1 < ... < CAS < Store < fetch_sub_release_n

Any preceding fetch_sub_release_x synchronizes with the CAS, and the CAS is sequenced before the deallocation.

If the deallocation occurs in another thread, then the modification order of strong should be

fetch_sub_release_0 < fetch_sub_release_1 < ... < CAS < Store < fetch_sub_release_n < fetch_sub_release_final

Since the Arc object that invokes fetch_sub_release_final was sent via a synchronization facility, the CAS still happens before the fetch_sub_release_final that performs the deallocation, and any preceding fetch_sub_release_x synchronizes with the CAS as well, so every use(a_x) still happens before the deallocation.
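
As a minimal sketch of what I mean by "sent via synchronization facilities" (an mpsc channel, whose send happens-before the corresponding recv; this is only an illustration, not code from the thread above):

use std::sync::{mpsc, Arc};
use std::thread;

fn main() {
    let mut a = Arc::new(0u32);
    *Arc::make_mut(&mut a) += 1;      // the CAS and store happen here, in this thread
    let (tx, rx) = mpsc::channel();
    tx.send(a).unwrap();              // the send...
    thread::spawn(move || {
        let a = rx.recv().unwrap();   // ...happens-before this receive,
        drop(a);                      // so make_mut happens-before the final drop/deallocation
    })
    .join()
    .unwrap();
}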

So the memory order of the store operation could seemingly be Relaxed. Is there any case I missed that requires the store to be Release?

2 Likes

It ensures that there are no weak references by synchronizing with the Acquire in Weak::upgrade. Without it, the relaxed load of the weak refcount would not be enough to ensure that.
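
Roughly, the strong-count bump inside Weak::upgrade looks like this (a simplified, self-contained paraphrase of just the atomic part, not the actual std source):

use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Relaxed}};

// Model of the refcount step of Weak::upgrade: an Acquire RMW that fails when
// the strong count is 0 and otherwise increments it. This is the operation the
// Release store in make_mut needs to synchronize with.
fn upgrade_would_succeed(strong: &AtomicUsize) -> bool {
    strong
        .fetch_update(Acquire, Relaxed, |n| if n == 0 { None } else { Some(n + 1) })
        .is_ok()
}

fn main() {
    let strong = AtomicUsize::new(1);
    assert!(upgrade_would_succeed(&strong));  // 1 -> 2: upgrade returns Some(arc)
    strong.store(0, Relaxed);
    assert!(!upgrade_would_succeed(&strong)); // stays 0: upgrade returns None
}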

Because not only the compiler but also the CPU reorders operations. The only way to establish a happens-before relationship with an acquire operation is a corresponding release operation. Your reasoning doesn't cover what might happen without that relationship.

I highly recommend this book: Rust Atomics and Locks.

Edit: This wasn't meant to be a reply to alice, but to OP. I keep clicking the wrong button.

1 Like

The key is which operation needs to establish a happens-before relationship with the store that is the release operation here. In my question, I considered the deallocation, which does not seem to need a happens-before relationship with the store.

If there is another weak reference, the store operation cannot be invoked, can it? Note the second branch:

if this.inner().strong.compare_exchange(1, 0, Acquire, Relaxed).is_err() {
    // ...
} else if this.inner().weak.load(Relaxed) != 1 {
    // ...
} else {
    // We were the sole reference of either kind; bump back up the
    // strong ref count.
    this.inner().strong.store(1, Release);
}

The comment shows that the store operation can only be invoked when weak == 1 and strong == 1.

Moreover, if there were a weak reference in another thread and #2 were invoked, then the subsequent modification through the returned &mut T and the read through the Arc obtained from Weak::upgrade would be unordered.

Imagine that we have one strong and one weak reference and the following two threads:

// thread 1
let mut_ref = Arc::make_mut(&mut my_arc);
*mut_ref = 10;
// thread 2
let arc = weak.upgrade().unwrap();
drop(weak);
println!("{}", *arc);

Then we might have the order of operations on strong be like this:

// thread 1                thread 2
strong.cmpxchg(1, 0)                                  // 1 -> 0
strong.store(1, Relaxed)                              // 0 -> 1
                           strong.fetch_update()      // 1 -> 2

and the order of operations on weak be like this:

// thread 1                thread 2
                           weak.fetch_sub(1, Release) // 2 -> 1
weak.load(Relaxed)                                    // loads 1

With the above execution, thread 2 will read from the value in parallel with thread 1 writing to it, which is a data race and not legal.

If you make strong.store(1) into a release operation, then weak.load(Relaxed) sequenced-before strong.store(1, Release) synchronizes-with strong.fetch_update() sequenced-before weak.fetch_sub(1, Release), so the ordering of operations on weak is not legal. But without strong.store(1, Release) synchronizing-with strong.fetch_update(), there is nothing to prevent the above orderings.

5 Likes

First, how can the store operation be invoked when there is another weak reference in thread 2? Note the condition under which the store operation is invoked:

if this.inner().strong.compare_exchange(1, 0, Acquire, Relaxed).is_err() {
    // ...
} else if this.inner().weak.load(Relaxed) != 1 {
    // ...
} else {
    // We were the sole reference of either kind; bump back up the
    // strong ref count.
    this.inner().strong.store(1, Release);
}

In your case, weak.load(Relaxed) == 2; see the comment in the source file. Second, even if the store operation is invoked with Release, it does not guarantee that the subsequent modification through &mut T is ordered with the read that would result from weak.upgrade().

No, my example uses this order of operations for weak:

// thread 1                thread 2
                           weak.fetch_sub(1, Release) // 2 -> 1
weak.load(Relaxed)                                    // loads 1

so the load returns 1.

Let me try to understand what you're saying:

//thread 1                                    //thread2
strong.cmpxchg(1, 0,Acquire, Relaxed)  //#1
weak.load(Relaxed)  // #2
strong.store(1,Relaxed) // #3                   
                                        strong.fetch_update(Acquire, Relaxed, fn)// #4
                                        weak.fetch_sub(1, Release) // #5

You mean: since #3 is not a release operation, even though #4 reads the value written by #3, #3 does not synchronize with #4. So #5 and #2 are unordered, and there exists an execution in which #5 is coherence-ordered before #2, such that #2 reads 1, the second branch is bypassed, and #3 is invoked; #4 then reads 1 and produces a new Arc, and the subsequent modification through the result of get_mut in thread 1 is a data race with the read through the Arc produced by upgrade in thread 2.

If #3 is a release operation and it is invoked, #3 synchronizes with #4, which makes #2 happen before #5; this implies #3 can only be invoked if thread 2 holds no weak reference, otherwise the second branch is taken instead.

Is my understanding right?

I don't know why you're talking about coherence-ordering. That's only relevant in programs using SeqCst which is not the case here.

Ultimately, a common mistake is to think that all possible behaviors can be described by writing down all possible ways to interleave operations. But atomics are more powerful than that, and can behave in ways that do not correspond to any interleaving. This is called "not being sequentially consistent".
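
A standard illustration (separate from the Arc code, just a sketch) is the store-buffering test, where both loads may return 0 even though no interleaving of the four operations can produce that outcome:

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

static X: AtomicUsize = AtomicUsize::new(0);
static Y: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        X.store(1, Relaxed);
        Y.load(Relaxed) // r1
    });
    let t2 = thread::spawn(|| {
        Y.store(1, Relaxed);
        X.load(Relaxed) // r2
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
    // r1 == 0 && r2 == 0 is a permitted outcome, although in every interleaving
    // at least one load runs after the other thread's store.
    println!("r1 = {r1}, r2 = {r2}");
}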

Because a load operation that reads a value written by a store operation can be described as: the store is coherence-ordered before the load. The term is not used only for the single total order of SeqCst.

I ... think your understanding is right, but I'm not completely sure.

Yes, I knew atomics can produce weird outcomes; I just didn't think of the example you gave. Thanks for your example and explanation.

1 Like

After thinking more, here is a proof of the assumption. A weak reference must be spawned from some Arc object, and the first condition (i.e. strong.cmpxchg(Acquire)) guarantees it synchronizes with any previous drop of the other Arc objects, including the one that spawned the weak reference. So when the second condition (i.e. weak.load(Relaxed)) is evaluated, the result cannot be 1, because the increment of weak associated with the previously spawned weak reference is visible to this weak.load(Relaxed): the increment of weak in downgrade is sequenced before the drop of the Arc that spawned the weak reference, and that drop synchronizes with strong.cmpxchg(Acquire).

Then, according to your explanation above, with the store using release memory order: assume weak.upgrade() successfully returned an Arc object. Its strong.fetch_update() must then have read strong.store(1, Release), otherwise strong.cmpxchg(Acquire) would have been Err. The resulting synchronization ensures that weak.load happens before weak.fetch_sub(1, Release), so the loaded value must be one from before the fetch_sub modification of weak. This contradiction implies that the third branch will never be invoked and weak.upgrade() always returns None in this case.

But I have another question: is a data race possible if the store is Relaxed in this case?

// thread 1
let mut_ref = Arc::make_mut(&mut my_arc);
*mut_ref = 10;
//thread 2
if let Some(arc) = weak.upgrade(){
   drop(weak);  
   println!("{}", *arc);
}

The simplified model would be:

weak = 2;
strong = 0;
// thread 1:
if weak.load(Relaxed) == 1{  // #1
   strong.store(1,Relaxed);  // #2
}
//thread 2:
if strong.load(Acquire) == 1{ // #3
    weak.store(1,Release);  // #4
}

Is this possible? It seems there is still a possible data race, even though the intuitive logic is: #1 is true only if #4 was invoked, #4 is invoked only when #3 is true, and #3 is true only if #2 was invoked, which in turn requires #1 to be true.

That example is exactly the same as the one I posted, so yes.

1 Like

I can understand the result from the memory-ordering rules; however, from the logical point of view, consider this simplified example:

weak = 2;
strong = 0;
// thread 1:
if weak.load(Relaxed) == 1{  // #1
   strong.store(1,Relaxed);  // #2
}
//thread 2:
if strong.load(Acquire) == 1{ // #3
    weak.store(1,Release);  // #4
}

Assuming #3 returns 1, that means #3 reads #2; #2 is invoked only if #1 reads #4; and #4 is invoked only if #3 returns 1. The proof forms a cycle, which is counter-intuitive.

@alice

Looking at the C++ standard, the rules ([atomics.order]) say implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation.

The example in p9 is much like this one:

weak = 2;
strong = 0;
// thread 1:
if weak.load(Relaxed) == 1{  // #1
   strong.store(1,Relaxed);  // #2
}
//thread 2:
if strong.load(Acquire) == 1{ // #3
    weak.store(1,Release);  // #4
}

with x == weak and y == strong. The simplified code above is the model of the make_mut and Weak::upgrade() that we are talking about.
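
Rendered in Rust with relaxed atomics, my paraphrase of the conditional example in that paragraph would be roughly:

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

static X: AtomicUsize = AtomicUsize::new(0);
static Y: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        let r1 = X.load(Relaxed);
        if r1 == 42 {
            Y.store(42, Relaxed); // only "justified" by the other thread's store
        }
        r1
    });
    let t2 = thread::spawn(|| {
        let r2 = Y.load(Relaxed);
        if r2 == 42 {
            X.store(42, Relaxed); // only "justified" by the other thread's store
        }
        r2
    });
    let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
    // The standard's recommendation disallows r1 == r2 == 42: each store would
    // circularly depend on the value produced by the other ("out of thin air").
    assert!(!(r1 == 42 && r2 == 42));
}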

I don't think out-of-thin-air applies to this case. If you write thread 2 like this:

// thread 2
let maybe_arc = weak.upgrade();
drop(weak);
if let Some(arc) = maybe_arc {
    println!("{}", *arc);
}

then we will write 1 to weak no matter what. No circular logic required.

However, out-of-thin-air does apply to this case:

// thread 1
let mut_ref = Arc::make_mut(&mut my_arc);
*mut_ref = 10;
//thread 2
if let Some(arc) = weak.upgrade(){
   drop(weak);  
   println!("{}", *arc);
}

weak.upgrade() is Some(arc) only if strong.fetch_update(Acquire, Relaxed, checked_increment) returns Ok, which depends on strong.store(1, Relaxed), which depends on weak.load(Relaxed) == 1, which depends on drop(weak) (i.e. weak.fetch_sub(1, Release)), which depends on weak.upgrade() (i.e. strong.fetch_update(Acquire, Relaxed, checked_increment)).

For your example in the above comment, the equivalent model might be:

weak = 2;
strong = 0;
// thread 1:
if weak.load(Relaxed) == 1{  // #1
   strong.store(1,Relaxed);  // #2
}
//thread 2:
if strong.load(Acquire) == 1{ // #3
     println!("produce arc");
}
weak.store(1,Release);  // #4

Yes, this breaks the dependency cycle. The only way both conditions can be true is if the implementation reorders #4 before #3 relative to the source order?

After researching OOTA a bit more: actually, OOTA still exists in your example:

// thread 1
let mut_ref = Arc::make_mut(&mut my_arc);
*mut_ref = 10;

// thread 2
let maybe_arc = weak.upgrade();
drop(weak);
if let Some(arc) = maybe_arc {
    println!("{}", *arc);
}

weak.upgrade() will return None if it reads 0. Since this issue only concerns the condition this.inner().weak.load(Relaxed) != 1, the upgrade returns Some(Arc) only if its RMW operation reads the value written by this.inner().strong.store(1, Release); so the state to consider is weak_count == 2 and strong_count == 0 at this point. Assuming the store operation used Relaxed, the example is equivalent to the following:

// strong_count == 0
// weak_count == 2
// thread 1:
if weak_count.load(Relaxed) == 1 {  // #1
    strong_count.store(1, Relaxed);  // #2
}
// thread 2:
if strong_count.fetch_update(Acquire, Relaxed, checked_increment).is_ok() {  // #3 (Weak::upgrade)
    // obtained an Arc
    drop(weak);  // weak_count.fetch_sub(1, Release)
} else {
    // None
    drop(weak);  // weak_count.fetch_sub(1, Release)
}

#1 depends on drop(weak), drop(weak) depends on #3, #3 depends on #2, and #2 depends on #1.