What's wrong in this atomics example?

use std::sync::atomic::Ordering::*;
use std::{
    sync::atomic::{AtomicBool, AtomicUsize},
    thread::spawn,
};

fn main() {
    let x: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
    let y: &'static _ = Box::leak(Box::new(AtomicBool::new(false)));
    let z: &'static _ = Box::leak(Box::new(AtomicUsize::new(0)));

    let _tx = spawn(move || {
        x.store(true, Release);
    });
    let _ty = spawn(move || {
        y.store(true, Release);
    });
    let t1 = spawn(move || {
        while !x.load(Acquire) {}
        if y.load(Acquire) {
            z.fetch_add(1, Relaxed);
        }
    });
    let t2 = spawn(move || {
        while !y.load(Acquire) {}
        if x.load(Acquire) {
            z.fetch_add(1, Relaxed);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    let _z = z.load(SeqCst);
}

In this example, can someone please lay out a possible execution in which z ends up holding 0?
Also, is it possible for x.load to return false in t2 even when _tx executed successfully and stored true?

For the second question: unless you've established that the execution of _tx happens-before the x.load in t2, then yes, that would be possible, and your code doesn't establish that. (However, .join() internally uses sufficient atomic synchronization to establish a happens-before relationship between the execution of the thread and anything coming after the .join() call.)
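As a small illustration of that .join() guarantee (a sketch, not part of the original example): because join() synchronizes with the completion of the thread, even a Relaxed load afterwards is guaranteed to see the thread's store.

```rust
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};
use std::thread::spawn;

fn store_then_join() -> bool {
    let flag: &'static AtomicBool = Box::leak(Box::new(AtomicBool::new(false)));
    let t = spawn(move || flag.store(true, Relaxed));
    // join() establishes happens-before between everything the thread did
    // and everything after this call.
    t.join().unwrap();
    // The store happens-before this load, so it must observe `true`,
    // even though both operations are Relaxed.
    flag.load(Relaxed)
}

fn main() {
    assert!(store_then_join());
}
```

Without the join() (e.g. loading from the main thread while the spawned thread is still running), the load could legitimately return either value.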

For the first question, note that you establish that x.store(true, Release); happens-before the y.load(Acquire) in t1, and that y.store(true, Release); happens-before the x.load(Acquire) in t2.

However, nothing establishes anything like "either the y.load(Acquire) in t1 happens-before the x.load(Acquire) in t2, or vice versa", nor any similar condition that would allow you to conclude "either y.store(true, Release); happens-before the y.load(Acquire) in t1, or x.store(true, Release); happens-before the x.load(Acquire) in t2".

Therefore, it'd be possible in the abstract model for y.store(true, Release); to not be visible to y.load(Acquire) in t1 and for x.store(true, Release); to not be visible to x.load(Acquire) in t2. That scenario would lead to _z == 0 at the end of main.

Note that even though you do a SeqCst load of z, SeqCst is only guaranteed to provide a single total order over the SeqCst operations themselves; one SeqCst operation somewhere in the code doesn't force everything else to be SeqCst. I'm fairly sure you need SeqCst on the operations on x and y, and Relaxed would actually suffice for the final z.load(_). (I think it would also be alright for the two stores to be Relaxed and the four loads of x and y to be SeqCst, but don't trust me on that; I'm not familiar enough with SeqCst.)
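Here's a sketch of what that fix looks like, with SeqCst on every operation on x and y (a hypothetical rewrite of your program, under the assumption above): since all six of those operations now participate in one total order, at least one of the two conditional loads must observe true, so z ends up as 1 or 2, never 0.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::{Relaxed, SeqCst}};
use std::thread::spawn;

// Same structure as the original, but with SeqCst on every operation on
// `x` and `y`. The single total order over SeqCst operations rules out
// "t1 sees y == false AND t2 sees x == false".
fn iriw_seqcst() -> usize {
    let x: &'static AtomicBool = Box::leak(Box::new(AtomicBool::new(false)));
    let y: &'static AtomicBool = Box::leak(Box::new(AtomicBool::new(false)));
    let z: &'static AtomicUsize = Box::leak(Box::new(AtomicUsize::new(0)));

    let _tx = spawn(move || x.store(true, SeqCst));
    let _ty = spawn(move || y.store(true, SeqCst));
    let t1 = spawn(move || {
        while !x.load(SeqCst) {}
        if y.load(SeqCst) {
            z.fetch_add(1, Relaxed); // Relaxed suffices for the counter itself
        }
    });
    let t2 = spawn(move || {
        while !y.load(SeqCst) {}
        if x.load(SeqCst) {
            z.fetch_add(1, Relaxed);
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    z.load(Relaxed) // join() already synchronized; Relaxed is enough here
}

fn main() {
    let z = iriw_seqcst();
    assert!(z == 1 || z == 2, "z = {z}");
}
```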

For a concrete-ish example, you could imagine that you have four cores, each one executing a spawned thread, and it takes longer for atomic stores to reach further-away cores (and they won't wait for other cores' operations to be visible unless you tell them to):

|  C1  |  C2  |  C3  |  C4  |
| _tx  |  t1  |  t2  | _ty  |

We can imagine that the sequence of events is:

  • x.store(true, Release) and y.store(true, Release) occur in C1 and C4
  • The store to x propagates to C2, and the store to y propagates to C3
  • The while-loop in t1 exits (it has seen x == true), but the following y.load(Acquire) still reads false. Similarly for t2.
  • Finally, the stores to x and y finish propagating to all cores.

I'm just mentioning that scenario in case it's more intuitive to you; when it comes to writing correct atomic code, I think reasoning about concrete machines won't help as much as reasoning about the formal model.


In a cache-coherent system we are guaranteed "write propagation" and "transaction serialization" for every single memory location shared among peer caches, but we don't know when those writes will actually propagate to the other peer caches, which could be a problem. Is that correct?

Also, if that's true, can I conclude that the caches are not guaranteed to be in a coherent state at every point in time?

Is this the only scenario that would lead to _z == 0 at the end of main?

also, thanks a lot for this detailed explanation!

An Acquire/Release pair can establish synchronization (a.k.a. happens-before) for operations on the same atomic variable; it does not provide a global ordering across different atomic variables (a.k.a. sequentially consistent ordering).

In other words, if two threads simultaneously write to two different atomic variables, then on weakly ordered architectures the visibility (propagation) order of those writes to other cores is not globally guaranteed: different cores may observe the two writes in different orders.

Therefore, to ensure that all cores observe writes to different atomic variables in the same order, the global consistency of SeqCst is required.

Moreover, the global-consistency guarantee of SeqCst only holds between operations that also use SeqCst (see Ordering::SeqCst and memory_order_seq_cst).
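The first point above, that Acquire/Release does synchronize through a single atomic variable, is the classic message-passing pattern. A minimal sketch (hypothetical names, not from the original example):

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::{Acquire, Relaxed, Release}};
use std::thread::spawn;

// Release/Acquire on ONE flag is enough to publish data: the Release store
// to `ready` synchronizes-with the Acquire load that reads `true`, so the
// Relaxed store to `data` happens-before the Relaxed load of `data`.
fn message_passing() -> usize {
    let data: &'static AtomicUsize = Box::leak(Box::new(AtomicUsize::new(0)));
    let ready: &'static AtomicBool = Box::leak(Box::new(AtomicBool::new(false)));

    let producer = spawn(move || {
        data.store(42, Relaxed);    // happens-before the Release store below
        ready.store(true, Release);
    });
    let consumer = spawn(move || {
        while !ready.load(Acquire) {} // spin until the flag is visible
        data.load(Relaxed)            // guaranteed to observe 42
    });
    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    assert_eq!(message_passing(), 42);
}
```

The IRIW example in this thread is exactly the case this pattern does not cover: two independent writers to two different variables, with two readers disagreeing about the order.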


So visibility is not the problem here? Is the problem actually the order of propagation?

Yes. I'm not proficient in the details, but as far as I know this order may be related to how aggressively the CPU hardware optimizes read and write instructions (and maybe the physical length of the CPU data bus), and it cannot be guaranteed. So in such scenarios SeqCst is generally required to enforce a globally consistent visibility order; correspondingly, it also becomes very expensive.

Could you please provide an execution which demonstrates how a different order of propagation could lead to a 0? Thanks!

The thing about atomics is that you might see an old value, unless there is synchronization that prevents it.

So you may just have this execution:

x = true; // Release, in _tx
y = true; // Release, in _ty

// t1
x.load(Acquire) == true
y.load(Acquire) == false // reads old value

// t2
y.load(Acquire) == true
x.load(Acquire) == false // reads old value

I think it is almost impossible to write a stable, reproducible execution, because this kind of out-of-order visibility usually only occurs in extreme situations that are difficult to trigger, and it requires running on specific hardware architectures, which I have not actually experienced.

Setting those issues aside, the memory model's abstraction of memory orderings masks the underlying hardware details.
Conceptually, the Acquire and Release orderings only guarantee visibility for the same atomic variable; they say nothing about the visibility order (the sequential-consistency question) between atomic variables at different addresses, even though some actual hardware may happen to provide sequentially consistent behavior anyway.

Conceptually, only SeqCst can ensure that all cores observe a single, consistent visibility order across different atomic variables (when the operations all use SeqCst), and the compiler takes care of the details for us at compile time.

So I think the execution does not matter, and it's also difficult to test.


Run with loom.

Although loom could do with a maintainer to deal with other bugs.
This pull, although I haven't tried it, may help in some cases.
I think your example won't be impacted, though.


Could you please provide a concrete definition for this? Perhaps my understanding of it is incorrect and a source of confusion. Thanks!
Are we talking about visibility of the values which the atomic variable takes over time?

According to my personal understanding, different cores have their own independent caches, so a write operation from one core needs to be propagated to other cores.

Even if the write operations of different cores do have a temporal order, writes to different addresses are considered unrelated by default. Therefore, the hardware does not need to guarantee that the speed and order in which data propagates from the writing core to the other cores are completely consistent, as doing so would reduce efficiency.

I believe this touches on hardware design principles, such as bus conflicts and arbitration (only a bit of lingering memory from a course), or whatever else. As for more in-depth and detailed content, I don't know much about it either, but I hope this helps.

And I just found something typical from Lamport:

The result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.


thanks a lot!