So it turns out I don't understand C11's memory model at all, after my colleague showed me this snippet:
fn main() {}

#[test]
fn foo() {
    use loom::{
        sync::{Arc, atomic::*},
        thread,
    };
    loom::model(|| {
        let v = Arc::new(AtomicU8::new(100));
        let reset = {
            let v = Arc::clone(&v);
            thread::spawn(move || {
                v.store(0, Ordering::Relaxed);
            })
        };
        let incre = {
            let v = Arc::clone(&v);
            thread::spawn(move || {
                v.fetch_add(1, Ordering::Relaxed);
            })
        };
        reset.join().unwrap();
        incre.join().unwrap();
        // fails, meaning it's possible to end up with 101
        assert_ne!(v.load(Ordering::Relaxed), 101);
    });
}
From what I understand, there are only two possible modification orders for the shared atomic: either the store comes first (100, 0, 1) or the fetch_add comes first (100, 101, 0). (Notations and graphs come from here.)
IMHO these two graphs suffice to show that the only possible outcomes are either 0 or 1: 101 exists in the modification order of v only in the second scenario, and coherence means we should not see 101, since (quoting the linked rules): "…following the reverse direction of arrows, then for each one of those, take a single step across every line that connects to the relevant memory location. It is not allowed for the access to read or write any value that appears above any one of these points."
JoinHandle::join should grant us the required arrow (which I didn't draw): its return synchronizes-with all memory operations that occurred in the spawned thread.
The join operation only guarantees a happens-before relation between the main thread and the joined thread. Since you use Relaxed ordering in both threads, there is no happens-before relation between the reset and incre threads. Therefore, the incre thread could use an old value from its cache for the fetch_add operation.
I played around with the loom code, and loom fails with an internal assertion if I replace loom's Arc with std's Arc to get rid of some noise in the log. I hit the same problem if I leak a Box. Seems to be a bug in loom.
Below you find loom's log. Maybe this helps you understand what happens. I think the relevant things are the branch_action, synchronize, and store/rmw operations, whose semantics I do not know.
The shortened version of the full log looks like this:
No, it is atomic. Atomicity and cache coherence are two different concepts:
Atomic: an atomic operation executes as a single, indivisible unit. When a thread executes fetch_add, it's guaranteed that the read, modify, and write of the value happen as one uninterrupted step. No other thread can observe a partial state of the operation.
Cache Coherence: This is about making sure that all processors in a system see the same, up-to-date value for a given memory location. Thanks to protocols like MESI, a value written by CPU A will eventually be visible to CPU B.
However, cache coherence doesn't guarantee the order in which other memory operations become visible.
This is where memory ordering comes in. The fetch_add operation is atomic, but its interaction with other, non-atomic memory accesses might not be what you expect without a specified memory order.
For example, you might encounter a scenario like this (you may be able to reproduce it on a Mac or other ARM devices):
use std::{
    hint,
    sync::{
        Arc,
        atomic::{AtomicBool, Ordering},
    },
    thread,
};

static mut x: i32 = 0;

fn main() {
    for i in 1..10000 {
        unsafe { x = 0 };
        let flag = Arc::new(AtomicBool::new(false));
        let t1_flag = Arc::clone(&flag);
        let t2_flag = Arc::clone(&flag);
        let t2 = thread::spawn(move || {
            while t2_flag.load(Ordering::Relaxed) == false {
                hint::spin_loop();
            }
            assert!(unsafe { x } == i)
        });
        let t1 = thread::spawn(move || {
            unsafe {
                x = i;
            }
            t1_flag.store(true, Ordering::Relaxed);
        });
        t1.join().unwrap();
        t2.join().unwrap();
    }
}
You should use release-acquire here. Without the Acquire and Release memory orders, even if flag is updated atomically, there's no guarantee that thread 2 will see the write to x (which happened before the flag was set) once it sees the flag has been updated, because the CPU is allowed to reorder these non-atomic memory accesses.
For OP's problem, it's about synchronization between the reset thread and the main thread, and between the incre thread and the main thread. But between the reset and incre threads there is no such guarantee in the C++20 memory model.
I came across Rust Atomics and Locks, and it's a good resource for this. It's where I picked up pretty much everything I know about atomics.
Based on my understanding, in the original snippet (where all operations are Relaxed), yes, the incre and reset threads don't synchronize-with each other. But this should not matter, since all threads must agree on the modification order of the atomic, and the coherence rules mean the abstract machine should outright ban a read of 101 here: for any possible MO of v (of which there are just two possible cases, if I'm not mistaken), either 101 is not in the MO in the first place (the first graph), or the arrow (happens-before) provided by JoinHandle::join rules it out (the second graph).
I'm not yet familiar with MOESI/MESI (a long-time resident of my study-later list...), but since we're writing for the C++11/C11 abstract machine, these hardware implementation details should not matter that much (I guess there are two different terms at play here: hardware cache coherence protocols vs. the coherence rules of the abstract machine).
My guess from looking briefly at the loom code is that it does not handle RMW atomicity correctly.
Atomicity says the added value always immediately succeeds the fetched value in the modification order.
edit: I could not resist adding confirmation to my suspicion and ruling out some other possibilities (bugs with join or ordering), so I went and coded it as:
const O: Ordering = Ordering::Relaxed; // SeqCst same result
loom::model(move || {
    let a = Arc::new(AtomicU32::new(100));
    let b = a.clone();
    let c = a.clone();
    thread::spawn(move || {
        b.store(0, O);
    });
    thread::spawn(move || {
        c.fetch_add(1, O);
    });
    let x = a.load(O);
    let y = a.load(O);
    let z = a.load(O);
    if x == 100 && y == 0 && z == 101 {
        panic!("missing atomicity");
    }
});