For the following snippet, is it important to load the data using Acquire ordering if compare_exchange fails?
// Snippet taken from Mara's book on atomics & locks
use std::sync::atomic::AtomicPtr;
use std::sync::atomic::Ordering::{Acquire, Release};

fn get_data() -> &'static i32 {
    // Lazy initialization
    static DATA: AtomicPtr<i32> = AtomicPtr::new(std::ptr::null_mut());
    let mut p = DATA.load(Acquire);
    if p.is_null() {
        p = Box::into_raw(Box::new(42));
        if let Err(e) = DATA.compare_exchange(std::ptr::null_mut(), p, Release, Acquire) {
            // Does the `Acquire` ordering matter here? We already know that the data
            // has been initialized, so we just need to load it (?)
            drop(unsafe { Box::from_raw(p) });
            p = e;
        }
    }
    unsafe { &*p }
}
Yes, the Acquire ordering is needed to ensure that the current thread's read of the i32 (not the pointer, but the value it points to!) is synchronized with the write made by the thread that initialized it.
I kinda get it, but I mean: wouldn't it be synced with Relaxed ordering? Because we know that the comparison failed (so the pointer is not null), and the write has already been made before.
Let's assume that there are 2 threads. Thread 1 initializes DATA, and in thread 2 the comparison fails because DATA has already been initialized by thread 1. Thread 2 then gets the current value of DATA from the failed compare_exchange. To make sure that the initialization in thread 1 happens before thread 2 reads through that pointer, Acquire is needed. If Acquire is not used, we can't assume that what p points to in the final unsafe block is initialized. This is my understanding.
No, atomics don't work like this. Even though the other thread wrote the 42 before writing the pointer to DATA, this doesn't guarantee that another thread will observe the two writes in the same order. A thread that loads DATA with Relaxed ordering might observe the write to DATA without yet observing the write of 42; if it then reads the value through the pointer, that is a data race. To actually guarantee that what was written before the write to DATA is observed by those that read DATA later, you need an Acquire-Release pair.
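To make the Acquire-Release pairing concrete, here is a minimal sketch of the same publish pattern with a flag instead of a pointer. The function name publish_and_read and the local variables payload and ready are made up for illustration; they are not from the book's snippet.

```rust
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::thread;

/// Hypothetical helper: one thread publishes a value behind a flag,
/// another waits for the flag and returns the value it read.
fn publish_and_read() -> u64 {
    let payload = AtomicU64::new(0);
    let ready = AtomicBool::new(false);

    thread::scope(|s| {
        // Writer: store the payload first, then publish with a Release store.
        s.spawn(|| {
            payload.store(42, Relaxed);
            ready.store(true, Release);
        });

        // Reader: the Acquire load that sees `true` synchronizes with the
        // Release store, so the later Relaxed load must see the payload.
        let reader = s.spawn(|| {
            while !ready.load(Acquire) {
                std::hint::spin_loop();
            }
            payload.load(Relaxed)
        });

        reader.join().unwrap()
    })
}
```

If both flag operations were Relaxed, the language would no longer promise that the reader sees 42 after seeing the flag, even if it happens to work on your machine.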
In principle, a write of a pointer and a write of data through that pointer are entirely different events, and need to be independently synchronized. The CPU could synchronize one but not the other between the cores, so different cores would see inconsistent data. In practice, read/write through a pointer is considered a data dependency with the pointer itself, so no sane or modern CPU would ever do that (I believe only DEC Alpha did otherwise, and it's long gone). Data dependencies are implicitly synchronized.
However, modern compilers don't track data dependencies. What you really want for the failure ordering is C++'s memory_order_consume, which specifies that data dependencies are preserved. Unfortunately, modern compilers don't implement consume ordering and instead treat it as Acquire. For this reason Rust doesn't have Ordering::Consume, and you need to explicitly specify the Acquire ordering.
So, if thread A compare_exchanges the pointer from null to &42, and later thread B fails its compare_exchange and loads the value using Relaxed ordering, could it still load the null pointer, even though it observed that the write happened (that is, the write to the pointer is delayed)?
4. thread B executes the compare_exchange and fails, loading p from DATA
5. thread B creates a reference from p, which counts as a read from *p
The problem is that this is a data race. You might be inclined to think that:
event 2 comes before event 3 in thread A
thread B observed 3.
hence thread B must observe 2.
But unfortunately this is false. Relaxed operations are only guaranteed to operate atomically on the memory location they act on; they don't guarantee visibility of other events, even when the Relaxed operation observes an event that, in the other thread, is guaranteed to come after those other events.
In this case this means that thread B is not guaranteed to see 2, even though from thread A's point of view 2 happens before 3 and thread B observed event 3.
Hence you end up with a write to *p from thread A, and a read from *p in thread B, without a guaranteed order between them, and this is a data race.
The point of Acquire and Release is to avoid exactly this problem. When an Acquire load reads the value written by a Release store to the same memory location, everything that comes after the Acquire load in its thread is guaranteed to see everything that happened before the Release store in the storing thread.
In this case it means that if the compare_exchange failure case is an Acquire, then when thread B executes it, it synchronizes with the Release in thread A, and event 5 is guaranteed to see the write in event 2.
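For reference, here is a sketch of the original function with the events marked in comments, plus a small driver that races several threads through it. The event labels for 2 and 3 are my reading of how the explanation uses them (the write of 42 and the successful Release store), and race_get_data is a hypothetical helper, not part of the book's code.

```rust
use std::sync::atomic::AtomicPtr;
use std::sync::atomic::Ordering::{Acquire, Release};
use std::thread;

fn get_data() -> &'static i32 {
    static DATA: AtomicPtr<i32> = AtomicPtr::new(std::ptr::null_mut());
    let mut p = DATA.load(Acquire);
    if p.is_null() {
        // Event 2 (as assumed here): the write of 42 into the new allocation.
        p = Box::into_raw(Box::new(42));
        match DATA.compare_exchange(std::ptr::null_mut(), p, Release, Acquire) {
            // Event 3 (as assumed here): the winner publishes p with Release.
            Ok(_) => {}
            // Event 4: a loser gets the winner's pointer from an Acquire load,
            // which synchronizes with the winner's Release store.
            Err(e) => {
                drop(unsafe { Box::from_raw(p) });
                p = e;
            }
        }
    }
    // Event 5: reading through the pointer.
    unsafe { &*p }
}

/// Hypothetical driver: spawns `n` threads and collects the address and
/// value each one observed.
fn race_get_data(n: usize) -> Vec<(usize, i32)> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(|| {
                let r = get_data();
                (r as *const i32 as usize, *r)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Every thread should report the same address and the value 42. Note that a passing run with Relaxed in the failure position would prove nothing: the data race is undefined behavior whether or not the hardware happens to expose it.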
Why does Relaxed exist then? Sometimes you only need atomicity on a single variable and it doesn't matter what happens to the rest of the memory. In those cases Relaxed can be more efficient than Acquire/Release.
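A typical case where Relaxed is enough is a plain counter, where nothing else in memory is published through the atomic. A minimal sketch (the function name count_events and its parameters are made up for illustration):

```rust
use std::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::Relaxed;
use std::thread;

/// Hypothetical helper: `threads` threads each increment a shared counter
/// `per_thread` times; returns the final count.
fn count_events(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Only the counter itself matters, so Relaxed suffices:
                    // each fetch_add is still atomic and no increment is lost.
                    counter.fetch_add(1, Relaxed);
                }
            });
        }
    });
    // thread::scope joins all threads before returning, and joining a thread
    // already provides the ordering needed to see all of its increments.
    counter.load(Relaxed)
}
```

Calling count_events(4, 1000) gives 4000, because atomicity alone is what the counter needs; no other data depends on the order in which the increments become visible.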