Happens-Before Relationship by Relaxed memory ordering in single and multiple threads

  • Relaxed ordering doesn't establish any happens-before relationship between threads.
  • Operations in a single thread are always subject to the happens-before relationship as defined by program source order.
  • The only thing that Relaxed ordering guarantees is that each modification of a variable happens atomically, i.e. there is no tearing, and that for more complex compare-and-swap or fetch-and-modify operations the whole sequence happens atomically, with nothing interfering in the middle (e.g. between the read and the modification).
  • On most (all? I'm not sure) processors, aligned loads and stores of primitive processor types are always atomic. In particular, AtomicU32::load(Relaxed) on x86/x86-64/aarch64 is just a simple load, and similarly AtomicU32::store(Relaxed) is just a simple store, like with any other variable in your code.
  • You misunderstand what happens-before means. It doesn't mean that the compiler will literally output operations in the same order. It just means that the compiler must act as if the operations really happened in order, thus putting bounds on possible observable behaviour. In particular, there is no way that loads of independent variables can observe each other (what would that even mean?), thus loads can happen in any order, regardless of the kind of operations you use.
  • Forcing the compiler to emit loads and stores exactly as written, in a specific order, requires volatile accesses. Those are entirely unrelated to atomics and multithreading. Volatile atomic accesses are a separate concept which doesn't currently exist in Rust.
  • Basically, the happens-before relationship only matters when talking about loads and stores. If a load and a store to a variable do not have a happens-before relationship in either direction, then you have a data race. But it doesn't make much sense to talk about happens-before between only loads or only stores, because it doesn't affect observable behaviour. In fact, if you're only doing loads you don't even necessarily need to use atomic accesses, e.g. there should be no problem in mixing atomic and non-atomic loads from the same variable (but of course you need to make sure that neither of those race with any writes).
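
To make the atomicity point concrete, here is a minimal sketch of a shared counter incremented with Relaxed read-modify-write operations. The function name `relaxed_counter` and its parameters are made up for illustration. Relaxed gives no ordering relative to other variables, but no increment can be lost, because each `fetch_add` is a single atomic step.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical helper: spawns `threads` threads that each increment a shared
/// counter `per_thread` times using Relaxed, then returns the final value.
fn relaxed_counter(threads: u32, per_thread: u32) -> u32 {
    let counter = AtomicU32::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Atomic read-modify-write: no other thread can slip in
                    // between the read and the write, even with Relaxed.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    // `thread::scope` joins every spawned thread before returning, and joining
    // a thread synchronizes with it, so this load sees all the increments.
    counter.load(Ordering::Relaxed)
}
```

Calling `relaxed_counter(4, 10_000)` should always give `40_000`. With a plain non-atomic read followed by a separate write, increments could be lost (and in Rust that would be a data race).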

That's a ridiculously overcomplicated example. The 0 20 result can already happen on x86, regardless of where the variables are placed in memory. The stores are independent (constants stored to independent memory locations), thus they may be freely reordered by the compiler or the CPU, regardless of whatever source-level tricks you do. Similarly, the loads can be freely reordered. And since the loads and stores happen in different threads, there is no happens-before relationship between them (except that a load of a variable must read some value previously stored to it), thus the operations may happen in any sequence, including 2-3-4-1.
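
For reference, a stripped-down sketch of the kind of program being discussed. The numbering (1) to (4), the variable names and the function `run_once` are my assumptions about the original example, not a copy of it. With Relaxed everywhere, every combination of old and new values is a permitted result, including x = 0 with y = 20.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical reduction of the example: one writer thread, one reader thread.
/// Returns the pair (x, y) as observed by the reader.
fn run_once() -> (u32, u32) {
    let x = AtomicU32::new(0);
    let y = AtomicU32::new(0);
    thread::scope(|s| {
        s.spawn(|| {
            x.store(10, Ordering::Relaxed); // (1)
            y.store(20, Ordering::Relaxed); // (2)
        });
        let reader = s.spawn(|| {
            let seen_y = y.load(Ordering::Relaxed); // (3)
            let seen_x = x.load(Ordering::Relaxed); // (4)
            (seen_x, seen_y)
        });
        // Any of (0, 0), (10, 0), (10, 20) and (0, 20) is allowed here.
        reader.join().unwrap()
    })
}
```

Whether you ever actually see (0, 20) on a given machine depends on the compiler and the hardware; the point is only that nothing forbids it. If (2) were a Release store and (3) an Acquire load, then a reader that sees y = 20 would also be guaranteed to see x = 10.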
