How does the value of `Ordering` affect the atomic operations?

std::sync::atomic::Ordering is an enumeration with five variants. Atomic::load and Atomic::store take a second parameter that receives an Ordering, and I wonder how that second parameter affects the operation. In other words, do different values of the second parameter change which actual atomic instructions are generated, or are these values just tags that prompt the compiler on how to process the code?

This would depend on what the hardware supports, doesn't it? RISC-V has support for all five possibilities, while x86 doesn't, for example.

What is the difference? Isn't "different instructions are generated" the same as "the compiler translated the code differently"? I don't see a useful distinction here.

Anyway, they do lead to different instructions at the end of the day. For example, a Relaxed load on x86 is just a normal load, whereas with SeqCst it's the store side that needs additional synchronization instructions (an xchg or an mfence); a SeqCst load still compiles to a plain load.
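As an illustration (a minimal sketch with my own names; the actual instructions depend on your target), the source-level difference is just the Ordering argument:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch: loads the same atomic with two different orderings. On x86-64
// both loads compile to a plain `mov`; it is the SeqCst *store* that pays
// with an `xchg` or `mfence`. On weaker architectures (ARM, RISC-V) the
// orderings select different load instructions or add barriers.
fn load_both(x: &AtomicU64) -> (u64, u64) {
    let relaxed = x.load(Ordering::Relaxed);
    let seqcst = x.load(Ordering::SeqCst);
    (relaxed, seqcst)
}

fn main() {
    let x = AtomicU64::new(7);
    let (a, b) = load_both(&x);
    assert_eq!((a, b), (7, 7)); // single-threaded: same value either way
    println!("{a} {b}");
}
```

In single-threaded code the choice is invisible; the difference only shows up in which reorderings other threads may observe, and in the generated machine code.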


What is the difference? Isn't "different instructions are generated" the same as "the compiler translated the code differently"? I don't see a useful distinction here.

I meant: is this argument just a tag that affects how the compiler reorders the statements, or does it also affect the asm instructions for how an atomic object is set?

So, you mean the Ordering argument not only affects how the code is reordered, but also affects which instructions are used to set the atomic object?

This would depend on the hardware, as I have said. RISC-V, being developed after C++11, quite literally has five different instructions (atomic instructions have "acquire" and "release" bits, and there's a fence instruction for TSO), while x86, for example, only has the lock prefix, which guarantees Ordering::SeqCst.


...and by default provides semantics of AcqRel (i.e. has no Relaxed).

So, on RISC-V, the value of Ordering affects the selection of atomic instructions and restricts how the compiler may reorder the code, right? And on x86, it may only prohibit the compiler from reordering code, without affecting the selection of atomic instructions, since the x86 platform only supports SeqCst, right?

Most atomic instructions require the lock prefix for correct operation on x86, thus there is only one possibility. Atomic store supports two, since it can be used with or without the lock prefix.
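As a concrete illustration (a sketch; the instruction names refer to x86-64 codegen), a read-modify-write like fetch_add behaves the same for any ordering on x86, because the lock prefix already gives sequentially consistent behaviour for the operation itself:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: on x86-64, an atomic read-modify-write such as fetch_add compiles
// to `lock xadd` for *every* Ordering; the ordering argument then mostly
// constrains what the compiler may reorder around it.
fn main() {
    let n = AtomicUsize::new(10);
    let prev = n.fetch_add(5, Ordering::Relaxed); // still `lock xadd` on x86
    assert_eq!(prev, 10);              // fetch_add returns the previous value
    assert_eq!(n.load(Ordering::SeqCst), 15);
}
```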

I wonder: if the value of Ordering depends on a value that is determined at runtime, how does the compiler decide how to reorder the code according to the value of Ordering?

The generated code would just include five variants and they wouldn't be merged. That's exactly what happens without optimizations, anyway. Or they may be partially merged.

IIUC, do you mean the compiler will assume the corresponding argument can be any valid value of Ordering, and enumerate all cases for these possible values of Ordering?

Yes. If you look at the generated unoptimized code with a fixed value, you'll see that's how it always works. Only with a fixed ordering does dead-code elimination quickly remove the useless branches, which makes it possible to optimize the code further.

The actual code is just a match:

    match order {
        Relaxed => intrinsics::atomic_load_relaxed(dst),
        Acquire => intrinsics::atomic_load_acquire(dst),
        SeqCst => intrinsics::atomic_load_seqcst(dst),
        Release => panic!("there is no such thing as a release load"),
        AcqRel => panic!("there is no such thing as an acquire-release load"),
    }

that gets inlined and optimised by the compiler if a specific ordering is passed.
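If the ordering really is only known at runtime, that match simply stays in the generated code and is branched on at runtime; each arm still contains a single, compile-time-fixed ordering. A minimal sketch (names are my own):

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Passing a runtime-determined Ordering is fine: the library's match on the
// ordering is kept in the generated code and selected by a runtime branch.
fn load_with(x: &AtomicI32, order: Ordering) -> i32 {
    x.load(order) // Release or AcqRel would panic here, per the match above
}

fn main() {
    let x = AtomicI32::new(42);
    for order in [Ordering::Relaxed, Ordering::Acquire, Ordering::SeqCst] {
        assert_eq!(load_with(&x, order), 42);
    }
}
```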

In practice, these all get turned into LLVM primitives that restrict the legal reordering and other optimizations, as well as the actual code emitted to the CPU.

Conceptually, these are all defined in terms of a virtual machine memory model where you have strange-seeming logic like loading with a Relaxed ordering is allowed to literally never see an update that happens in another thread, unless some other fence forces the threads to synchronize.

My understanding is that these orderings were defined (as part of C++11) after a bunch of negotiation between compiler and CPU vendors, to be essentially the intersection of currently supported CPU synchronization primitives and desired compiler optimizations.


This is an operation on an "atomic object", hence the argument can be dynamically determined, and the corresponding instructions will be run according to the actual value of the argument at runtime, i.e. branch selection.

However, if we look at the documentation for compiler_fence, it says:

compiler_fence does not emit any machine code, but restricts the kinds of memory re-ordering the compiler is allowed to do.

What the compiler can do is determined only at compile time, which means the value of the argument should be determined at compile time. However, consider this pseudocode:

compiler_fence(if random() < 100 { SeqCst } else { Release });

How does the compiler know how to reorder the code according to the argument?

IIUC, the value of the argument affects both the reordering done by the compiler and the actual synchronization instructions supported by the target CPU. I wonder: if the argument can only be determined at runtime, how does the compiler decide how to reorder the code according to the actual value the argument will have?

The compiler doesn't "know", the compiler doesn't "think", the compiler couldn't "imagine" anything.

The compiler is a dumb machine which just embeds the appropriate match and then optimizes it. Nothing more, nothing less.

How is that a problem? Each of the five match branches has one fixed set of restrictions for the optimizer, thus there are no problems optimizing that code. And while the fact that this argument, which is determined only at runtime, is supposed to affect code generation, which happens at compile time, may confuse a human reader, the compiler doesn't have a brain, so it couldn't be confused.
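To make the earlier pseudocode concrete (a sketch: `random()` is replaced with a runtime value the compiler can't constant-fold, since `random()` is not a real std function), this compiles and runs fine, with the compiler keeping a branch per possible ordering:

```rust
use std::sync::atomic::{compiler_fence, Ordering};

// A compiler_fence with a runtime-chosen ordering: each branch of the
// embedded match has a fixed, compile-time-known reordering restriction.
// Note Relaxed is not a valid fence ordering and would panic at runtime.
fn main() {
    let n = std::env::args().len(); // runtime value; usually 1
    compiler_fence(if n < 100 { Ordering::SeqCst } else { Ordering::Release });
    println!("ok");
}
```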

IIRC it's not quite as extreme as you wrote. If I am not mistaken, it's specified that the update should become visible within a "reasonable" amount of time. This is why it's fine to use Relaxed for stop flags even if your code does not contain any other synchronization.
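For illustration, the stop-flag pattern mentioned here might look like this (a minimal sketch, not taken from the thread):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// A Relaxed stop flag: the worker spins until the main thread sets the flag.
// Relaxed suffices because we only need the store to become visible
// eventually; we are not ordering any other memory accesses around it.
fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = {
        let stop = Arc::clone(&stop);
        thread::spawn(move || {
            let mut iterations: u64 = 0;
            while !stop.load(Ordering::Relaxed) {
                iterations += 1;
            }
            iterations
        })
    };
    thread::sleep(Duration::from_millis(10));
    stop.store(true, Ordering::Relaxed);
    let iters = worker.join().unwrap();
    println!("worker ran {iters} iterations");
}
```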

Once again: the compiler couldn't "think", it couldn't "decide", it couldn't "become confused".

The compiler just shuffles the code by following formal rules while trying to improve formal metrics. And because each branch is fixed after inlining, nothing is left undetermined, so there are no problems applying these rules and measuring these metrics.

The fact that you can't imagine how the whole thing should work is not an issue for the compiler; the compiler doesn't even try to imagine anything anyway, since it doesn't have organs capable of imagining anything.