What @dodomorandi outlined is essentially correct.
The important bit to understand is that the compiler is allowed to reorder certain operations or to condense multiple instructions, or unroll loops, etc. This is, in part, how optimization works. The only rule is that it cannot change the 'end result' for some limited area of code (I'm being hand-wavy on purpose, the details are far too much).
The CPU is allowed to do the same sorts of things. What the `Ordering` enum does is tell both the compiler and the CPU which rules apply to which operations. Understand that this is very difficult to wrap one's brain around. I always go back to the references before deciding which ordering is necessary.
- `SeqCst` behaves like `Acquire` on a read, `Release` on a write, and `AcqRel` on a read-modify-write, with one additional guarantee: all `SeqCst` operations, in all threads, appear in a single total order that every thread agrees on. It also ensures that other threads' `Release`-marked writes and `SeqCst`-marked writes are visible to this operation once it has read the value they stored.
- `Acquire` (used on reads) ensures that reads and writes that are semantically after the acquire-marked instruction cannot be moved before it (but it forms no barrier against earlier operations moving after it). If the read sees a value stored by a `Release`- or `SeqCst`-marked write in another thread, everything that thread wrote before that store will be visible as well.
- `Release` (used on writes) ensures that reads and writes that are semantically before the release-marked instruction cannot be moved after it (but it forms no barrier against later operations moving before it). It also ensures that everything written before this operation will be visible to a thread whose `Acquire`- or `SeqCst`-marked read sees the stored value.
- `AcqRel` combines `Acquire` and `Release` for read-modify-write operations, which is very similar in effect to `SeqCst`, but not the same: it does not take part in the single total order. `AcqRel` in effect forms one-way barriers in both directions - earlier reads and writes cannot move after the marked instruction, and later ones cannot move before it. It ensures that the operation has visibility of other threads' prior `Release`- and `SeqCst`-marked writes, and is visible to their `Acquire`- and `SeqCst`-marked reads.
- `Relaxed` applies no ordering rules to the operation. The only guarantee of the atomic type here is that the read or write completes atomically - no thread can ever observe a half-written value (which matters for values that are not loaded or stored in a single instruction, which depends on the platform). It is possible that `Relaxed`-marked operations do not see other threads' changes, or that other threads do not see `Relaxed`-marked changes, until 'eventually'. It is also possible for the compiler or CPU to reorder `Relaxed` operations on different memory locations within a single thread with respect to one another.
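To make the `Relaxed` case concrete, here is a minimal sketch of a shared counter. The function name `relaxed_count` and the thread counts are made up for illustration. It relies only on atomicity: each increment happens exactly once, but nothing is promised about ordering relative to other memory.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: spawns `threads` workers that each bump a shared
/// counter `per_thread` times, then returns the final count.
fn relaxed_count(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Relaxed: the increment itself is atomic, but it imposes
                    // no ordering on any surrounding memory operations.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // Joining a thread synchronizes with it, so every increment it made
        // is visible to the load below.
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", relaxed_count(4, 10_000));
}
```

No increments are lost even with `Relaxed`, because atomicity is all a plain counter needs. It stops being enough once other data has to be published along with the counter.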
What all this means is that the compiler and the CPU are limited in how they are allowed to optimize and reorder your code, so that your code operates as expected. You're just providing the rules.
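The classic case where the rules matter is handing data from one thread to another. The sketch below is only an illustration (the names `publish_and_read`, `data`, and `ready` are invented here): a `Release` store on a flag pairs with an `Acquire` load of the same flag, so the write made before the store is guaranteed to be visible after the load sees `true`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical demo: a producer publishes a value, the caller waits for it.
fn publish_and_read() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // This write may be Relaxed on its own...
            data.store(42, Ordering::Relaxed);
            // ...because the Release store below cannot be reordered before it.
            ready.store(true, Ordering::Release);
        })
    };

    // The Acquire load pairs with the Release store: once it reads `true`,
    // everything the producer wrote before the store is visible here.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let value = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    value
}

fn main() {
    println!("{}", publish_and_read());
}
```

If both flag operations were `Relaxed` instead, the language would no longer guarantee that the reader sees `42`, even though it may happen to work on a given machine.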
In general, however, for x86-based systems (including x86_64), because of the rules provided by the architecture, `Acquire` is free on reads, and `Release` free on writes. There is no performance penalty paid, because the architecture will behave following those rules anyway.
If there is any doubt, `SeqCst` is the best bet, but in general: `Acquire` for reads, `Release` for writes, and `AcqRel` for read-modify-write (like `fetch_add`). That will nearly always handle what you need to happen.
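As a rough sketch of that rule of thumb, here is a small wrapper type. `Gauge`, its methods, and `demo` are hypothetical names invented for this example; only the `std::sync::atomic` calls are real.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical wrapper showing which ordering goes with which operation.
struct Gauge {
    value: AtomicU64,
}

impl Gauge {
    fn new(initial: u64) -> Self {
        Gauge { value: AtomicU64::new(initial) }
    }

    /// Read: Acquire.
    fn get(&self) -> u64 {
        self.value.load(Ordering::Acquire)
    }

    /// Write: Release.
    fn set(&self, v: u64) {
        self.value.store(v, Ordering::Release)
    }

    /// Read-modify-write: AcqRel. Returns the previous value.
    fn add(&self, delta: u64) -> u64 {
        self.value.fetch_add(delta, Ordering::AcqRel)
    }
}

/// Starts the gauge at 100, then adds 1 from each of `threads` workers.
fn demo(threads: u64) -> u64 {
    let gauge = Arc::new(Gauge::new(0));
    gauge.set(100);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let gauge = Arc::clone(&gauge);
            thread::spawn(move || {
                gauge.add(1);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    gauge.get()
}

fn main() {
    println!("{}", demo(8));
}
```

Swapping every ordering here for `SeqCst` would also be correct, just potentially slower on some architectures, which is why it is the safe default when in doubt.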
If that was in any way unclear, refer to the links dodo provided - they're excellent! (though at least as dense as this - they're the technical references I used to check myself here)
Normally (for other languages), I'd write a disclaimer about other, non-atomic, types here, but Rust's guarantees do a pretty good job of preventing you from messing it up outside of `unsafe`-land.