Better understanding atomics

It's also worth noting that the std::atomic docs aren't the only documentation Rust provides for atomics; the API documentation in fact links directly to the Nomicon's section on atomics. That section is fairly short, and the Nomicon's author, Gankra, has pointed to Mara Bos's O'Reilly book on low-level concurrency and said that it covers essentially everything she would have wanted to say -- and that, since it's a whole book of its own, it makes sense that this material isn't directly present in the Rust documentation or the Nomicon.

Atomic synchronization is complicated and really can't be done justice in anything less than dedicated reference material just for atomics. I've not read the book I've linked here personally, but there have been plenty of testimonials that it's a very good resource for explaining and understanding atomics.


A possible solution for the contradictions/incompleteness(?) in the documented behavior of atomics could be to simply mark the std::sync::atomic documentation explicitly as non-normative. I don't think that's a nice way to go, but it would be the least effort.

Even if you might say this is stuff only experts use, I think having good, freely (and easily) accessible documentation is a good thing to have.

Also, I think there are several use cases where high(er) level code might use atomics, e.g. a Relaxed AtomicU64 counter to create unique IDs and maybe AtomicBools for signalling (though here things already get complicated because the synchronizes-with relation only goes from the Release-store to the Acquire-load, which may not be sufficient in some cases).

While I agree it's complicated, I feel like the way it's currently explained makes it unnecessarily complicated to understand. But maybe I'm still missing some things.


Unfortunately, explaining atomics in a way that is both accurate even in edge cases and simpler than full standardese (or, if you're both unlucky and picky, a CPU diagram) is a difficult task, and people who put the effort into writing such learning material generally want to get paid for the work.

Now we have two hard problems :slightly_smiling_face:

Typically, the preferred answer is to provide wrappers that implement the patterns at the higher level,

for which the Rust API docs are sufficient, and for which a wrapper that doesn't expose the complexity of atomic ordering is fundamentally trivial (which is probably why nobody actually provides this as a specialized type)

You don't want to use purely an AtomicBool, because then the waiting thread has no way to wait other than spinning, and spinlocks are bad (same link as previous). But there's a library for that:

This allows you to do condvar/futex wait/notify on an AtomicU32 rather than a Mutex.

In general, you only have synchronization from stores to loads anyway. The other three options fundamentally don't make any sense, because they have no way to tell whether the previous operation that they'd theoretically be synchronized with happened. (fetch-op/compare-and-swap/etc.-style operations are logically both a load and a store.)

At least from my understanding of atomics, anything that isn't satisfied by AcqRel is in PhD-level-complexity territory when it comes to showing that the used synchronization is actually sufficient. (That said, SeqCst is still a safer default, because it means the reordering model of atomics actually works, whereas that isn't the case for AcqRel, since threads can observe synchronized events in different orders.)

Mixing Relaxed with AcqRel is probably achievable, but I'd stay away from it without a specific reason, not least because x86_64 concretely doesn't offer memory orderings weaker than AcqRel.


I didn't mean something like a CondVar here, but more use-cases where you want to inform a thread to exit in the next suitable moment, for example.

If I understood right, then SeqCst may lead to a wrong perception of safety though:

So following that reasoning, the problem often isn't to use AcqRel instead of SeqCst but to use atomics in the first place (instead of mutexes).


I opened an issue #104814 in regard to the documentation of std::sync::atomic::Ordering.


Sorry if this has already been mentioned, but I found this to be a pretty interesting and helpful approach to teaching atomics:


Looking into the rendered PR for the Nomicon, I see that the issue with the release sequences (which I pointed out in issue #104814) seems to be correctly described in this part.

But even with the Nomicon being fixed by that PR (if it gets approved), the std reference still shouldn't make wrong or misleading statements.

The drawings seem to be nice too. I will certainly look further into this. Thanks for sharing.


As a rule of thumb, there are three positions you can be in with atomics:

  1. Relaxed is good enough - you just need accesses to this variable to be atomic, and you don't need a guarantee that there is one globally visible ordering for accesses to this variable. This solves, for example, the "I need a unique u64" case, and the "I need a guarantee that multiple threads reading this bool always see either true or false, even when I have concurrent writes to the field" case.
  2. You can build what you need using Acquire, Release, and AcqRel, and you are completely confident that an Acquire load pairs up with a Release store to get you the semantics you need.
  3. There's a peer-reviewed paper, a PhD thesis, or other similar grade of theoretical work, describing your exact use of atomic operations, and you're only after the guarantees that the academic work says you can get.

If you're not in one of those three cases, then you should probably use a Mutex, RwLock, Barrier or a channel, and let someone else care about orderings - in particular, SeqCst is very, very rarely what you want, since the only guarantee it adds beyond AcqRel is that there exists a single total order of operations on the atomic itself that all threads agree on. Making this guarantee is expensive on most CPUs, and I've only ever seen it needed in the case of building a dynamically sized MPMC channel with a guaranteed bound on memory consumption (computed based on the speeds of the producers and consumers).

It's also worth noting that the rise in cost as you go from non-atomic through Relaxed through to SeqCst is significant. On x86, any atomic is more expensive than non-atomic, but the cost of Relaxed, Acquire and Release is all the same (because that's the x86 memory model). SeqCst then adds more cost, since you now need system-wide barriers to ensure the total ordering is met.

On ARM, even going from Relaxed to Acquire and Release needs a barrier - and going to SeqCst requires heavier barriers to ensure that the total ordering is met. As the point of using an atomic directly is to get something lighter-weight than the atomics implied by the higher-level primitives like Barrier and Mutex, it's often a bad tradeoff to go from a high-level primitive to a SeqCst atomic.


I think this might be the most common case where higher-level code would use atomics instead of mutexes.

Question: Can't I always build what I need with Acquire and Release? After all, mutexes can be built with those (they don't need SeqCst). And with mutexes I can ensure sequential ordering of accesses where needed.

This makes me believe that using atomics with SeqCst (instead of mutexes) indeed is an optimization for certain corner cases that are very rare.

I would say a reasonable approach in practice is this:

  • If Relaxed is sufficient, use atomics.
  • For anything else use mutexes.
  • If performance is an issue, you may optimize some problems with atomics. But in this case you need to be very careful anyway and exactly follow which guarantees are needed and which synchronizations take place.

In either case, SeqCst is never needed as sequential consistency can be emulated with mutexes, which, in turn, can be implemented with Acquire/Release ordering, right? Or do I misunderstand something?


Assuming that you don't care about anything other than correctness, Acquire and Release are all you need. There are certain very weird algorithms where the extra guarantee from SeqCst enables you to put bounds on memory consumption, CPU time, and similar non-correctness properties.

The reason I say that Acquire and Release are OK if you can justify them is that they're sufficiently scary to put off people who don't know what they're doing, while still good enough to write lock-free algorithms if you know what you're doing. Because the names are a bit strange, and the description is technical and complex, people tend to shy away from them.

The trouble with SeqCst is that it's an attractive nuisance; to the naïve eye it "looks" like a magic "do what I want" ordering. But most of the time (all but one instance I've seen during code review - the one exception referenced a peer-reviewed paper showing why SeqCst was worthwhile there), either you only needed Relaxed (which, on the systems we were going to run that code on, cost around 10x a non-atomic access) and are paying the price of SeqCst, or you're not actually getting the guarantees you expected (notably around the behaviour of non-atomic reads and writes between SeqCst reads and writes).


One possibly important difference between Mutex and an atomic is that the former may (if you’re unlucky) block your thread while the latter will never do that. If what you want can be expressed by a single atomic operation, then your code is not only lock-free, it is wait-free (meaning: your thread is guaranteed to make progress if there is a CPU to run it). If you’re spinning on an atomic then chances are that you should either implement a slow path with kernel support, or simply use a Mutex instead — because otherwise you’re implementing a simple spin lock yourself.

One remark on x86: The only difference between a “release” and a relaxed write is that the core’s store buffer is drained for the release. This can be noticeable in workloads that are heavy on memory writes, but here we’re definitely in “thou shalt benchmark!” terrain.


One interesting and quite common case that isn't in your first two buckets is reference counting, where Relaxed is usually used for cloning. From the standard library:

fn clone(&self) -> Arc<T> {
    // Using a relaxed ordering is alright here, as knowledge of the
    // original reference prevents other threads from erroneously deleting
    // the object.
    // As explained in the [Boost documentation][1], Increasing the
    // reference counter can always be done with memory_order_relaxed: New
    // references to an object can only be formed from an existing
    // reference, and passing an existing reference from one thread to
    // another must already provide any required synchronization.
    // [1]: (
    let old_size = self.inner().strong.fetch_add(1, Relaxed);
    // ...
}

Isn't that suggesting that the access mode should be encoded in the type rather than as a flag on every access function of a universal type? The current API seems immensely accident-prone as well as unnecessarily unapproachable.

Note that it's possible to mix access modes, i.e. the same atomic can be accessed as Relaxed from one thread, for example, and as AcqRel from another.

A key point is that even if you follow the principle of not mixing AcqRel with Relaxed, this only applies within a single concurrent section. It is perfectly valid even under this practice to have a single atomic which is used Relaxed at one point, but which then, after a sync point (e.g. the join in a fork-join workload; the end of a rayon par_iter), gets used in an AcqRel fashion.

The Atomic* types are the building blocks, so they expose the underlying model fairly transparently.


One example where access modes are mixed is Arc, which I posted as an example before. Reference decrementing uses Release on Arc.


I wonder if the docs should/could be extended with a warning, explaining that a SeqCst atomic access isn't giving the same guarantees as accessing a variable through a mutex?

If the docs are being changed, then I think a better warning would be that if you're considering any ordering other than Relaxed, you should prefer one of the higher-level primitives in std::sync unless you're confident that you're getting your atomic handling right.

But phrased by someone who knows how to write clear warnings.

My reasoning behind this is that there are only three types of people who'll read the docs to begin with:

  1. People who don't really know what they're doing, who currently pick SeqCst because it's the "safe" option. These people need to be guided to use Relaxed or a lock instead.
  2. People porting an algorithm from an academic paper to Rust. These people are almost certainly just looking up how Rust spells memory_order_acq_rel or whatever the paper uses, and will, at most, be triggered into looking to see if the thing they're building already exists in the standard library.
  3. People who've read and understood one of the excellent books on the topic, such as Mara Bos's "Rust Atomics and Locks" (recommended earlier in the thread), and who won't care about such a warning because they know what they're doing.

I personally prefer explanations of why I should or shouldn't do something, rather than "keep your hands off unless you know what you're doing". The first makes it easier to learn; the second might keep me from doing the wrong thing, but it also discourages learning more (or at least doesn't encourage it).

Alternatively, a combination of both would do.

If you feel like it, maybe you'd like to create a PR. I'm saying this because your point regarding SeqCst really surprised me, and it might surprise other readers of the std docs as well. But maybe the problem will be solved/improved anyway by the Nomicon PR #378 (which has been indirectly linked above already).


If I were to open a PR (I'm at the point where I consider what I said to be "obvious", which generally means I learnt it so long ago that I have no real clue how I arrived at this understanding - not that other people would come to the same conclusion), I'd need significant help with the language used.

The core is that SeqCst only gives you extra guarantees on other SeqCst atomic operations. It does not give you anything extra on any other operations.

If I were to try and document this, I'd want to say a few things:

  1. Relaxed should be your "default" choice of ordering - you only need the other choices if you want accesses to this atomic by other threads to imply something about other loads and stores.
  2. If you're going beyond Relaxed, check std::sync for higher-level primitives that provide the semantics you want.
  3. SeqCst's extra guarantees over and above Acquire/Release/AcqRel only apply to SeqCst atomic accesses, and not to all other memory accesses in the program. It is thus rarely what you want, as it is expensive compared to AcqRel and the guarantees are only of use when considering the atomic operations in the program in isolation.