Better understanding atomics

One example where access modes are mixed is Arc, which I posted as an example before. Reference-count decrements in Arc use Release ordering.
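As a sketch of that mixed-ordering pattern (a simplified version of what Arc's Drop does, not the real implementation; `release_ref` is a made-up name for illustration): every owner's decrement is a Release operation, and only the thread that sees the count hit zero performs an Acquire fence before freeing the data.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Simplified sketch of the mixed-ordering pattern in Arc's Drop (not the
// real implementation): the decrement uses Release so every owner's prior
// writes are published, and only the final owner performs an Acquire
// fence, which makes all of those writes visible before the data is freed.
fn release_ref(count: &AtomicUsize) -> bool {
    // fetch_sub returns the previous value; 1 means we were the last owner.
    if count.fetch_sub(1, Ordering::Release) == 1 {
        // Synchronize with every earlier Release decrement before freeing.
        fence(Ordering::Acquire);
        true // caller may now drop the shared data
    } else {
        false
    }
}

fn main() {
    let count = AtomicUsize::new(2);
    assert!(!release_ref(&count)); // first owner drops: not the last
    assert!(release_ref(&count)); // second owner drops: last one frees
}
```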


I wonder if the docs should/could be extended with a warning, explaining that a SeqCst atomic access isn't giving the same guarantees as accessing a variable through a mutex?

If the docs are being changed, then I think a better warning would be that if you're considering any ordering other than Relaxed, you should prefer one of the higher-level primitives in std::sync unless you're confident that you're getting your atomic handling right.

But phrased by someone who knows how to write clear warnings.

My reasoning behind this is that there are only three types of people who'll read the docs to begin with:

  1. People who don't really know what they're doing, who currently pick SeqCst because it's the "safe" option. These people need to be guided to use Relaxed or a lock instead.
  2. People porting an algorithm from an academic paper to Rust. These people are almost certainly just looking up how Rust spells memory_order_acq_rel or whatever the paper uses, and will, at most, be triggered into looking to see if the thing they're building already exists in the standard library.
  3. People who've read and understood one of the excellent books on the topic, such as Mara Bos's "Rust Atomics and Locks", which was recommended earlier in the thread, and won't care about such a warning because they know what they're doing.

I personally prefer explanations of why I should or shouldn't do something over "keep your hands off unless you know what you're doing". The first makes it easier to learn; the second might keep me from doing the wrong thing, but it also discourages learning more (or at least doesn't encourage it).

Alternatively, a combination of both would do.

If you feel like it, maybe you'd like to create a PR. I'm saying this because your point regarding SeqCst really surprised me, and it might surprise other readers of the std docs as well. But maybe the problem will be solved or at least improved anyway by the Nomicon PR #378 (which has been indirectly linked above already).


If I were to open a PR (I'm at the point where I consider what I said to be "obvious", which generally means I learnt it so long ago that I have no real clue how I reached this understanding, not that other people would come to the same conclusion), I'd need significant help with the language used.

The core point is that SeqCst only gives you extra guarantees with respect to other SeqCst atomic operations. It does not give you anything extra for any other operations.

If I were to try and document this, I'd want to say a few things:

  1. Relaxed should be your "default" choice of ordering - you only need the other choices if you want accesses to this atomic by other threads to imply something about other loads and stores.
  2. If you're going beyond Relaxed, check std::sync for higher-level primitives that provide the semantics you want.
  3. SeqCst's extra guarantees over and above Acquire/Release/AcqRel only apply to SeqCst atomic accesses, and not to all other memory accesses in the program. It is thus rarely what you want, as it is expensive compared to AcqRel and the guarantees are only of use when considering the atomic operations in the program in isolation.

What (I think) I have learned from this thread is that when I use a mutex, it effectively acts as a memory barrier: anything that has been observed by a thread A within or before one critical section where the mutex is locked will also be visible to a thread B within or after a later critical section where the same mutex has been locked.

This particularly also affects data that is not stored in the mutex. See also @alice's comment to my other question here.

Atomics, in contrast, will not necessarily ensure that. They may ensure it if you use Release and Acquire (or AcqRel or SeqCst), but that depends on which value has actually been read by the Acquire (i.e. if it was a value written by the Release or any value written by the release sequence).

This is the information that might be helpful for a beginner, I think.

(Not sure if I explained/reflected everything correctly.)


Just to make it explicit: the reason that locking a mutex behaves this way is because it internally introduces a Release/Acquire edge between the two locks. Just like atomics, you only get the synchronization if the same Mutex (atomic) is used from both threads.

Rust's Mutex could theoretically only synchronize the contents, since Rust's Mutex is dataful, but Rust has opted to stick with the standard semantics where Mutex synchronization is sufficient for a synchronization barrier for all memory accesses in the thread. (Plus, there's no known hardware where the weaker guarantee would actually be beneficial.)

In C++ this is a clearer relation, since C++ only provides a "raw mutex" and the user is in charge of pairing locks around accesses to the logically protected data.

(Things get wacky when you thread state through multiple atomic locations and try to track causality through them, especially when mixing in Relaxed accesses... so, generally, don't. Stick to reasonably encapsulated chunks of synchronization if at all possible.)


I stumbled on atomics trying to share a number between tasks, thinking using a Mutex was overkill. Didn't expect to land in a booby trapped house :grin:.

This raises the question: are atomics "here be dragons" territory that should not be used for regular application programming? The crate and type naming does not help here: they look like innocent types, yet it's very easy to write wrong code (including code that works on x86 and fails elsewhere, which is a class A footgun).


As with all things: it depends.

First of all: you're not going to run into problems with atomics unless you use unsafe. While atomics are themselves not unsafe, using them to synchronize other data is fundamentally unsafe.

I'm fairly confident in saying that any use of an atomic which does not involve unsafe only wants Relaxed ordering. Synchronization can only matter if you're using unsafe to pass non-Send/Sync data between threads, as anything Sync doesn't need external synchronization.

If what you want to express is expressible with just the safe API of atomics (e.g. a monotonic counter), then atomics potentially are what you want.

If you're reaching for unsafe, though, it's the same as all other times you feel the call of unsafe: you'd quite potentially be better off finding an existing safe abstraction[1] that does what you want instead.

So in short, yeah, I'd recommend that application code use an existing safe synchronization approach rather than rolling its own. For the Relaxed counter case, that means that instead of everyone incrementing the same counter, you use a fork-join structure where each worker thread keeps its own counter and you aggregate them when joining the results together.

  1. And I say this as someone perhaps too confident in reaching for unsafe to bludgeon exotic patterns into Rust which can't be expressed with a pure safe library API. ↩︎


I also recommend it. Personally, the reordering barrier explanation was easier for me to understand than the diagrams in

Counterexample: a lightweight atomics-based ring buffer with atomic elements. Using only Relaxed operations can cause the following race condition: you read an updated cursor, but stale data inside the buffer cells. It's not a memory safety issue, but still a very serious and hard-to-debug logical error.


Yes, but the mutex ensures that there is always a Release/Acquire edge, independently of which critical section (of the same mutex) runs first. This is what @alice said here:

In contrast, AcqRel and even SeqCst(!) accesses will only cause the write to "synchronize with" the read that obtains the written value. An operation that reads will not "synchronize with" a subsequent write in another thread. This is where mutexes (where both the read and the write have to be accompanied by an Acquire/Release sequence) differ from a simple atomic access.

(Note that the memory ordering page has been updated to include a definition of "synchronize-with" now.)

Yes, that's what I wondered about in this thread: Can I use a Mutex<()> to avoid conflicts when accessing an UnsafeCell?

I don't doubt that this is the case, but is this documented somewhere? (i.e. that a Mutex uses Acquire/Release access which guarantee this synchronization barrier, and doesn't do some weird optimization for some hypothetical hardware as outlined here?)

I'm just asking this in regard to proper documentation. I understand this would likely never be implemented by Rust's std in practice, but lack of documentation caused me to ask the question (can I use a Mutex<()> …) because it wasn't clear to me.


It's more subtle and complex than that. Writes in the thread that performs a Release, AcqRel or SeqCst write (a Release store) to a given atomic and that occur in thread order before the Release store happen-before any reads in another thread that occur after an Acquire, AcqRel or SeqCst read (an Acquire load) of the same atomic in thread order.

As a side note, this doesn't guarantee that the writes done before a Release store happen-before the Release store; a Relaxed load of the atomic is permitted to observe the Release store but not the writes that came before the Release store. It's the Acquire load that has the attached guarantees, which are conditional on a previous Release store to the same atomic, and I have worked on systems where this particular difference can be observed in practice.

Mutexes work because locking the mutex requires an Acquire read of the lock state, and unlocking the mutex requires a Release store to the lock state. These two operations are to the same atomic, and thus synchronize against each other; the requirement that all stores in a thread before a Release store are visible to the thread that performs the Acquire load does the rest.

But was my statement regarding the "synchronize with" relation wrong? In that post, I did (want to) assert that both the write and the read are performed with Release/Acquire. I didn't make that clear enough perhaps.

Yes, if the load is Relaxed, then the Release write will not "synchronize with" the Relaxed read.

This is why writing about this turns into books - getting the language right is incredibly challenging, and I salute your efforts to try and find a succinct and clear way to explain this in documentation. I will be extremely impressed if you can get to something simple and clear without ending up with a book.

The atomic operation itself does synchronize with all other atomic operations on the same atomic - a Relaxed read is synchronized with a Release store so that you either see the Release or you don't, and you can never see "part of" the Release store. What it doesn't do is synchronize other memory operations outside the atomic.

As a model (not how it actually works on any platform, and missing the extra guarantee of SeqCst, but a reasonable model for thinking about how the memory ordering stuff works in abstract):

  • Memory is split into locations, which combine both the backing bytes in RAM with knowledge of the access size (32 bit, 64 bit etc) and some metadata per location.
  • There is a global store.
    • The global store has every possible location in the system, and a full set of bytes of RAM.
    • The global store has a lock you can take for each location; this allows you to modify locations (including the metadata component) in a race-free manner assuming you take the lock.
      • Rust types would look like:
        • struct Location { offset: usize, size: usize }
        • struct Metadata { pending_writes: HashMap<usize, u8>, ... }
        • struct GlobalStore { bytes: Vec<u8>, metadata: BTreeMap<Location, Mutex<Metadata>> }
  • All threads have their own private store, which tracks data written by the thread
    • This just has the bytes of memory, no locations.
      • Rust type would look like struct LocalStore { local_writes: HashMap<usize, u8> }
    • Reads in a thread first check the private store to see if it has bytes for that location, and use the value in the private store if there is one.
    • If there is no value in the private store, the read reads the bytes (only) from the global store.
    • If there is a partial value in the private store (say 2 of the 8 bytes in a u64), use the private store bytes in preference to the global store bytes.
    • Writes go to the bytes in the private store, never to the global store.
  • There is a background task that can at any time suspend a thread to do the write back process
    • Write back and thread execution can't run at the same time.
    • Write back can run repeatedly before the thread is allowed to resume normal execution.
    • The thread has no control over when it's suspended and write back takes over.
    • Each time write back runs, it does the following:
      • Choose one "dirty" byte in the thread private store.
      • Read the value from the thread private store, and racily write it to the global store.
      • Delete the private store copy of that byte.
  • An atomic operation does the following:
    • Take the global store lock for the atomic location.
    • If the operation has Acquire semantics (it includes a read, and the ordering is one of Acquire, AcqRel, or SeqCst), copy the pending writes from the location's metadata to this thread's private store.
    • If the operation includes a read component, read the value from the global store.
    • If there are compares in the atomic operation, do the compares.
    • If the compares say to write, or there are no compares, do the requested write to the global store.
    • If this is a Release semantics write (Release, AcqRel, SeqCst), extend the location's metadata with a copy of this thread's private store.
    • Release the lock.

This is already a complex model, and it's incomplete - it just covers the synchronization present in Relaxed, Acquire and Release, ignoring SeqCst's extra guarantees.

But it should be clear why talking about "synchronized with" is difficult; all atomic operations include some degree of implied synchronization, and then there's more synchronization if you're using Acquire and Release orderings correctly. There's even more synchronization lurking under the hood if you use SeqCst, too.

When I spoke of "synchronizing with", I meant the "synchronized-with" relation from the standard.

I don't think that is right. (edit: in regard to the terminology used by the C++20 standard)

I don't think that a Relaxed read is synchronized with a Release store, at least not in regard to the terminology "synchronize-with" from the standard.

This may be a bit confusing as the terminology "synchronization" or "synchronize" is also used in a different context. I don't think the C++20 standard does (though I'm not entirely sure), but the Rust documentation on std::sync::atomic::Ordering does:

Memory orderings specify the way atomic operations synchronize memory. In its weakest Ordering::Relaxed, only the memory directly touched by the operation is synchronized.

It doesn't speak of "synchronized-with", but of "synchronizing" the memory directly touched by the operation. So the terminology is somewhat mixed here. This doesn't help to make things easier.


I personally prefer the formal approach using

  • "sequenced before"
  • "modification order"
  • "release sequence"
  • "synchronizes with"
  • "happens before"

in particular because I feel like defining synchronization and locks using (a simplified model of) Rust and/or locks feels somewhat circular.

But I see how things get complex fast, and I also understand if some people prefer such models over a formal description. I guess it's a matter of taste. To me, the formal model seems easier to grasp (at least if all necessary definitions are included, which wasn't the case until these changes have been made in the wiki).


The C++20 terminology is a bit confusing, too - which is part of why the "consume" ordering is such a disaster.

The synchronized-with relation is not really about synchronization at all - it's trying to identify which Acquire loads pair with which Release stores, and which don't.

Underlying this is a formal model, which isn't part of the C++20 standard because the formal models for distributed consistency are still an active area of research. The formal literature tends to talk in terms of "valid executions" of code, rather than how the atomics interrelate, and there are at least two common ways to describe the valid executions:

  1. Litmus tests. Write out a simple set of threads, and describe all valid executions. This is especially useful when you're thinking in terms of "does my implementation comply with the memory ordering model" - you can run the litmus tests, and look to see if you can get an invalid execution order.
  2. Formal logic axioms: using a pre-existing formal logic notation (e.g. HOL, or Isabelle, or Coq), write out statements that must hold for your memory model.

The C++20 stuff is a nasty intermediate - it takes the formal logic axioms, and tries to translate them from the original formal logic to standards-body English.

For future readers of this thread, Mara Bos has made her book on Rust atomics available to read online for free. This is one of the best resources you'll find on Rust atomics (and possibly C++ atomics, too!)


Mara Bos notes (like previously discussed in this thread) that SeqCst is most often not what you need, and she argues that SeqCst is not a great default and even says "it is advisable to see SeqCst as a warning sign" (see Common misconceptions).

Thanks for letting us know! (And thanks to her for making it available!)

  1. whose API abstracts away the fact that atomics were used ↩︎
