Better understanding atomics

Just as an initial note -- on x86_64 processors, SeqCst will rarely behave any differently from AcqRel, because x86_64 has strong hardware ordering guarantees that (at the current time, anyway) cannot be relaxed, so the difference only affects the compiler. With correct usage of AcqRel, the impact on what freedom the compiler has is small as well.

If you are replacing Mutex<usize> with AtomicUsize and you are not using compare_exchange loops (fetch_update), then the atomic will generally perform marginally better, even with Relaxed: your "critical section" where you hold the lock is so short that the chance of one thread locking the mutex while another thread holds it is very low.

If you require using fetch_update, it becomes much more of a toss-up, and depends a lot on the use case. If the update is "fast," then an atomic (even with SeqCst) would likely be marginally better. If the update is "slow" enough that multiple updates "race" with each other, using a Mutex will typically perform better, especially under high load, because in general spinning isn't great.

If you're using the Mutex to guard more than just what can be expressed solely by Atomic's API surface, you'll be hard pressed to beat using a Mutex no matter what ordering you use. What's much more important is structuring your cross-thread communication to minimize lock contention, and comparatively it doesn't matter whether the communication channels use mutual exclusion or atomics to communicate.
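To make the replacement being discussed concrete, here is a minimal sketch (function names are made up; scoped threads just keep the example self-contained). Both versions count to the same total; the atomic one uses Relaxed because only the final value matters, not any cross-thread ordering:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Counter guarded by a Mutex: each increment is a short critical section.
fn count_with_mutex(n_threads: usize, per_thread: usize) -> usize {
    let counter = Mutex::new(0usize);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    counter.into_inner().unwrap()
}

// The same counter as an atomic: no lock, a single fetch_add per increment.
fn count_with_atomic(n_threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n_threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Relaxed suffices here: we only need the final total.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.into_inner()
}

fn main() {
    assert_eq!(count_with_mutex(4, 1000), 4000);
    assert_eq!(count_with_atomic(4, 1000), 4000);
}
```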

8 Likes

Thanks for all the posts, so far. I haven't read everything yet. I've finished most of the video, and I'm currently going through the C++ reference. However, I have a hard time because it uses terms whose exact meaning I don't know, or where I don't know whether they mean strictly the same thing in C++ and in Rust. Like:

  • evaluations
  • side effects
  • value computation
  • evaluation order (okay, I guess this could be deduced from the Rust reference)
  • "synchronizing with" (as in "synchronizes-with" in that spec)

I have an idea what these terms mean, but for fully understanding all implications, I'd prefer to have a definition of these somewhere. It's really hard for me to read. For example:

  1. Read-read coherence: if a value computation A of some atomic M (a read) happens-before a value computation B on M, and if the value of A comes from a write X on M, then the value of B is either the value stored by X, or the value stored by a side effect Y on M that appears later than X in the modification order of M.

When I don't know what "side effect" means in that context (or more precisely, in the context of Rust), how am I supposed to understand this?

Edit: Ah, I see that "side effect" and "evaluation" seem to always relate to "expressions", so it's basically side effects from evaluating the expression, or the computation result of the expression.


Also, what's the difference between "value computation" and "evaluation"? Is that the same? I have no clue (yet). :face_with_spiral_eyes:


Another issue: Which version of C++ are we talking about in Rust? Do I skip the sections saying "until C++20" or do I skip the sections saying "since C++20"? Edit: Found an answer to that question, as in the Rust docs it specifically says:

Rust atomics currently follow the same rules as C++20 atomics, […]


The piece that I can't really solve is where "synchronizes-with" is defined.

Overall, the specification doesn't seem well-defined to me, as it uses terms which it doesn't define. Or I'm just missing the needed links/references to understand it, or I overlooked the definitions while reading that reference. But trying to find the definitions by searching within the document didn't help me. Can anyone tell me where "synchronizes-with" is defined?

Edit: Ah, this might help, of course:

Maybe after reading that, I will know where to look. :sleeping:

1 Like

That linked example talks about the C11 memory model. Is that what was used by Rust before?

Translating the example where SeqCst is needed to Rust, using mutexes, it would look like this:

use std::sync::Mutex;
use std::thread::spawn;

fn main() {
    // Box::leak yields &'static references, so the closures below can
    // borrow them for the spawned threads.
    let a = Box::leak(Box::new(Mutex::new(false)));
    let b = Box::leak(Box::new(Mutex::new(false)));
    let c = Box::leak(Box::new(Mutex::new(0)));
    let t1 = spawn(|| {
        *a.lock().unwrap() = true
    });
    let t2 = spawn(|| {
        *b.lock().unwrap() = true
    });
    let t3 = spawn(|| {
        while !*a.lock().unwrap() {}
        if *b.lock().unwrap() {
            *c.lock().unwrap() += 1;
        }
    });
    let t4 = spawn(|| {
        while !*b.lock().unwrap() {}
        if *a.lock().unwrap() {
            *c.lock().unwrap() += 1;
        }
    });
    t1.join().unwrap();
    t2.join().unwrap();
    t3.join().unwrap();
    t4.join().unwrap();
    assert_ne!(*c.lock().unwrap(), 0); // could this panic?
}

(Playground)

Could the assert panic in this case?

1 Like

I will disagree with you here. Chances are that if Acquire and Release are insufficient, then the code is simply wrong and tries to do things that aren't possible with atomics.

SeqCst guarantees much less than people would expect it to. Essentially, it guarantees that all threads will agree on a single total order for all SeqCst writes, as long as they read with SeqCst - that's it. In particular, what it doesn't do is give loads Release semantics or writes Acquire semantics. It's a very niche guarantee that is mostly unnecessary, and code that relies on it is highly suspicious.

My recommendation would be to use Mutex instead if Acquire and Release are insufficient. Mutexes don't allocate anymore, and when uncontended, all locking does is compare_exchange(0, 1, Acquire, Relaxed).
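That uncontended fast path can be sketched as a toy lock (the ToyLock type is made up for illustration and is not std's implementation — a real mutex parks the thread under contention instead of spinning):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Toy lock illustrating only the fast path mentioned above:
// 0 = unlocked, 1 = locked.
struct ToyLock {
    state: AtomicU32,
}

impl ToyLock {
    const fn new() -> Self {
        Self { state: AtomicU32::new(0) }
    }

    fn lock(&self) {
        // Uncontended case: a single compare_exchange(0, 1, Acquire, Relaxed).
        while self
            .state
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Contended case: a real mutex would park the thread here.
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        // The Release store pairs with the Acquire in `lock`, so everything
        // done inside the critical section happens-before the next `lock`.
        self.state.store(0, Ordering::Release);
    }
}

fn main() {
    let lock = ToyLock::new();
    lock.lock();
    lock.unlock();
    lock.lock(); // can re-acquire after unlock
    lock.unlock();
}
```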

2 Likes

If I understand right, this "niche guarantee" isn't given by mutexes either? (As in the assert_ne! in the Playground above may fail?)

1 Like

The C++11 standard, where "synchronizes-with" is (apparently) defined is "available for a fee". :face_with_diagonal_mouth:

(Source Wikipedia)


Same for C++20?

1 Like

I agree with this one.

No, but it does give loads acquire semantics and writes release semantics. It doesn't make sense to give a load release semantics.

Mutexes aren't sequentially consistent, but acquire/release is enough to ensure that your playground will not fail:

Let's say that the read if *b.lock() in t3 returns false. Then, we have:

  • *a.lock().unwrap() = true happens-before
  • last unlock of while !*a.lock().unwrap() happens-before
  • unlock of if *b.lock().unwrap() happens-before
  • *b.lock().unwrap() = true happens-before
  • last unlock of while !*b.lock().unwrap() happens-before
  • if *a.lock().unwrap()

Thus, if the check if *b.lock() reads false, then if *a.lock() will read true under acquire-release.

3 Likes

I already have difficulty following this first step.

These two operations are in different threads. if *b.lock() is in t3 while *a.lock().unwrap() = true is in t1.

According to the C++ reference, an evaluation A happens before an evaluation B if:

  1. A is sequenced-before B
  2. A inter-thread happens before B

If I understand right (I'm not even sure of that), then "sequenced-before" is something that is only happening within a single thread. So it cannot be true here? So this means that *a.lock().unwrap() = true "inter-thread happens before" if *b.lock().

Let's look further into this:

Evaluation A inter-thread happens before evaluation B if any of the following is true

  1. A synchronizes-with B
  2. A is dependency-ordered before B
  3. A synchronizes-with some evaluation X, and X is sequenced-before B
  4. A is sequenced-before some evaluation X, and X inter-thread happens-before B
  5. A inter-thread happens-before some evaluation X, and X inter-thread happens-before B

I don't see how A synchronizes-with B (as both work on different values). Dependency ordering only seems to be relevant with the "consume" ordering, if I understood it right.

So what's going on here? And how can you deduce that *a.lock().unwrap() = true happens before if *b.lock()?

1 Like

Well, *a.lock() = true happens-before the last iteration of while !*a.lock() because *a.lock() returns true in that iteration. And the last iteration of while !*a.lock() happens-before if *b.lock() because it is sequenced-before it on the same thread.

Note that any time you have lock1(A), unlock1(A) on thread 1 and lock2(A), unlock2(A) on thread 2, with A being the same mutex on both threads, then you must either have unlock1 synchronizes-with lock2, or you must have unlock2 synchronizes-with lock1. (I called it happens-before in my previous comment.)

1 Like

Okay, I'm still trying to understand the difference between the mutex example I gave in the Playground and the last example in the C++ reference here, which uses atomics and requires SeqCst.

Maybe this makes me finally understand the difference:

I guess the difference is that when I use locks, I have first Acquire and then Release on both threads, while if I use atomics (like in the last example in the linked C++ reference), one thread uses Acquire and the other uses Release.

1 Like

Yes. The step that doesn't translate to atomic operations is this one:

If this was an atomic load followed by an atomic store, then the load would be acquire and the store would be a release. However, atomics only impose synchronizes-with from a release to an acquire operation.
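Translating the Playground's mutexes into atomics gives the shape where this matters (a sketch using static atomics for brevity): with SeqCst everywhere, all threads agree on a single total order of the two stores, so at least one reader must see both flags set; with only Acquire/Release on the flags, that agreement is not guaranteed in theory and the assert could fail.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static A: AtomicBool = AtomicBool::new(false);
static B: AtomicBool = AtomicBool::new(false);
static C: AtomicUsize = AtomicUsize::new(0);

fn demo() -> usize {
    let handles = [
        thread::spawn(|| A.store(true, Ordering::SeqCst)),
        thread::spawn(|| B.store(true, Ordering::SeqCst)),
        thread::spawn(|| {
            while !A.load(Ordering::SeqCst) {}
            if B.load(Ordering::SeqCst) {
                C.fetch_add(1, Ordering::Relaxed);
            }
        }),
        thread::spawn(|| {
            while !B.load(Ordering::SeqCst) {}
            if A.load(Ordering::SeqCst) {
                C.fetch_add(1, Ordering::Relaxed);
            }
        }),
    ];
    for h in handles {
        h.join().unwrap();
    }
    C.load(Ordering::Relaxed)
}

fn main() {
    // SeqCst forces a single total order over both stores, so at least one
    // reader observes both flags as true.
    assert_ne!(demo(), 0);
}
```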

2 Likes

ISO charges money for the final authoritative versions of their standards, but us peasants can peruse drafts that are available for free. The closest draft to the published C++20 standard is N4861 I believe. Any differences between the draft and the final document should be purely editorial.

4 Likes

So from that draft, this is the one relevant to atomics:

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

So basically a release operation synchronizes with an acquire operation whenever the release operation is before the acquire operation in the total order for the atomic.

2 Likes

I would like to summarize the relevant rules for Rust, to see if I understand them correctly now.

I'll start with which parts here are relevant and which are not for Rust:

I will only care about the "formal description" here.

Sequenced-before

This is a relationship that only makes sense to look at in one thread (and not across thread boundaries), and it's determined according to the evaluation order.

E.g. when I write:

*a = 1;
*b = 2;

Then *a = 1 is sequenced-before *b = 2.

Carries dependency

Irrelevant to Rust as Rust doesn't support the Consume ordering.

Modification order

This applies to each single atomic variable and describes in which order the variable is changed to which value. When looking at one particular atomic variable, the order is a total order that is equal for all threads.

However, I cannot determine a modification order involving two different atomic variables. It only applies to one atomic variable each.

Release sequence

The relevant part from the C++ reference is this:

After a release operation A is performed on an atomic object M, the longest continuous subsequence of the modification order of M that consists of

  1. Writes performed by the same thread that performed A (until C++20)
  2. Atomic read-modify-write operations made to M by any thread

is known as release sequence headed by A.

Basically it means that if I do a release operation on an atomic M which is followed by several read-modify-write operations (such as fetch_add, for example), then these read-modify-write operations belong to the release sequence, until some other modification (i.e. a plain write, such as a store) happens.

Dependency-ordered before

Again, irrelevant for Rust as of now due to lack of Consume ordering.

Inter-thread happens-before

This is a relation of operations running on different threads. Relevant here is:

Between threads, evaluation A inter-thread happens before evaluation B if any of the following is true

  1. A synchronizes-with B
  2. A is dependency-ordered before B (irrelevant for Rust)
  3. A synchronizes-with some evaluation X, and X is sequenced-before B
  4. A is sequenced-before some evaluation X, and X inter-thread happens-before B
  5. A inter-thread happens-before some evaluation X, and X inter-thread happens-before B

This is the subset of happens-before (see below) that relates two operations A and B on different threads.

Happens-before

I would say this is the transitive closure of sequenced-before (on same thread) and synchronizes-with (on different threads) relationships.

Where two operations A and B are on the same thread, we can also speak of sequenced-before, and where A and B are on different threads, we can also speak of inter-thread happens-before.

Happens-before is the union of both sequenced-before and inter-thread happens-before.

Simply happens-before

Irrelevant for Rust as it will be the same as happens-before due to lack of Consume ordering.

Strongly happens-before

Same as happens-before but includes the SeqCst case:

  1. A synchronizes with B, and both A and B are sequentially consistent atomic operations

And the elements required to form the transitive closure:

  1. A strongly happens-before X, and X strongly happens-before B

Synchronizes with

This basically means an Acquire operation read a value from a previous Release operation or an operation in the release sequence, i.e. the release operation or any contiguously following read-modify-write operations in the modification order.

Edit: The operation that heads the release sequence, i.e. the initial Release operation then synchronizes-with the Acquire operation.
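To illustrate the synchronizes-with relation in runnable form (a sketch; names are made up, and the payload uses a Relaxed atomic in place of plain memory to stay in safe Rust): the Release store pairs with the Acquire load that reads its value, so everything sequenced before the store becomes visible to the reader.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn writer() {
    DATA.store(42, Ordering::Relaxed); // sequenced-before the Release store
    READY.store(true, Ordering::Release); // release operation A
}

fn reader() -> u64 {
    // Acquire operation B: spin until it reads the value stored by A.
    while !READY.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // A synchronizes-with B, so the store to DATA happens-before this load.
    DATA.load(Ordering::Relaxed)
}

fn main() {
    let t = thread::spawn(writer);
    assert_eq!(reader(), 42); // guaranteed: the reader can only observe 42
    t.join().unwrap();
}
```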


Not sure if this summary is correct. Maybe I wasn't very precise, but I feel like I have a basic understanding now on what's going on and how these things work.

I find the documentation very difficult to read, though, mostly because synchronizes-with isn't explained at all in the C++ std::memory_order reference (only in the standard or one of its drafts), and because the C++ reference also includes things which are not relevant for Rust and make it harder to understand.

Perhaps someone can give me feedback if my reasoning regarding the transitive closure is correct.

1 Like

I would like to get to a point where I think the C++ standard and the Rust documentation differ:

In the C++ standard we have (supposedly, as it's a draft):

If B performs an acquire operation and takes a value not from the release operation A itself but from a later read-modify-write operation that's still part of the release sequence headed by A, then A synchronizes with B.

In the Rust documentation, however, we find:

Acquire: When coupled with a load, if the loaded value was written by a store operation with Release (or stronger) ordering, then all subsequent operations become ordered after that store.

Note there is no mentioning of any "release sequence". Now what does that mean?

Suppose one thread does:

a.store(10, Release); // operation 1
a.fetch_add(1, Relaxed); // operation 2

A second thread does:

a.load(Acquire); // operation 3

If the second thread loads the value 11, then according to Rust's documentation, there is no synchronization between operation 3 and the other two operations at all. But according to the C++ reference, operation 1 would synchronize-with operation 3, even if the second thread loads the value 11 and not the value 10.

Am I correct here? And if yes, does this mean the Rust documentation is incomplete here?
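For what it's worth, the three operations can be turned into a runnable sketch (names hypothetical): even when the Acquire load observes 11, the value written by the Relaxed fetch_add, the release sequence means the store that heads it still synchronizes with the load, so writes before operation 1 are guaranteed visible.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::thread;

static PAYLOAD: AtomicUsize = AtomicUsize::new(0);
static A: AtomicU32 = AtomicU32::new(0);

fn demo() -> usize {
    let t = thread::spawn(|| {
        PAYLOAD.store(7, Ordering::Relaxed); // made visible by the release below
        A.store(10, Ordering::Release); // operation 1: heads the release sequence
        A.fetch_add(1, Ordering::Relaxed); // operation 2: RMW, stays in the sequence
    });
    // Operation 3: spin until the Acquire load observes 11, i.e. the value
    // written by the Relaxed fetch_add, not by the Release store itself.
    while A.load(Ordering::Acquire) != 11 {
        std::hint::spin_loop();
    }
    t.join().unwrap();
    // The Release store heading the sequence synchronizes-with our Acquire
    // load, so the payload write is guaranteed visible.
    PAYLOAD.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(demo(), 7);
}
```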

1 Like

The pdf linked earlier contains the following very interesting note about a change that happened in C++20:

Affected subclause: 6.9.2.1
Change: Except for the initial release operation, a release sequence consists solely of atomic read-modify-write operations.
Rationale: Removal of rarely used and confusing feature.
Effect on original feature: If a memory_order_release atomic store is followed by a memory_order_relaxed store to the same variable by the same thread, then reading the latter value with a memory_order_acquire load no longer provides any “happens before” guarantees, even in the absence of intervening stores by another thread.

1 Like

Yes, but a.fetch_add(1, Relaxed); is a read-modify-write operation.

1 Like

Yes, I know. I wasn't saying that you were wrong. It was just an observation that I didn't know of myself.

It does seem like you are correct that the Rust documentation is not complete.

2 Likes

Ah okay, I thought it was an immediate response to my post, sorry.

1 Like

I think this is also reflected here:

Release sequence

[…]

  1. Writes performed by the same thread that performed A (until C++20)
  2. […]
1 Like