Is my understanding of Rust memory ordering correct?

We know that Rust has five kinds of memory ordering. I've been struggling to understand them for a few days, and I found that some of their effects are difficult to test, so I'm asking here to make sure my understanding is correct.

Consider the following code:

use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    let flag = AtomicBool::new(false);

    let x = 1;  //1
    let y = 2;  //2
    let z = 3;  //3

    flag.store(true, Ordering::Release);  //f

    let a = 4; //4
    let b = 5; //5
    let c = 6; //6

    println!("x: {}, y: {}, z: {}, a: {}, b: {}, c: {}", x, y, z, a, b, c);
}

My current understanding is that Release guarantees that, when this line executes, all the code above it in the same thread has definitely been executed, and its results are definitely in memory rather than only in registers. For example, let x = 1; will definitely have been executed, with the result in memory, by the time line f executes.

The question is: Release only guarantees that by the time f executes, lines 1, 2, and 3 have been executed and their data is in memory. It says nothing about the order in which 1, 2, and 3 execute relative to each other. The compiler may execute them as 2, 3, 1, or as 3, 2, 1, simply because that happens to be faster.
However, no matter what order they run in, by the time f executes, lines 1, 2, and 3 will all have been executed and their data will be in memory, so that later reads of x, y, and z are guaranteed to see the values 1, 2, and 3, rather than finding those values still in registers while memory still holds 0.

So Release only guarantees that lines 1, 2, and 3 (the commented code) have executed by the time f executes; the order of execution among them is not guaranteed.

As for the code below the Release, lines 4, 5, and 6 (the commented code): Release only guarantees that lines 1, 2, and 3 have executed by the time line f executes; it says nothing about how the code below f is executed. So lines 4, 5, and 6 may even be moved above the f line. Even if all of them are moved above f, the Release rule is not violated, because lines 1, 2, and 3 will still have been executed when f executes; the only difference is that 4, 5, and 6 will have been executed as well.

Alternatively, lines 4, 5, and 6 (the commented code) may not be moved above the f line, but after f executes, their relative order is still unspecified. Possible execution orders include:
4, 5, 6.
4, 6, 5.
6, 5, 4.
All three possibilities exist.

I believe I have now exhausted the possible situations. Also, these memory ordering constraints apply within a single thread; a statement cannot be moved from one thread to another.

So Acquire and Release should be symmetric. Is that right? If my understanding is correct, then:

Within a single thread, Release means: when the Release line executes, all the code above it has definitely been executed, though the order among those statements is not guaranteed. The code below the Release may be reordered, not only among itself, but possibly moved entirely above the Release line.

And Acquire is the opposite of Release. Acquire only guarantees that when the Acquire line executes, the code below it has definitely not been executed yet; that code cannot be moved above the Acquire line. The order of execution among the statements below the Acquire is still not guaranteed, but none of them can run until the Acquire line has executed.

Is that right?


There's another point to note: memory ordering alone cannot provide synchronization functionality between multiple threads. In other words, if you want to implement synchronization between multiple threads, you need to use additional mechanisms.

For example, consider the following code:

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let flag = AtomicBool::new(false);

    let t = thread::spawn(move || {
        // Thread 2: wait for flag to be set
        while !flag.load(Ordering::Acquire) {
            // spin
        }
        println!("Thread 2: flag is set!");
    });

    // Thread 1: set flag
    flag.store(true, Ordering::Release);
    println!("Thread 1: flag is set!");

    t.join().unwrap();
}

Here, the main thread executes flag.store, and then the t thread can get past flag.load. However, this is not because of the Release and Acquire memory ordering constraints alone, but because of the loop condition that controls the synchronization between the two threads; the Release and Acquire semantics are merely being leveraged to achieve that synchronization.

In other words, if you remove the while loop and simply use flag.load(Ordering::Acquire), the two threads would not be synchronized at all.

Take this with a grain of salt, but this is my understanding of atomic memory ordering in Rust.

In the first code sample, it is impossible for any other thread to observe the state of local variables x, y, and z (as well as a, b, and c, of course). The compiler is free to optimize these away; they may never be committed to memory at all.

Memory ordering (and memory barriers) are for providing information to the compiler and the hardware that is needed for correct concurrent access to memory. This implies two things:

  1. The memory must be shared (e.g. through references).
  2. There must be concurrent processes (e.g. threads) accessing the shared memory.

Without both properties, there is no way to observe the side-effects, so they can be safely eliminated. [1]

Other than that very important detail, your understanding of the operation of Release appears to be mostly correct. Another missing detail is that Release applies to stores and Acquire applies to loads. Be mindful that the memory ordering constraints say nothing about whether code can or cannot be executed; they only control when memory is accessed.

FWIW this is one of the best resources on the topic that I am aware of: Acquire and Release Semantics (preshing.com). And directly related to Rust there is Rust Atomics and Locks by Mara Bos (chapter 3 covers memory ordering).

Secondly, the hardware itself has cache coherency to deal with on multi-core systems, and memory ordering allows independent CPUs to synchronize their caches for shared memory accesses. [2]

For your second code sample, the important property of note is not so much that there is a loop (control flow) but that the load inside the loop synchronizes with the store in the other thread. If this code was rewritten with a normal load instead of an atomic load, there would be no guarantee that the loop ever exits, because it cannot possibly synchronize with the other thread. The compiler would be allowed to assume that there are no observable side-effects, and the loop could be optimized to remove the load.


  1. There is a different property of memory access called volatile, which is not for concurrency. It is for memory-mapped I/O. The hardware itself needs to observe these memory accesses, not a separate thread. ↩︎

  2. This is usually (but not always) what people think of as atomics having performance overhead. In truth, it is only the write accesses that can have performance implications, and only when sharing actually occurs (contention). Reads are always "free" once they are in the cache. ↩︎


Ah, I see, maybe it's because I wrote the variables x, y, z, a, b, and c as local variables. They should be global variables, i.e. declared with static mut. And even that is not enough: they must be contended for by at least two threads before the so-called side effects can be observed. Is that right? And in my previous code, since those variables are local, they can be safely eliminated; what does "eliminated" mean here?

And another point. You said that Release applies to stores and Acquire applies to loads.

Let's look at the modified code:

use std::sync::atomic::{AtomicBool, Ordering};

static mut x: i32 = 0;
static mut y: i32 = 0;
static mut z: i32 = 0;
static mut a: i32 = 0;
static mut b: i32 = 0;
static mut c: i32 = 0;

fn main() {
    let flag = AtomicBool::new(false);
    unsafe {
        x = 1;  //1
        y = 2;  //2
        z = 3;  //3
    }
    flag.store(true, Ordering::Release);  //f
    unsafe {
        a = 4; //4
        b = 5; //5
        c = 6; //6
    }
    unsafe {
        println!("x: {}, y: {}, z: {}, a: {}, b: {}, c: {}", x, y, z, a, b, c);
    }
}

Okay, I don't want to write a second thread to compete for them here. This is still a single thread; I just don't want the code to become too complicated.

Your earlier point means: line f uses Release, and Release applies to stores. The code above line f, i.e. lines 1, 2, and 3 (the commented lines), are store operations, because they are assignments. So when line f executes, lines 1, 2, and 3 have already been executed. But suppose we add a new line, so the context becomes:

unsafe{
    x = 1;  //1
    y = 2;  //2
    z = 3;  //3
    println!("{}", x); // p
}

Then in this case, when line f executes, we can determine that lines 1, 2, and 3 have already executed and the corresponding values have been written back to memory, right? This time I have shared variables. Oh, but there is one possible situation here:
For the new println line, it's possible that line f has executed while the println line has not. Release guarantees that the preceding store operations have completed, but println here performs a load, not a store, so Release cannot guarantee that it executes before line f; it's up to the compiler and CPU to decide whether the println line executes before or after line f, if they think doing so will be faster. Acquire should be the same, right?

In the article you linked earlier, there is this passage:

Value = x;                         // Publish some data
STORESTORE_FENCE();
IsPublished = 1;                   // Set shared flag to indicate availability of data

Does STORESTORE_FENCE() mean that Value = x must be executed first, and only then IsPublished = 1, i.e. it forbids the compiler and CPU from reordering them? Is that right?

I'm not skilled enough to answer, but Herb Sutter's YouTube videos on the C++ memory model really improved my understanding of atomic operations. I highly recommend them.

Yeah, writing code for a single thread is plenty to make the point. There doesn't need to be the presence of another thread, just the potential for one. We can assume that all Rust programs have multiple threads, because the language itself is designed with the hypothesis that every program wants to take advantage of multiple threads. So, the potential for the presence of another thread accessing the shared memory is already there.

This is why you are not allowed to instantiate a static variable with a type that is !Sync, for example. The multithread hypothesis is deeply rooted into the language.

For clarity, the Release "only applies" to the store with the Release ordering. It applies to the atomic flag. This flag is now a marker for a memory barrier that prevents reordering memory accesses to after the Release store. It is the memory barrier that "applies to" the other shared memory accesses in relation to the Release store.

The sentence you quoted just says that an "Acquire store" doesn't do anything in relation to a "Release load". The relationship is established between an "Acquire load" and a "Release store". If that helps [1].

I'll refer you to the "Release and Acquire semantics" link I provided earlier, and here I'll inline one of the diagrams it provides (note: "read == load" and "write == store"):

[diagram from the preshing.com article, showing which reorderings acquire and release semantics forbid around a read-acquire and a write-release]

This is still mixing up the atomic stores with other shared memory accesses. The memory load for x in the println in that case would happen before the Release atomic store. I think the function calls hidden inside the println macro also must be guaranteed to happen before the Release store as well. Otherwise, their side-effects would be unobservable to other threads!

That sounds like a correct interpretation to me!


  1. I do not know the semantics of Release with a load or Acquire with a store. The learning materials I have found just do not discuss them. And I don't want to try to understand what the C++ memory model specification has to say about it. ↩︎


"I have thought about something. For example, in a single thread, after compiling with the release flag, I think the reordering of memory instructions is conditional, not unconditional, for the sake of execution speed. There are several conditions for this.

Firstly, there are dependencies between instructions. If instruction a executes before instruction b, and b depends on the result of a, then it is not possible to execute b first and then a. For example:

let mut a = 1; // 1
a += 1;        // 2
let b = a;     // 3

These three lines of code in a single thread have a unique order of execution, which cannot be swapped.

However, if the code is like this:

let a = 1; // 1
let b = 2; // 2
let c = 3; // 3

These three statements have no dependencies between them, so the order of execution is arbitrary, and it is possible to execute them in any order.

For example, in the following code:

let a = 1; // 1
let b = a; // 2

Instruction 2 depends on the result of instruction 1, so it is not possible to execute instruction 2 first and then instruction 1.

Now, let's look at the code with release memory ordering:

let mut a = 1; // 1
a += 1; // 2
let b = a; // 3
c.store(1, Ordering::Release); // assuming c is an atomic; just to illustrate a Release store

Here, the Release ordering ensures that the preceding write operations are completed before the Release store. Clearly, the read operations feeding those writes must be ordered as well.

Instruction 3 depends on the result of instruction 2, which depends on the result of instruction 1. Therefore, the order of execution is 1->2->3, corresponding to read->write->write operations.

So, Release does not mean that read operations can be arbitrarily reordered. Or, to put it another way, read operations exist in the order of the code, and are completed before the Release write operation. For example, the read operation of instruction 1 must be completed before the write operation of instruction 2.

Secondly, the reordering of memory operations must not affect the overall result of the code. That is, in a single thread, the result of executing the code in the original order must be the same as the result of executing the reordered code. For example, if a thread runs to completion and the result of variable a is 3 and the result of variable b is 4, then the reordered code must also produce the same result, with a being 3 and b being 4.

Acquire should also be similar.

But I wonder: if a thread has 100 write operations that are independent, with no dependencies among them, will the thread be executed in parallel on a multi-core CPU, say one with 10 cores? Will the 100 writes be distributed among the 10 cores and executed concurrently? Would the thread's result differ depending on whether it runs on a single core or multiple cores? Who knows whether the code inside a thread executes serially, in parallel, or both? As long as the result is consistent, that's all that matters.

So my conclusion is that the write operations before a Release are guaranteed to be completed, thanks to hardware and compiler support. It's like a wall separating the code before and after the Release. As for read operations, it depends on the dependencies between instructions: if a read feeds a later write, it must be completed before the Release. Read-modify-write is similar, since all three parts are dependent on each other.

Therefore, the write operations before the Release are guaranteed to be completed when the Release executes. As for the read operations, it depends on the dependencies between instructions: they may be completed before the Release, or they may not be.

The other memory orderings should be similar.

The program counter and the machine code it reads form a single sequence, handled by a single control process. See the Wikipedia article on the CPU and its links for more.

I tend to take the view that the work before is partially complete; the release then finishes it up, so it becomes available for others to take.

Acquiring stdout is an atomic operation, plus the underlying syscall probably synchronizes as well, so println isn't really a good example.


The only thing the atomic orderings guarantee is what effects in one thread are observable from another thread. It doesn't actually mean certain operations happen. For example, in this one:

static mut x: i32 = 1;
unsafe {
    x = 1;
}
flag.store(true, Ordering::Release);

The compiler might be able to tell that x is already 1 and not do any actual write operation, since that has no effect on what other threads see.

In reality, the compiler will do more than necessary to ensure the orderings are respected, either because the optimal code is hard to compute or because it isn't sophisticated enough to generate anything better. But the orderings themselves only care about effects between threads.


In that example, the ordering doesn't matter, they can both be Relaxed and they will still be synchronized. There's no other data to attach to the store and load. However, Acquire and Release are being used inside spawn and join. If you made this code compile by using scoped threads, the Release in spawn and corresponding Acquire inserted at the beginning of the spawned thread would ensure that the &flag reference stored in the closure has the address of flag when the closure reads it. The Acquire in join and corresponding Release inserted at the end of the spawned thread ensures flag isn't dropped[1] before the thread is finished reading it.

Summary
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let flag = AtomicBool::new(false);
    thread::scope(|s| {
        let t = s.spawn(|| {
            // Thread 2: wait for flag to be set
            while !flag.load(Ordering::Acquire) {
                // spin
            }
            println!("Thread 2: flag is set!");
        });

        // Thread 1: set flag
        flag.store(true, Ordering::Release);
        println!("Thread 1: flag is set!");

        t.join().unwrap();
    })
}

Within one thread, the compiler and CPU will eagerly change numerous things about your code—not just ordering—without ever causing observable differences except for speed. The compiler does this by looking at operations that are known to be observable effects (e.g. FFI, atomics, inline asm) and then seeing what data they depend on, only producing the instructions necessary for that data to eventually exist. The compiler is only interested in generating the observable effects your code describes, not the computations your code describes. The atomic orderings allow you to manually indicate observable effects concerning inter-thread data, since the compiler doesn't know those automatically.


This can be optimized like so:

let b = 2;
let a = 2;

However, this is only possible because all this happens in one thread. The code to describe the exact nature of this dependency across threads does not exist, but you could imagine it.

static mut b: i32 = 0;
static mut a: i32 = 1;
static sync: AtomicBool = AtomicBool::new(false);

// thread 1
a += 1;
sync.store(true, AddedOneRelease(a));

// thread 2
while !sync.load(AddedOneAcquire(a)) {}
b = a;

I've made up a hypothetical ordering called AddedOneRelease that means a just had 1 added, so it is valid for AddedOneAcquire to take its value from before and add 1. If the compiler can verify that a is only written to once, then it could change thread 2 to this, and even have it execute before thread 1:

// thread 2
b = 2;

But the orderings we have are already complex enough, so this kind of thing is unlikely to ever exist.


  1. AtomicBool doesn't have any drop behavior, but if it did Acquire would ensure it stays after join. ↩︎


FWIW, I was speaking specifically about the general case, such as the format machinery. It may not be a good example, but it's what was contextually available at the time to point out that the memory accesses in question are not strictly limited to syntactic scope.

