How can I write once (or rarely) and read freely (i.e. inexpensively)?

How can I write a bool once (or very rarely), atomically, while ensuring happens-before visibility for all subsequent reads of it (which can happen from other threads/cores)?

Can this be done with an AtomicBool, using Ordering::SeqCst for the write and Ordering::Relaxed for all the reads? (Am I understanding this correctly?)

In other words, does that kind of write ensure something akin to a flush, visibility-wise, so that any reads after it see the new value instead of the previous one?

Furthermore, I assume that an even more relaxed read, such as a non-atomic read, would be UB (undefined behavior), because it should be an atomic read instead? Though it feels like, in practice, any normal read would still act like an atomic read, at least on x86_64? (unsure)

Is there perhaps any other way to ensure such a flush-on-write, so that any subsequent reads could be free, maybe even normal reads? (Unless, per the above, normal reads of a value written to an AtomicBool are UB?)

Here's a sample (playground link) using the aforementioned AtomicBool, for whoever was wondering:

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Create a shared atomic boolean
    let atomic_bool = Arc::new(AtomicBool::new(false));

    // Clone the atomic boolean for each thread
    let atomic_bool_clone = Arc::clone(&atomic_bool);

    // Spawn a thread to write to the atomic boolean
    let write_thread = thread::spawn(move || {
        std::thread::sleep(std::time::Duration::from_millis(100));
        // Write to the atomic boolean with SeqCst ordering
        atomic_bool_clone.store(true, Ordering::SeqCst);
        println!("Write thread finished writing.");
    });

    // Spawn a thread to read from the atomic boolean
    let read_thread = thread::spawn(move || {
        let value = atomic_bool.load(Ordering::Relaxed);
        println!("Read thread waiting a bit, else it reads(too soon): {}", value);
        std::thread::sleep(std::time::Duration::from_millis(300));
        // Read from the atomic boolean with Relaxed ordering
        let value = atomic_bool.load(Ordering::Relaxed);
        println!("Read thread read value: {}", value);
    });

    // Wait for both threads to finish
    write_thread.join().unwrap();

    read_thread.join().unwrap();
}

First, I'd highly recommend reading and understanding Mara Bos's book on atomics and locks - it's worth the money if you're using atomics seriously.

Regardless of ordering in use, operations on a single atomic have a "total modification order" that's the same no matter what thread you examine the atomic from; if the only thing you care about is the boolean itself, you can just use Relaxed for all accesses.

The other orderings tell you about the relationship between accesses to the atomic, and accesses to other locations in memory. They don't affect accesses to the atomic at all.
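To illustrate the "only the boolean itself matters" case described above, here is a minimal sketch using Relaxed for every access; the names (`run_until_stopped`, the stop flag) are illustrative, not from the original example:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// A one-shot "stop" flag: no other memory is published through it,
// so Relaxed is sufficient for both the store and the loads.
fn run_until_stopped() -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let stop2 = Arc::clone(&stop);

    let worker = thread::spawn(move || {
        let mut iterations = 0u64;
        // Relaxed load: we only care about the flag's own value.
        while !stop2.load(Ordering::Relaxed) {
            iterations += 1;
        }
        iterations
    });

    thread::sleep(std::time::Duration::from_millis(10));
    // Relaxed store: nothing else needs to be synchronized through it.
    stop.store(true, Ordering::Relaxed);
    worker.join().unwrap()
}

fn main() {
    let n = run_until_stopped();
    println!("worker ran {} iterations before seeing the flag", n);
}
```

Note the sleep here is only to let the worker spin for a while before stopping it; the correctness of the flag does not depend on it.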

6 Likes

What do you mean by this?

In the example you shared, using relaxed everywhere is sufficient.

4 Likes

Or rather, the orderings do not change anything in the example you shared. The second read could return false if the writer thread gets delayed due to other processes using a lot of CPU.

You cannot rely on sleeps for synchronization.

4 Likes

tl;dr: I (wrongly) thought ordering was needed to ensure memory synchronization of the just-written bool value versus later reads of it, so that those reads wouldn't see the previous (stale) value. In other words, this misconception.


It appears that I misunderstood how atomics work. I was under the impression that on non-x86_64 architectures (such as ARM; I even tried(1,2) non-atomic static mut bools on QEMU ARM but failed to see what I was expecting: they still acted as if they were atomic, with synchronized memory), if you write to an atomic, then to be sure the written value will be seen by other threads/cores, it has to be flushed somehow (like from CPU cache to RAM, or something), and that's what the orderings are for: to ensure that subsequent reads are aware of that write having happened and don't read a stale/previous value.

But it's becoming clearer now, as farnz also mentioned, that the value is in fact instantly visible across threads/cores, and the ordering is actually about something else: accesses to other memory, if any. I need to read more about it to make sure I understand it, and to dispel my previous (wrong) understanding.

I think this puts it better than my above attempt at explaining it:

Memory orderings specify the way atomic operations synchronize memory. In its weakest Ordering::Relaxed, only the memory directly touched by the operation is synchronized. On the other hand, a store-load pair of Ordering::SeqCst operations synchronize other memory while additionally preserving a total order of such operations across all threads.

I guess the word I was looking for is "synchronized": I wanted to make sure the write is synchronized, so that any subsequent reads see what was written instead of the previous (stale) value.
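What the quoted passage means by "synchronizing other memory" can be sketched like this: a Release store paired with an Acquire load publishes the *other* writes made before the store. The payload here is an AtomicU64 accessed with Relaxed, synchronized only by the flag; all names are illustrative:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// The Release/Acquire pair on the flag is what makes the payload's
// Relaxed write visible: everything written before the Release store
// is guaranteed visible after an Acquire load that observes `true`.
fn publish_and_read() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));
    let (d2, r2) = (Arc::clone(&data), Arc::clone(&ready));

    let writer = thread::spawn(move || {
        d2.store(42, Ordering::Relaxed);   // the payload
        r2.store(true, Ordering::Release); // publish it
    });

    // Spin until the flag is observed; the Acquire load makes the
    // writer's earlier payload store visible to this thread.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let value = data.load(Ordering::Relaxed); // guaranteed to see 42

    writer.join().unwrap();
    value
}

fn main() {
    println!("published value: {}", publish_and_read());
}
```

If the flag alone were the whole story (no payload), Relaxed on both sides would suffice, which is exactly the point made earlier in the thread.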

I guess using ChatGPT to learn about this wasn't a good idea :slight_smile: or perhaps my own bias about how it works led me to misunderstand it this way.

For example, this is what it said at some point when I had this:

static IS_THIS_FORKED_PROCESS_AB: AtomicBool = AtomicBool::new(false);

and I was trying to set it to true at some point:

IS_THIS_FORKED_PROCESS_AB.store(true, core::sync::atomic::Ordering::Release);
/* "When you write with Release, the memory operations prior to that write are guaranteed to be completed and visible to other threads before the write itself. However, the write operation itself might not be immediately visible to other threads depending on the ordering semantics of their subsequent read operations.
To ensure that the write operation is immediately visible to other threads, you need to use appropriate ordering semantics for their read operations. If you want the write to be immediately visible, you should pair it with Acquire ordering for the reading threads. This establishes a synchronization relationship that ensures the write will be visible to any threads reading with Acquire.
So, to make sure your write with Release is immediately visible to other threads, those threads should read with Acquire ordering. This combination ensures the desired synchronization and visibility of the write operation."
- chatgpt 3.5
*/

and then when I read it I used:

return IS_THIS_FORKED_PROCESS_AB.load(core::sync::atomic::Ordering::Acquire);

And this made sense to me.
But apparently just Relaxed is enough for both the write and the reads.

Now I know why comex said a relaxed atomic load is enough, even though at the time, I admit, I didn't understand why; it didn't seem to make sense given my (wrong) understanding of atomics and orderings.

What's even funnier is that I've already read this before, and even a bit of Mara's book (but postponed most of it for later reading).

Agreed. I didn't intend to keep the sleeps, but I definitely wanted to avoid some kind of barrier, as that might have masked any synchronization coming from the atomics themselves. Ideally, I wanted both threads already running before the first write happened, just in case the reader thread, reading after the write, could somehow have gotten the previous (stale) value! (But now I know that's not possible even with Relaxed ordering.)

Modifications are not always available instantly to other threads with atomics, and that's not what @farnz said. How quickly they are available depends on the CPU architecture and activity on the machine. The guarantees for "total modification order" that he mentioned are more specific:

https://en.cppreference.com/w/c/language/atomic

Each atomic object has its own associated modification order, which is a total order of modifications made to that object. If, from some thread's point of view, modification A of some atomic M happens-before modification B of the same atomic M, then in the modification order of M, A occurs before B.

But reading the book he recommended is probably a much better way to learn than reading the cppreference. I'd like to read that book myself.

1 Like

Seriously? You tried to see how atomics [mis]behave on QEMU? Seriously?

I don't even know whether to cry or laugh: emulating a foreign memory model is so costly that Apple went to the trouble of implementing a foreign memory model in their CPUs just to make emulation feasible, and RISC-V added similar extensions (for the exact same reason), and you expect that a normal build of QEMU would emulate the ARM memory model?

Of course QEMU on x86 implements the x86 memory model (which is compatible with ARM programs), and I think it even emulates the x86 memory model when it emulates x86 on ARM (although here I'm not 100% sure).

Doing anything else requires specialized emulators designed for hardware developers (and those are 100-1000 times slower than normal emulators).

That link says "Subscription required"; I can't see it, sorry.

Well, it's good to know everything else, thanks.
That would explain some things :smiley:

Importantly, though, at best an ordering other than relaxed will take the same time as relaxed to make the atomic update visible; at worst, doing the work needed to meet the guarantees of other orderings will be slower than just using relaxed ordering (e.g. a release or sequentially consistent store may force the CPU doing the store to flush out buffered writes).

You cannot speed up the process of atomic changes propagating, only slow it down.

3 Likes

I'm a bit unclear on this point.

Assuming ATOMIC_BOOL_1 has value false initially, and is a static AtomicBool.

TimeStep1: Thread1 stores true in ATOMIC_BOOL_1 and the store itself is finished executing.

TimeStep2: Thread2 loads the value of ATOMIC_BOOL_1

Does Thread2 now NOT have the value true at TimeStep2? Assuming TimeStep2 happens right after TimeStep1 (i.e. after that store has finished for sure), in some kind of global time or something.

Because it sounds as if there is some time between the store being done and the change becoming visible to other loads. Or is it just that it's not instantly visible to other, non-atomic reads (but those aren't really possible unless some unsafety happens)?

I mean, in what way isn't the stored value instantly available? If I'm understanding the total order of modifications correctly, any subsequent (after the store) load from any thread would find the stored value instantly. Or are you saying that it could read a stale value? Or that other reads of the stored value that aren't atomic (but again, only via unsafe? or how?) won't see it (because to see it instantly you had to use a load instead)?

Or is the store operation somehow not blocking until it's done?

Please, I need clarification :) before I delve deep into the reading(s).
And thank you everyone who has replied. I appreciate your time and replies!

Well, I hope this isn't true (from ChatGPT 3.5), because otherwise, wow:

"You're correct to highlight the possibility of a load operation in one thread occurring so quickly after a store operation in another thread that the updated value may not have propagated to the cache of the thread performing the load. This scenario is known as a "store-load forwarding" delay or a "store-load latency" issue.

In modern processors, there is indeed a possibility, albeit very small, that a load operation might occur before the updated value has propagated to the cache of the thread performing the load. This situation can lead to the load operation observing a stale value.

However, the likelihood of such a scenario occurring is extremely low, especially in modern multi-core processors with sophisticated cache coherence protocols and memory ordering mechanisms. These mechanisms are designed to minimize the risk of such timing-dependent issues and ensure consistent behavior in concurrent programs.

While it's theoretically possible for a load operation to observe a stale value due to store-load latency, in practice, it's rare and generally not a significant concern for most applications. Nonetheless, it's essential to be aware of this possibility when designing concurrent algorithms and synchronization mechanisms."

Well, according to that wiki link, it seems like it's not an issue.

There is no such thing, there is only the happens-before ordering that I linked, which is quite complex. You need some sort of synchronization to guarantee that a change in one thread is always seen by a single load in another thread. Without that, the change may be missed by the other thread. Without studying atomics in detail, the best approach is to use the synchronization APIs in the std library (or other crates) and not use atomics directly.
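Following that advice, the original sleep-based example can be rewritten with a std synchronization API. A sketch using an mpsc channel (the function name `send_and_receive` is mine, for illustration):

```rust
use std::sync::mpsc;
use std::thread;

// Instead of sleeps, let a channel do the synchronization: recv() blocks
// until the writer's send() has happened, so the reader can never see a
// stale value or race ahead of the writer.
fn send_and_receive() -> bool {
    let (tx, rx) = mpsc::channel();

    let writer = thread::spawn(move || {
        tx.send(true).unwrap();
    });

    // Blocks until the value has been sent; no timing assumptions needed.
    let value = rx.recv().unwrap();
    writer.join().unwrap();
    value
}

fn main() {
    println!("Read thread read value: {}", send_and_receive());
}
```

The channel both transfers the value and establishes the happens-before relationship, so no explicit orderings appear in user code at all.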

3 Likes

In that case, yes, it's immediate. But there's a more complicated case to consider that's permitted in some memory models:

  1. Thread2 starts a load from ATOMIC_BOOL_1, and confirms that there are no in-flight writes to ATOMIC_BOOL_1.
  2. Thread1 starts a store to ATOMIC_BOOL_1.
  3. Thread1 completes its store to ATOMIC_BOOL_1.
  4. Thread2 completes its load from ATOMIC_BOOL_1.

What value should Thread2 load? It started the load before Thread1 attempted its write, and could therefore read before Thread1 writes, but it completes the load after Thread1 completed its write, so could therefore read after Thread1 writes. Both are possible, and indeed allowed.

3 Likes

Well, these two replies definitely blow up my understanding of what I thought atomics are all about. So I still need some kind of synchronization to ensure the loads/stores themselves are synchronized; that's just great :slight_smile:

I'll be back only after I've done the necessary reading; otherwise I risk wasting people's time, which I really don't want to do, out of respect.

Another way of looking at it is that atomic operations are used to build synchronization APIs. So you can make these guarantees with atomic operations alone, it's just that knowing how to do this is not so simple.
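As an illustration of building a synchronization primitive from atomics, here is a toy spinlock made from a single AtomicBool (a sketch for learning purposes, not a production lock):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// A toy spinlock: swap(true, Acquire) takes the lock, store(false, Release)
// drops it. The Acquire/Release pair is what makes writes done inside the
// critical section visible to the next thread that takes the lock.
struct SpinLock(AtomicBool);

impl SpinLock {
    fn new() -> Self {
        SpinLock(AtomicBool::new(false))
    }
    fn lock(&self) {
        // swap returns the previous value; keep spinning while it was
        // already `true` (i.e. someone else holds the lock).
        while self.0.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
    }
    fn unlock(&self) {
        self.0.store(false, Ordering::Release);
    }
}

fn count_with_lock() -> u64 {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let (l, c) = (Arc::clone(&lock), Arc::clone(&counter));
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                l.lock();
                // Relaxed is fine here: the lock itself orders these
                // accesses, so the read-modify-write cannot be torn apart.
                let v = c.load(Ordering::Relaxed);
                c.store(v + 1, Ordering::Relaxed);
                l.unlock();
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("final count: {}", count_with_lock());
}
```

Without the lock, the separate load and store would race and the final count would usually be less than 4000; with it, the result is exact, which is precisely the guarantee a mutex gives you.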

3 Likes

You don't need synchronization per-se, because that's what atomics provide. But you've got no guarantees about timeliness, only about ordering of memory operations as observed by other threads.

The core to this, as @jumpnbrownweasel has said, is that computers don't have a sense of "global time"; instead, we use the happens-before relationship to define relative time between threads. This, in turn, means that there's no guarantee of "instant" updates - and indeed, on a general purpose computer, that's a really hard thing to manage (what happens if I use my OS's tools to freeze Thread2 just after it reads the boolean, for example?)

3 Likes

The assumption of "global time" is implicit here, but it bears remark: how do you know that TimeStep2 happens after TimeStep1? Suppose Thread2 loads false; how would you distinguish between a "correct" false that happened before TimeStep1 and an "incorrect" false that happened after? And if you can't tell the difference, would you not just say "TimeStep2 happened before TimeStep1"?

Unless you have reference to some global timer of some kind which could tell you the difference - let's say an atomic counter? But then you have two atomics and the meanings of the other memory orderings actually start to come into play.

As long as you have only one atomic, you can always imagine that all atomic operations happen in a globally sequentially consistent order. What that order is though you can only tell by running the program and observing its behavior; so you might say that the observed behavior is as if all atomic operations did indeed happen instantly in some order, even if the order doesn't match what you would see if you put a logic analyzer on the chip and lined things up in "real" time.

Correct multi threaded programs, however, can't assume that atomic operations happen instantly; they must have correct behavior whether the operations are instant or not.

4 Likes

That's the trick: the atomics that people are trying to envision stopped existing more than 30 years ago (on popular systems, I mean; some supercomputers had all these issues even earlier).

After the faster models of the 80486 got a writeback cache, the illusion of a single memory and real atomics became just that: an illusion.

Since memory is no longer truly shared and atomics are now imaginary rather than real, talking about what you would see on the system bus becomes useless and pointless: each CPU has its own idea of what the world looks like, and that's how memory ordering models were born. They are simply models; they are not real.

Once you understand that it becomes easier to talk about other stuff.

1 Like
