Volatile atomic operations?

From cppreference on atomics:

The argument is pointer to a volatile atomic type to accept addresses of both non-volatile and volatile (e.g. memory-mapped I/O) atomic objects, and volatile semantic is preserved when applying this operation to volatile atomic objects.

(atomic_compare_exchange_weak, atomic_compare_exchange_strong, atomic_compare_exchange_weak_explicit, atomic_compare_exchange_strong_explicit - cppreference.com)

I have an MMIO-like situation (a memory map shared between processes). I need my accesses to a shared variable to be atomic (they’re used for synchronization), and apparently I also need them to be volatile, since they’re in shared memory. This presents a problem: as far as I know, there’s currently no way to do volatile operations on atomics in Rust.

Related: Volatile + relaxed atomic load/store

Discussed this in Discord for a bit. Summary:

  1. Another side effect can be used to make up for the lack of volatile (probably inline asm?).
  2. There may or may not be a practical difference between volatile _Atomic and plain _Atomic in LLVM. There are claims that there is, but I wasn’t able to write an example, nor did anyone provide one.
  3. None of the optimizations from the paper N4455, “No Sane Compiler Would Optimize Atomics”, work.
  4. Parallel programming is maddening.

Is there a question?

The question is: is there a case where a volatile atomic access compiles into something different from a plain atomic access on LLVM? And if there is, is feeding the pointer to the atomic into a black box prior to the access enough to prevent this from happening?

From this earlier discussion on the topic, the answer is no.

Edit: Although if I'm reading it correctly, to ensure the loads are not optimized out, now or in the future, they should use a non-relaxed ordering and two loads should not be adjacent (with no intervening code).

Consider this code:

let x: AtomicU8 = AtomicU8::new(0);
x.fetch_add(1, Ordering::Relaxed);
x.fetch_add(1, Ordering::Relaxed);

I think that it can be optimized to this:

let x: AtomicU8 = AtomicU8::new(0);
x.fetch_add(2, Ordering::Relaxed);

This only adds atomicity, and the ordering with respect to any other operations can still be consistent. Collapsing these two read-modify-write operations can’t be observed.
LLVM currently doesn’t do this, but it potentially could.

Your edit refers to an even more worrying possibility: optimizing out a “redundant” load.

Rust doesn't currently provide those operations. If neither atomics nor volatiles are enough for you, your only current option is to use inline assembly.

Wouldn't collapsing the two fetch_add calls change program behavior, since the value would then never be odd? And wouldn't this also reduce how often changes are detected by loads?

I realize those changes are unlikely to matter for the program, but they still seem to break the rule about changing behavior.

As pointed out below (thanks @Goldstein and @alice) the above is incorrect.

It strictly adds atomicity. It may happen that the two fetch_adds are ordered one right after another, in which case an external observer only sees 0 and 2. That’s consistent with a single fetch_add.

Would something like

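// Black-box the pointer to the atomic. Assumes `atomic` is a reference;
// asm! rejects unused operands, so the operand is named in an asm comment.
unsafe { asm!("/* {0} */", in(reg) atomic as *const _) };
let val = atomic.load(Ordering::Acquire);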

be enough to emulate a volatile atomic load? It causes a side effect while also being atomic.
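
For anyone who wants to try this, a self-contained version of the idea might look like the sketch below. The helper name `load_with_black_box` and the choice of `AtomicU32` are assumptions for illustration. The asm block is left without the `nomem` option on purpose, so the compiler has to assume it may touch memory reachable through the pointer. This is a sketch of the workaround being asked about, not something the language guarantees to behave like a volatile access.

```rust
use std::arch::asm;
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: passes the address of the atomic through an empty
/// asm block, then performs an ordinary Acquire load.
pub fn load_with_black_box(atomic: &AtomicU32) -> u32 {
    let ptr: *const AtomicU32 = atomic;
    // SAFETY: the asm block emits no instructions. The operand is only
    // mentioned inside an assembly comment so that rustc accepts it.
    // Without `nomem`, the compiler must assume the block may read or
    // write memory reachable through `ptr`.
    unsafe {
        asm!("/* {0} */", in(reg) ptr, options(nostack, preserves_flags));
    }
    atomic.load(Ordering::Acquire)
}
```

Whether the optimizer treats the load after the block as untouchable is exactly the open question here; the sketch only makes the experiment easy to paste into Compiler Explorer.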

With multiple threads, programs generally have multiple different allowed executions depending on how threads are interleaved and so on. The as-if rule only requires that the actual program behavior matches one of those allowed executions. It does not require that all allowed executions are possible in the actual program.

And with two consecutive fetch_add operations, not seeing the value being odd is always an allowed execution.
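
To make the point about allowed executions concrete, here is a small sketch in which one thread performs the two increments and the calling thread polls the counter. The function name `observe_increments` is made up for this example. The intermediate value 1 may or may not show up in the observations, and both outcomes are allowed executions of the same program.

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::thread;

/// Runs two consecutive relaxed increments on one thread while the calling
/// thread polls the counter. Returns the distinct values observed, in
/// order, together with the final value.
pub fn observe_increments() -> (Vec<u8>, u8) {
    let x = AtomicU8::new(0);
    let done = AtomicBool::new(false);
    let mut seen: Vec<u8> = Vec::new();

    thread::scope(|s| {
        s.spawn(|| {
            x.fetch_add(1, Ordering::Relaxed);
            x.fetch_add(1, Ordering::Relaxed);
            done.store(true, Ordering::Release);
        });

        // Observer: record each new value until the writer signals completion.
        while !done.load(Ordering::Acquire) {
            let v = x.load(Ordering::Relaxed);
            if seen.last() != Some(&v) {
                seen.push(v);
            }
        }
        // After the Acquire load of `done` returned true, both increments
        // are visible to this thread.
        let v = x.load(Ordering::Relaxed);
        if seen.last() != Some(&v) {
            seen.push(v);
        }
    });

    (seen, x.load(Ordering::Relaxed))
}
```

The observations always end in 2 and never go backwards, but nothing requires the observer to catch the 1. A compiler that merged the two increments would simply make the "never saw 1" execution the only one.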

I doubt that it's guaranteed to be enough, but whether it is in practice, I don't know.

I can't answer in general, but I can come up with an example where a plain atomic does not behave as volatile: link to Compiler Explorer.

You see how the three functions simple, volatile and atomic compile down to the exact same assembly on aarch64. However, in main, volatile is the only call that is not optimized away by the compiler. In other words, if you want the compiler to know your store/load has side effects, atomic is not enough. So it's not a matter of what assembly the actual load/store compiles down to, it's about what optimizations the compiler is allowed to do.
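
For readers who can't open the link, a comparison of that kind could look roughly like the following. This is not the code behind the link: the function names, the `u32` type, and the store-then-load shape are all assumptions, and I haven't checked that it reproduces the exact output described above.

```rust
use std::ptr;
use std::sync::atomic::{AtomicU32, Ordering};

/// Plain store followed by a plain load.
pub fn simple_access(slot: &mut u32) -> u32 {
    *slot = 1;
    *slot
}

/// Volatile store followed by a volatile load.
pub fn volatile_access(slot: &mut u32) -> u32 {
    let p: *mut u32 = slot;
    // SAFETY: `p` comes from a valid, aligned, exclusive reference.
    unsafe {
        ptr::write_volatile(p, 1);
        ptr::read_volatile(p)
    }
}

/// Relaxed atomic store followed by a relaxed atomic load.
pub fn atomic_access(slot: &AtomicU32) -> u32 {
    slot.store(1, Ordering::Relaxed);
    slot.load(Ordering::Relaxed)
}
```

To repeat the experiment, call each function from main on a local variable, discard the result, and compare the optimized assembly of main with that of the three functions.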

Interesting. Blackboxing the pointer guards against this particular optimization though.

(I’ve probably put the asm block in the wrong place: I think it should go before the load but after the store, to act as a sort of fence. It still “works” in this case, though.)

Weirdly, I can’t reproduce this effect in C: Compiler Explorer

Edit: actually I can with Clang: Compiler Explorer
