Optimisations when atomic primitives are not shared?

The std::sync::atomic documentation says:

A Rust atomic type that is exclusively owned or behind a mutable reference does not correspond to an “atomic object” in C++, since it can be accessed via non-atomic operations.

To me, this implies that the Rust compiler can optimise atomics away in situations where it can prove they are unnecessary. However... when I inspect the assembly code, it appears that the instructions are identical, regardless of unique access.

Playground link

use std::sync::atomic::{AtomicI32, Ordering};

fn do_stuff_atomic_ref(d: &AtomicI32) -> i32 {
    d.fetch_add(2, Ordering::Acquire)
}

fn do_stuff_atomic_mut(d: &mut AtomicI32) -> i32 {
    d.fetch_add(2, Ordering::Acquire)
}

fn do_stuff_atomic_owned(d: AtomicI32) -> i32 {
    d.fetch_add(2, Ordering::Acquire)
}

All three of these produce the same assembly:

movl        $2, %eax
lock xaddl  %eax, -4(%rsp)
retq

Similarly for ARM targets, I see ldadda instructions in each case.

Is this a mistake in the documentation or am I misinterpreting it?

No you didn't misunderstand. However, just because the compiler is allowed to make an optimization doesn't mean that it is guaranteed to make that optimization.


Which is correct, and the only way it can be done, because fetch_add does not just increase the value: it also acts as a memory barrier!

This second role requires the code that the compiler generates.

But if you have an exclusive reference to a Rust atomic, then you may turn it into a normal variable like this:

fn do_stuff_atomic_mut_fast(d: &mut AtomicI32) -> i32 {
    let d: &mut i32 = d.get_mut();
    let r = *d;
    *d += 2;
    r
}

In that case there would be no lock, since there is no memory synchronization.


To me, it implies the existence of Atomic*::get_mut, which allows one to use an exclusively owned atomic as a non-atomic:

#[no_mangle]
fn do_stuff_atomic_use_mut(d: &mut AtomicI32) -> i32 {
    let d = d.get_mut();
    let ret = *d;
    *d += 2;
    ret
}

This compiles to the same assembly as do_stuff.
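For comparison, here is a minimal sketch that puts the atomic and non-atomic versions side by side and adds the owned case via into_inner, which the snippets above don't cover. The function names are made up for illustration and are not from the playground. One caveat: fetch_add wraps on overflow, whereas += on an i32 panics in debug builds, so the sketch uses wrapping_add to keep the versions equivalent.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Hypothetical names, chosen for this sketch only.

/// Atomic read-modify-write; returns the previous value.
pub fn add_two_atomic(d: &mut AtomicI32) -> i32 {
    d.fetch_add(2, Ordering::Acquire)
}

/// Same result through a plain load and store, relying on exclusive access.
pub fn add_two_plain(d: &mut AtomicI32) -> i32 {
    let v: &mut i32 = d.get_mut();
    let old = *v;
    // wrapping_add matches fetch_add, which wraps around on overflow.
    *v = old.wrapping_add(2);
    old
}

/// Owned variant: consumes the atomic and returns (previous, new) values.
pub fn add_two_owned(d: AtomicI32) -> (i32, i32) {
    let old = d.into_inner();
    (old, old.wrapping_add(2))
}
```

Both &mut versions return the old value and leave the same new value behind, including at i32::MAX where the addition wraps.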

(khimru beat me to that by just a moment :man_facepalming:)


No, optimizing the atomic operation down to non-atomic ones would be okay in this case. The memory barrier is redundant because if you are doing anything that requires it, then you must necessarily be violating the uniqueness requirements of mutable references.

(The story would be different if the ordering was SeqCst.)


Thanks all.

In summary, yes, I was misinterpreting it.

I think the docs can be slightly improved to avoid this confusion by adding a few extra words:

A Rust atomic type that is exclusively owned or behind a mutable reference does not correspond to an “atomic object” in C++, since the underlying primitive can be mutably accessed with get_mut to perform non-atomic operations.

The memory barrier is redundant...

This was also my understanding, which is why I over-eagerly read that line in the documentation the way I wanted to understand it, rather than as what it actually said.

I'm curious as to why this should be different for SeqCst ordering though. If there can't be any other references, what difference could it make?

SeqCst introduces interactions with atomics at other memory locations.

Nope. &mut guarantees uniqueness in this particular piece of code only. It doesn't tell us anything about what happened before and/or after that moment. And time is non-linear if there is more than one core. You may get your &mut from an Arc (with try_unwrap, e.g.) and then you still need a barrier to ensure that everything works correctly.

For SeqCst it's obvious. For other orderings it's complicated and I, for one, am glad that the compiler doesn't try to invent crazy schemes to “prove” whether the lock may be silently elided or not.

Working with atomics is tricky enough without such “help”; we don't need to make our lives even more complicated.


No, I don't agree. If you create a &mut to a memory location, and there is some other operation on the same memory location that is concurrent (that is, there is no happens-before or happens-after relationship between the operations), then you have a data race and that is UB.

But the only thing that non-SeqCst orderings can do is introduce happens-before relationships to other operations on the same atomic. Since you must already have those relationships or have UB, the ordering must have no effect.
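To make the Arc scenario concrete, here is a rough sketch (the helper name and the thread counts are my own, not from the thread). Worker threads bump a shared counter with Relaxed operations; the caller then joins them and takes the atomic back out of the Arc. The happens-before edges come from join(), and from Arc's own reference counting, before any exclusive access exists, so the final read can be a plain one.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: several threads increment a shared counter, then the
/// caller regains exclusive ownership and reads it without an atomic load.
pub fn count_then_unwrap(threads: usize, per_thread: i32) -> i32 {
    let counter = Arc::new(AtomicI32::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Relaxed is enough for the increments themselves.
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // join() synchronizes with the end of each worker, so all of their
    // increments happen-before everything below.
    for h in handles {
        h.join().unwrap();
    }

    // Every clone was dropped when its worker finished, so this succeeds.
    let mut owned = Arc::try_unwrap(counter).expect("no other Arc clones remain");

    // Plain, non-atomic read through the exclusive reference.
    *owned.get_mut()
}
```

Calling count_then_unwrap(4, 1000) should give 4000 on every run, even though no operation on the counter uses anything stronger than Relaxed.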


SeqCst introduces interactions with atomics at other memory locations.

But would it matter if it didn't do that in the case where the compiler knows that the atomic isn't shared? This is the part I don't understand here.

The problem with optimizing away SeqCst is that, sure, your atomic is not shared. But other atomics in the program probably are. And SeqCst has effects on those other atomics.
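One way to see the cross-location effect is the classic store-buffering litmus test, sketched here with made-up names. It is a generic illustration of SeqCst, not the exact situation in this thread, and it does not by itself settle whether a SeqCst operation on an unshared atomic could be elided. With SeqCst on all four operations, the single total order rules out both loads returning false. With Release stores and Acquire loads, that outcome would be allowed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs one round of the store-buffering test (hypothetical helper name).
/// Each thread sets its own flag, then reads the other thread's flag.
/// Returns (what thread 1 read from y, what thread 2 read from x).
pub fn store_buffering_round() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let (x1, y1) = (Arc::clone(&x), Arc::clone(&y));
    let t1 = thread::spawn(move || {
        x1.store(true, Ordering::SeqCst);
        y1.load(Ordering::SeqCst)
    });

    let (x2, y2) = (Arc::clone(&x), Arc::clone(&y));
    let t2 = thread::spawn(move || {
        y2.store(true, Ordering::SeqCst);
        x2.load(Ordering::SeqCst)
    });

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Whichever store comes first in the SeqCst total order, the other thread's later load must observe it, so at least one element of the returned pair is always true. That guarantee relates operations on x to operations on y, which is the sense in which SeqCst reaches beyond a single atomic.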


That'd still allow the fetch_add to be lowered to non-atomic instructions and maybe a compiler barrier.

And I'm not sure if the compiler barrier is actually needed. After all, mutable access means other parts of the program cannot observe the values in the variable that's being written to, so there are no permutations of ordering that SeqCst is supposed to prevent. Though I might be missing something...
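Written out by hand, that hypothetical lowering might look roughly like the sketch below. The function name is invented, and whether this is a valid substitute for a SeqCst fetch_add is exactly the open question here, so treat it as an illustration of the idea rather than a claim about what the compiler does or may do. compiler_fence only restricts reordering by the compiler and emits no CPU fence instruction.

```rust
use std::sync::atomic::{compiler_fence, AtomicI32, Ordering};

/// Hypothetical hand-written version of "non-atomic instructions plus a
/// compiler barrier". Returns the previous value, like fetch_add.
pub fn add_two_plain_fenced(d: &mut AtomicI32) -> i32 {
    let v = d.get_mut();
    let old = *v;
    // wrapping_add mirrors fetch_add's wrap-on-overflow behavior.
    *v = old.wrapping_add(2);
    // Stops the compiler from moving memory accesses across this point.
    // This is an assumption about what such a lowering would need, not a
    // statement that it is sufficient.
    compiler_fence(Ordering::SeqCst);
    old
}
```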