Struct with Atomic Fields Still Require Arc and clones

I have a struct that contains only an AtomicUsize, and I'd like to be able to pass instances of it to threads. Obviously, two threads can load/store values to the field safely, but I'm struggling to convince the compiler of this. The only method I've been able to come up with is to wrap waaaay too many things in Arc and use a handful of clone() and make_mut() calls. The full code can be found here: Rust Playground

While this compiles and works, it feels like I'm doing something wrong:

#[derive(Clone)]
struct foo {
    bar: Arc<AtomicUsize>, // Need the Arc<> so I can derive as Clone
}

fn main() {
    let mut a = Arc::new(foo::new());
    // not sure why I have to make_mut and clone here
    let mut clone_a = Arc::make_mut(&mut a).clone();
    
    thread::spawn(move || {
        clone_a.increase();  
    });

    Arc::make_mut(&mut a).increase();
}

Any help would be greatly appreciated!

1 Like

First of all, you don't need to use Arc inside foo to get Clone; you can just implement Clone manually.

Secondly, the reason you need make_mut is because you told the compiler you needed mutable access to bar... but it's an AtomicUsize, so you don't. So just don't do that.

At which point it just works:

use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

struct Foo {
    bar: AtomicUsize,
}

impl Foo {
    pub fn new() -> Foo {
        return Foo { bar: AtomicUsize::new(32) };
    }
    
    pub fn increase(&self) {
        self.bar.fetch_add(1, Ordering::AcqRel);
    }
}

impl Clone for Foo {
    fn clone(&self) -> Self {
        Foo {
            bar: AtomicUsize::new(self.bar.load(Ordering::SeqCst)),
        }
    }
}

fn main() {
    let a = Arc::new(Foo::new());
    let clone_a = a.clone();
    
    thread::spawn(move || {
        clone_a.increase();  
    });

    a.increase();
}
3 Likes

@DanielKeep, thanks for the help! I never looked at the signature of fetch_add, because I just assumed it would be &mut self. It feels odd that it doesn't need to be, but happy that it isn't because it makes all this work.

One last question, why use Ordering::SeqCst in the clone code? I understand it's the most "strict" ordering type... just not 100% sure why it's needed; or really when to pick one over the other.

Updated the code for anyone else who stumbles upon this: Rust Playground

I think that @DanielKeep wrote this code to be generic. If you don't use sequential consistent memory order, the effect of the clone could been before or after other atomic operations.

Maybe for your specific case it could be sufficient an Ordering::Acquire ordering, but you have to carefully evaluate your choice.

If it was, you wouldn't be able to modify it without first acquiring an exclusive lock... which would rather defeat the purpose of having atomic types in the first place.

Because I'm not great at reasoning about ordering, and I didn't want to have to sit down and think about it.

1 Like

@DanielKeep / @dodomorandi - I guess what I'm really asking is, "Can you teach me to fish?"... ie, how should I go about reasoning about which ordering to pick? Not just which one is best for this scenario. Is it safe/OK to just always go with Ordering::SeqCst? Are there actually huge performance implications if I just always pick that? Thanks!

Oh, ok, just misunderstood :smile:

You are touching a very complex thing when thinking about memory ordering. I will not teach you how it works, but I will give you a brief overview of what you are working with. The reason I am saying "I am not teaching it" will be clear soon. :wink:

With the Ordering parameters, you are specifying to the compiler how the code must be compiled to be sure the effects of some operations can be observed at specific time and space.

With SeqCst you are always safe. It is not free at all, unfortunately. Indeed, to ensure that the effects you observe are exactly what you expect looking at the code, some synchronization instructions are placed. Not the end of the world, but if you have some hot code, you will observe some slowdowns.

With Acquire, Release and AcqRel you are telling the compiler that you want to acquire, release and acquire and release the state of an atomic variable. In practice, when you perform an Acquire load you are sure to see the effects of a Release store on the same atomic. However (and I could be wrong) you cannot make assumption on the observable effects on other variables (you need SeqCst for that).

With Relaxed you don't make any assumption. In theory you could not see the effects of an operation happened before on the same atomic variable. This behaviour can be observed on weakly ordered architectures, and it is easy to have non-observable bugs in your code if you are developing on x86 and x86_64.

If can find much more information in the nomicon, at the LLVM page, and even the cppreference.com page contains many useful information. I think that now you know the reason I don't think I can teach these things: they are very complex, and they take time to be understood correctly.

I hope this can be a bit helpful.

EDIT: just a small note: the sanitizer will not help you with choosing the right ordering, because in any case there is no data race. One way is writing exhaustive tests and perform them on architectures with weak memory models like alpha.

1 Like

What @dodomorandi outlined is essentially correct.

The important bit to understand is that the compiler is allowed to reorder certain operations or to condense multiple instructions, or unroll loops, etc. This is, in part, how optimization works. The only rule is that it cannot change the 'end result' for some limited area of code (I'm being hand-wavy on purpose, the details are far too much).

The CPU is allowed to do the same sorts of things. What the Ordering enum does is explain to the CPU which rules are applied to which operations. Understand that this is very difficult to wrap one's brain around. I always refer back to references before reconsidering which ordering is necessary.

  • SeqCst requires that the CPU not perform any reordering with respect to that particular instruction and memory location - it erects a fence at that instruction, preventing it. It also ensures that all other thread's Release-marked writes, Acquire-marked reads, and SeqCst marked reads or writes are visible to this operation.
  • Acquire ensures that writes to a memory location that are semantically before the acquire-marked instruction cannot be moved after it (but it forms no barrier against reads). All Release- or SeqCst-marked writes from other threads will be visible.
  • Release ensures that reads from a memory location that are semantically after the release-marked instruction cannot be moved before it (but forms no barrier against writes). It also ensures that this operation will be visible to Acquire- and SeqCst-marked reads.
  • AcqRel combines Acquire and Release, which is very similar in effect to SeqCst, but not the same. AcqRel in effect forms one-way barriers against writes and reads - writes cannot move after the marked instruction, and reads cannot move before. It ensures that the operation has visibility of other thread's prior Release- and SeqCst- marked writes, and is visible to their Acquire- and SeqCst- marked reads
  • Relaxed applies no rules to the operation. The only advantage of the atomic type here is that it should require that the read or write is completed atomically - it's not possible for two threads to write or read to the location at once (for values which are not loaded in a single instruction, which depends on the platform). It is possible that Relaxed-marked operations do not see other thread's changes, or that other threads do not see Relaxed-marked changes, until 'eventually'. It is also theoretically possible that the CPU reorder Relaxed operations in a single thread with respect to one another.

What all this means is that the CPU is limited in how it is allowed to optimize your code to ensure that your code operates as expected. You're just providing the rules.

In general, however, for x86-based systems (including x86_64), because of the rules provided by the architecture, Acquire is free on reads, and Release free on writes. There is no performance penalty paid, because the architecture will behave following those rules anyway.

If there is any doubt, SeqCst is the best bet, but in general: Acquire for reads, Release for writes, and AcqRel for read-modify-write (like fetch_add). That will nearly always handle what you need to happen.

If that was in any way unclear, refer to the links dodo provided - they're excellent! (though at least as dense as this - they're the technical references I used to check myself here)

Normally (for other languages), I'd write a disclaimer about other, non-atomic, types here, but Rust's guarantees do a pretty good job of preventing you from messing it up outside of unsafe-land.

8 Likes

FWIW, my understanding is that SeqCst is a safe and likely sufficiently fast choice for ~any situation. I personally would go with it until profiling indicates that it's sufficiently hot area to (potentially) change to any other memory ordering.

3 Likes

An example without Arc, using Rayon library.

Typically it is safer to start with Mutex/RwLock. If your using atomic your after that last bit of performance so do consider the Ordering. Performance is an even bigger topic to get into that atomics. Academic side goes on about the big-O but practical use is IMO more interesting and deals with profiling code.

So if I want to share a AtomicUsize between threads(so that threads can read/write it), do I have to wrap it using Arc? I mean, I saw AtomicUsize implements Send & Sync trait, which means it can be used between threads safely, but I didn't find a "clone" method in AtomicUsize so I have no idea whether/how I can do that.

You don't need Clone to share things. You need to put it behind some kind of pointer because if you don't, then how do you share it?

yeah I understand. I'm a C/C++ programmer so the first thing I can imagine is trying to pass the reference of the AtomicUsize. I tried to do that but it ends with that compiler complains loudly that reference lifecycle declaration is needed -- on the member/function/struct/... so I finally abandoned.
So anyway, it seems that using a Arc to wrap the AtomicUsize is mandatory right?

It's not mandatory at all. You can put an atomic in a static variable, or you can put it behind a normal reference if you're using a scoped thread (like from rayon or crossbeam).

Oh, okay thanks DanielKeep. "put an atomic in a static variable" -- I recall this needs to be wrapped in a "unsafe" block right? Also using a third party crate sounds overkill just for this.
Anyway, I'm just curious about that, I'm totally fine to use Arc. I just don't get used to that -- wrapping a thread-safe struct using another thread-safe struct for just passing the reference/pointer. :slight_smile:

No. You need unsafe if the static is mutable, but atomics don't need to be mutable to change them.

This is incredibly common in Rust. The core team tries to push as much out of the standard library as possible. Avoiding external crates is just going to cause you absolute agony.

You're not. Arc is the pointer. If you didn't use Arc or some other kind of pointer, you'd be passing a value. If that doesn't make sense, then I suspect you need to read about the difference between value semantics and reference semantics.

1 Like

Any structure that can be deallocated are not thread safe on the outside. Non-mutable statics can't so are safe.

Any structure that can be deallocated
Indeed. Thanks jonh.

To be clear, it may not reorder any other SeqCst operations, non-SeqCst operations are free to be re-ordered (provided they respect the implied acquire and/or release semantics).