Is it possible to synchronize structs using atomics?

I have a multi threaded program which looks something like this:

// Lots of threads do this
let some_ptr = &vec[thread_id] as *const T as *mut T;
ptr::write(some_ptr, my_struct);
fence(Ordering::SeqCst);
we_are_done();

// Main thread does this
wait_for_threads();
for entry in &vec { ... }

The idea is that since each entry in the vec is only written to by a single thread, and the main thread doesn't read anything before all threads are done, there should be no races. However, reading the LLVM docs for fence, it seems that a fence only really synchronizes when the surrounding operations are atomic, and I don't see how it is possible to write my_struct atomically, since Rust only has primitive atomic types.

I could wrap vec in a Mutex, but I'd rather not, due to the overhead and the fact that pthread_mutex_lock isn't signal-safe and this code runs inside a signal handler.

I have two questions:

  1. Am I right that my program isn't correct, since it relies on non-atomic writes being synchronized by a fence?
  2. Is there a way to synchronize arbitrary struct writes using atomics?

Hm, I would think you don't need that fence at all.

The synchronization is done by the we_are_done / wait_for_threads pair, which establishes a happens-before relationship between the threads.

That is (waving my hands), if two threads synchronize on a single atomic variable A, then all operations (even non-atomic ones) on all variables that happened in the first thread before it wrote to A should be visible to all operations in the second thread after it has read A.

In other words, within each thread the operations are executed in the source order (or, more precisely, "as-if" they were executed in source order).

The relative order of operations in different threads is indeterminate (there doesn't need to be an order at all, in fact: you may get "out of thin air" operations).

However, if two threads execute an atomic write and an atomic read, and the read observes the effects of the write, then the operations that happened on the first thread before the write come before all the operations that happen on the second thread after the read.
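For example, here is a minimal sketch of that guarantee (READY and DATA are names I made up for the illustration, not anything from your program):

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static READY: AtomicBool = AtomicBool::new(false);
static mut DATA: u64 = 0;

fn main() {
    let writer = thread::spawn(|| {
        unsafe { DATA = 42 };                  // plain, non-atomic write
        READY.store(true, Ordering::Release);  // atomic write that publishes it
    });

    // Atomic read; spin until it observes the Release store above.
    while !READY.load(Ordering::Acquire) {}

    // Because the Acquire load observed the Release store, the plain
    // write to DATA is guaranteed to be visible here.
    assert_eq!(unsafe { DATA }, 42);

    writer.join().unwrap();
}

If both sides used Relaxed instead, the assertion would no longer be guaranteed to hold.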

Disclaimer: I am not an expert in memory models :slight_smile:

Hmm, I guess I'm still a little confused about the exact semantics of atomics. Initially, when I wrote the code above, I figured that the synchronization from we_are_done and wait_for_threads did indeed apply to all variables, as you say. However, when I tried to convince myself that this is in fact the case, I started looking through the nomicon and the LLVM docs; the nomicon doesn't really discuss synchronization so much as reordering, and LLVM explicitly says that a fence needs some relaxed (monotonic) atomic accesses to pair with (I guess this is the part I'm confused about).

It definitely makes sense that the synchronization would imply visibility for all data accesses, because it would seem overly strict if it didn't, but I find it difficult to convince myself that this is actually the case.

In any case, it seems to work correctly, but I'm on x86, so who knows what would happen if I used a platform with a weaker memory model (or if I'm just (un)lucky??).

If an atomic load on thread A reads the result of an atomic store from thread B, it implies A can see all previous regular stores B performed… if the atomic store used the 'release' memory ordering (or stronger) and the atomic load used the 'acquire' memory ordering (or stronger).

Higher-level synchronization mechanisms such as mutexes, channels, etc. generally also provide that guarantee.

A fence usually isn't what you want, and it's unnecessary here as long as the synchronization in we_are_done/wait_for_threads uses the right ordering.
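For concreteness, we_are_done / wait_for_threads could be built on a shared counter with exactly those orderings, so the plain ptr::writes get published without any explicit fence. This is only a sketch under my own assumptions (a fixed N_THREADS and a REMAINING counter, neither of which comes from your code):

use std::sync::atomic::{AtomicUsize, Ordering};

const N_THREADS: usize = 8; // however many worker threads there are
static REMAINING: AtomicUsize = AtomicUsize::new(N_THREADS);

// Worker thread: call this after the plain ptr::write into its own slot.
fn we_are_done() {
    // Release: everything this thread wrote before the decrement becomes
    // visible to whoever later observes the decrement with Acquire.
    REMAINING.fetch_sub(1, Ordering::Release);
}

// Main thread: returns once every worker has checked in.
fn wait_for_threads() {
    // Acquire pairs with the workers' Release decrements, so after this
    // loop the main thread may safely read every slot of the vec.
    while REMAINING.load(Ordering::Acquire) != 0 {
        std::hint::spin_loop();
    }
}

A counter like this also avoids taking a lock, which matters given that we_are_done runs inside a signal handler.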


By the way, to give more context for how fences fit into the picture: traditionally, on platforms with weak memory models, store-release is implemented as a fence followed by a store, and load-acquire is implemented as a load followed by a fence. (See here if you're interested in the nitty-gritty.) So a fence is in some sense a lower-level operation that's built into the newer atomic primitives. Interestingly, newer ARM processors have dedicated load-acquire and store-release instructions, which still provide the required guarantees but may be more efficient.
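In Rust terms the correspondence looks roughly like this (the fence-based forms are, if anything, stronger than the single operations; FLAG is just a placeholder):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

static FLAG: AtomicUsize = AtomicUsize::new(0);

fn publish() {
    // A store-release written directly ...
    FLAG.store(1, Ordering::Release);
    // ... and the fence-then-store formulation described above:
    fence(Ordering::Release);
    FLAG.store(1, Ordering::Relaxed);
}

fn observe() {
    // A load-acquire written directly ...
    let _a = FLAG.load(Ordering::Acquire);
    // ... and the load-then-fence formulation:
    let _b = FLAG.load(Ordering::Relaxed);
    fence(Ordering::Acquire);
}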


Just to add a little bit to this, one use for fences that remains today on platforms with a weak memory model is conditional synchronization, which is useful for implementing things like reference counting:

// Am I the last thread to decrement the counter?
if atomic_ctr.fetch_sub(1, Ordering::Release) == 1 {
    // If so, synchronize with the other threads that decremented it
    atomic::fence(Ordering::Acquire);

    // ... do some finalization business ...
}

// If I'm not the last one, do nothing

EDIT: Fixed a missing memory barrier


You want a Release on the sub there or else the fence doesn’t synchronize with that.


Thanks everyone, this is reassuring to hear.

What still troubles me is that I can't find any place in either set of docs where this is mentioned. Am I just looking in the wrong places, or is this one of those things that are sort of implied to work a certain way, but not really written in stone?

Ah, yes, you are right. The idea is to synchronize with the other threads that decrement the counter, after all :slight_smile:


Which bits specifically? Creating synchronization (aka happens-before) edges?

Hmm, I guess? I'm not exactly sure what I'm looking for, just some assertion that the ordering and visibility relationships apply to non-atomic operations as well. It seems to me that most docs explicitly say that such and such applies to atomic operations X and Y, but none of them mention non-atomic operations.

Maybe I'm just overthinking this though :stuck_out_tongue:

std::memory_order - cppreference.com does a decent job. Whenever you specify a memory order (either in C++ atomics or Rust, which follows the C++ model) you’re pretty much specifying behavior for surrounding (in program order) plain loads and stores.
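For reference, since Rust follows that model, the Ordering variants line up one-to-one with the memory orders on that page (a small sketch of the mapping; X is just a placeholder, and note that Rust has no memory_order_consume equivalent):

use std::sync::atomic::{AtomicUsize, Ordering};

// Ordering::Relaxed -> memory_order_relaxed
// Ordering::Acquire -> memory_order_acquire
// Ordering::Release -> memory_order_release
// Ordering::AcqRel  -> memory_order_acq_rel
// Ordering::SeqCst  -> memory_order_seq_cst
static X: AtomicUsize = AtomicUsize::new(0);

fn example() {
    // The ordering argument is where that behavior for surrounding plain
    // loads and stores gets specified:
    X.store(1, Ordering::Release);
    let _ = X.load(Ordering::Acquire);
}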


Thank you! This is exactly what I've been looking for.