Single-writer, multiple-reader concurrency and the validity of `ptr::read`

As an exercise, I'm trying to implement a concurrent data structure based on a paper, "Non-blocking hashtables with open addressing". I'm at a point where I have a working hashtable for keys of type `usize`, since they can be updated atomically, and I'm in the process of adding support for larger keys, as detailed in section 3.3. Since those don't necessarily fit in a single machine word, they can't be updated atomically, so the paper proposes a way to safely read and update the keys concurrently. Quoting from page 8:

... [A] cell’s key is only ever modified by a single writer, when the cell is in busy state. This means we only need to deal with concurrent single-writer multiple-reader access to the cell, rather than provide a general multi-word atomic update. We can therefore use Lamport’s version counters [4] (Figure 8).

This says that I can use a version number in conjunction with the value to ensure that the read is valid. It then goes on to describe how the version counter is used, which goes something like this (a sketch of the matching writer side follows the list):

  1. (Atomically) Read and save the version counter value.
  2. Read the key value (assuming it is `Copy`) from something like an `UnsafeCell`.
  3. Before using the value further, read and compare the version counter value to detect concurrent writes, and discard the key value if there is a mismatch.
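
For context, the writer side of that version-counter protocol (not quoted above) is, as far as I can tell, roughly the following. This is my own sketch with made-up names, not the paper's Figure 8 code: the counter is bumped to an odd value before the update and back to an even value after.

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical cell layout; the names are mine, not the paper's.
struct VersionedKey<T: Copy> {
    version: AtomicU32,   // even = stable, odd = write in progress
    key: UnsafeCell<T>,
}

impl<T: Copy> VersionedKey<T> {
    // Called by the single writer only, while the cell is in the busy state.
    fn write_key(&self, new_key: T) {
        // Bump to an odd value; Acquire is meant to keep the key write below
        // from being moved above this transition.
        let v = self.version.fetch_add(1, Ordering::Acquire);
        debug_assert!(v % 2 == 0, "only one writer may be active");

        // The non-atomic update that readers may observe mid-flight.
        unsafe { *self.key.get() = new_key };

        // Bump to the next even value; Release keeps the key write above
        // from being moved below this transition.
        self.version.store(v.wrapping_add(2), Ordering::Release);
    }
}

Whether the non-atomic write in the middle is actually allowed is exactly what the rest of this question is about.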

However, the concurrent writer and readers don't mix well with the safety contract of pointer accesses as provided by std::ptr:

  • All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory.

This tells me that any concurrent read/write access to the given pointer is UB, even if I immediately detect and discard the invalid data.

Is my interpretation of this problem correct? If so, is there some alternative to raw pointers that would provide read/write operations with the stronger guarantees that I'm looking for? Specifically, writes must be well-defined in the presence of readers, correctly copying the original data and making those writes globally visible before returning control to the writer, and readers may read data in an invalid state during a concurrent write, relying on external mechanisms to prove the data's validity. Ideally, it would not involve locks, in the spirit of the original design, which rules out things like Mutex and crossbeam::atomic::AtomicCell.

Atomic read/write of an arbitrary type is not supported on any major CPU architecture. You should use some other concurrency primitive to prevent concurrent access to the memory location. For example, `std::sync::Mutex` prevents concurrent access to the value it wraps.

Some primitive types support atomic read/write via special CPU instructions. `std::sync::atomic` holds types that wrap such operations. You can use them to construct more general tools, like the mutex itself. `parking_lot` is a pure-Rust implementation of concurrency primitives like mutexes and rwlocks.
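
For example, the single-word case looks something like this with `std::sync::atomic` (a trivial sketch; the static and the values are made up):

use std::sync::atomic::{AtomicUsize, Ordering};

// A usize-sized slot that can be read and written atomically, which is why
// the single-word keys in your table need no extra machinery.
static KEY: AtomicUsize = AtomicUsize::new(0);

fn main() {
    // Install a new key only if the slot still holds the expected old one.
    let _ = KEY.compare_exchange(0, 42, Ordering::AcqRel, Ordering::Acquire);
    assert_eq!(KEY.load(Ordering::Acquire), 42);
}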

If your atomic operations on the version number use an Ordering other than Relaxed, they also act as barriers that prevent (some) memory accesses from being reordered across them by either the compiler or the CPU.

I’m not an expert in any of this, but I suspect that you can safely read the pointer as a `MaybeUninit<T>`, but only call `assume_init()` after you’ve verified that the memory is uncorrupted by a concurrent write.

Non-atomic memory accesses can be reordered even across SeqCst memory accesses. Also, LLVM defines data races to always be UB. This means that the very act of reading/writing memory in a non-atomic way while there is a concurrent write is not allowed at all.

That would imply there’s no way to build a correct Mutex implementation with just atomics and spinlocking, which doesn’t feel right. What’s the lowest-level concurrency primitive that allows protecting non-atomic memory?


Edit: This example from the Nomicon page I linked above seems to use Acquire/Release to protect access to non-atomic memory:

use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let lock = Arc::new(AtomicBool::new(false)); // value answers "am I locked?"

    // ... distribute lock to threads somehow ...

    // Try to acquire the lock by setting it to true
    while lock.compare_and_swap(false, true, Ordering::Acquire) { }
    // broke out of the loop, so we successfully acquired the lock!

    // ... scary data accesses ...

    // ok we're done, release the lock
    lock.store(false, Ordering::Release);
}

If you perform a synchronizing action in between the write on one thread and the read or write on another thread, it is fine. This could be locking a mutex, or it could be modifying an atomic variable on one thread and loading it on the other, where the other thread checks that the loaded value indicates that the first thread is done modifying the piece of memory you want to access.

This statement is at odds with your earlier one that non-atomic accesses can be reordered across a SeqCst event, which would invalidate it as a “synchronizing action.” (I can believe that a read-first, verify-integrity-later sequence might well be UB.)

A SeqCst operation only has a happens-before relationship with other atomic operations. Non-atomic accesses are only protected when you ensure that the non-atomic operation happens only if the value loaded from the atomic operation indicates that the other thread isn't writing to the same memory as the current thread's non-atomic operation.

Let's say that you have

// initial
*a = 0;
*b = 0;

// thread 1
*a = 42;
b.store(1, SeqCst);

If there is another thread

let c = b.load(SeqCst);
let d = *a;

then that is UB, as there is no synchronization between b.load(SeqCst) and *a. It is valid to rewrite it to

let d = *a;
let c = b.load(SeqCst);

If the other thread instead was

let c = b.load(SeqCst);
if c == 1 {
    let d = *a;
    // foo
}

there is synchronization between b.load(SeqCst) and *a. It is thus not valid to rewrite it to

let d = *a;
let c = b.load(SeqCst);
if c == 1 {
    // foo
}

At least this is my understanding of atomic operations.
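
For completeness, here is a runnable version of that second (valid) shape, using SeqCst as above. The Shared wrapper, UnsafeCell, and scoped threads are just my choices for the sketch:

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// A non-atomic payload plus the atomic flag that guards it.
struct Shared {
    a: UnsafeCell<u64>,
    b: AtomicUsize,
}

// SAFETY: all access to `a` is coordinated through `b` as shown below.
unsafe impl Sync for Shared {}

fn main() {
    let s = Shared {
        a: UnsafeCell::new(0),
        b: AtomicUsize::new(0),
    };

    thread::scope(|scope| {
        // Thread 1: write the payload, then publish through the atomic.
        scope.spawn(|| {
            unsafe { *s.a.get() = 42 };
            s.b.store(1, Ordering::SeqCst);
        });

        // Thread 2: only touch the payload after observing the flag.
        scope.spawn(|| {
            if s.b.load(Ordering::SeqCst) == 1 {
                // The store of 1 happens-before this point, so reading `a`
                // is not a data race and must observe 42.
                assert_eq!(unsafe { *s.a.get() }, 42);
            }
        });
    });
}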


That Nomicon page makes the distinction between “atomic accesses” and “data accesses” at the top of the page, but then uses “accesses” without a modifier all over the place :confused:

I read it as atomics serving as fences for all accesses, atomic or not, from the same thread, but I can see how it might have meant the other way.


Edit: I dug into the C++ spec that the nomicon references, and there are a few interesting points:

  • Acquire, Release, and SeqCst loads and stores only affect the ordering of calculations dependent on the loaded or stored value. C++ also provides fences not associated with a value to synchronize all accesses.
  • SeqCst ordering is completely independent of Acquire/Release ordering: the compiler is not required to keep them consistent, even when referring to the same location.

In C++, dependency is purely syntactic, so I think this would also prevent the reordering:

let c = b.load(SeqCst);
if c == 1 || true {
    let d = *a;
    // foo
}

Interesting idea. The std::ptr docs say that ptr::write is only allowed when no other accesses happen at the same time. But I’m wondering if there are any technical reasons for it to be insta-UB when a read+write (or even write+write) happens at the same time, as long as it’s a *mut MaybeUninit<T>.

IMO, an exception for MaybeUninit could be valuable here. Of course then, as you said, assume_init() would still be UB if you didn’t make sure that no data race happened.

Perhaps someone should open an IRLO thread about this.


The compiler is allowed to reorder the read/write before any atomic operation that could have been used to "guarantee" that no data race happened, as there is no synchronization between them. If there is synchronization between them, then by definition there was never a data race at all.

The algorithm in question is something like this: the pointer read is guarded by one check of the atomic, and interpreting it as correct is guarded by another. If I read the C++ spec correctly (which is included by reference into Rust), this is sufficient to prevent reordering.

There’s still the issue of a concurrent read/write, though, which is dealt with after-the-fact here: If both atomic reads were the same, the critical section wasn’t poisoned by a write; otherwise, its result can’t be trusted and must be thrown out. The question is whether this kind of algorithm either is or could be made legal.

use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU32, Ordering};

struct LockFreeReader<T: Copy> {
    cur_version: AtomicU32,
    data: *mut MaybeUninit<T>,
}

impl<T: Copy> LockFreeReader<T> {
    fn read(&self) -> Option<T> {
        // `Ordering::Something` is a placeholder; which ordering suffices is part of the question.
        let version = self.cur_version.load(Ordering::Something);
        if version & 1 == 1 {
            None // Write in progress
        } else {
            let result: MaybeUninit<T> = unsafe { self.data.read() };
            if version != self.cur_version.load(Ordering::Something) {
                None // Something changed during the read
            } else {
                unsafe { Some(result.assume_init()) }
            }
        }
    }
}
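
Callers would then retry until they get a consistent snapshot, something like this (the loop shape is just an assumption on my part):

// Spin until a read is not invalidated by a concurrent write.
fn read_retrying<T: Copy>(reader: &LockFreeReader<T>) -> T {
    loop {
        if let Some(value) = reader.read() {
            return value;
        }
        // A write was in progress or completed mid-read; back off and retry.
        std::hint::spin_loop();
    }
}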

Implementing seqlocks in Rust and C/C++ is hard. To quote parts of the README of rigtorp/Seqlock:

Implementing the seqlock in portable C++11 is quite tricky. The basic seqlock implementation unconditionally loads the sequence number, then unconditionally loads the protected data and finally unconditionally loads the sequence number again. Since loading the protected data is not conditional on the sequence number, the compiler is free to move these loads before or after the loads from the sequence number.

[...]

We see that the compiler did indeed reorder the load of the protected data outside the critical section and the data is no longer protected from torn reads.

[...]

There are two ways to fix this:

  • Make the location of the protected data dependent on the sequence number by storing multiple instances of the data and selecting which one to read from based on the sequence number. This solution should be portable to all CPU architectures, but requires extra space.
  • For x86 it's enough to insert a compiler barrier using std::atomic_signal_fence(std::memory_order_acq_rel). This will only work on the x86 memory model. On the ARM memory model you need to insert a dmb memory barrier instruction, which is not possible in C++11.
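
For what it's worth, Rust's equivalent of that compiler barrier is std::sync::atomic::compiler_fence, so the second fix would look roughly like this when applied to the read sketch above (x86-only reasoning, per the quoted README, and the names and ordering choices are mine):

use std::mem::MaybeUninit;
use std::sync::atomic::{compiler_fence, AtomicU32, Ordering};

// Same shape as the LockFreeReader sketch above. The fences only stop the
// compiler from hoisting or sinking the data read past the version reads;
// per the quoted README this is only sufficient on the x86 memory model.
fn read_with_barriers<T: Copy>(version: &AtomicU32, data: *const MaybeUninit<T>) -> Option<T> {
    // Acquire here is my guess; picking the right ordering is part of the question.
    let v1 = version.load(Ordering::Acquire);
    if v1 & 1 == 1 {
        return None; // write in progress
    }
    compiler_fence(Ordering::AcqRel);
    let result = unsafe { data.read_volatile() };
    compiler_fence(Ordering::AcqRel);
    let v2 = version.load(Ordering::Acquire);
    if v1 == v2 {
        Some(unsafe { result.assume_init() })
    } else {
        None // torn by a concurrent write; discard
    }
}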

Thanks for the replies; a SeqLock definitely performs a similar function to what I'm trying to accomplish. I realized that crossbeam::atomic::AtomicCell is synchronized using a SeqLock, which they have implemented internally, and the comments address a few of the issues brought up here:

We need a volatile read here because other threads might concurrently modify the value. In theory, data races are always UB, even if we use volatile reads and discard the data when a data race is detected. The proper solution would be to do atomic reads and atomic writes, but we can't atomically read and write all kinds of data since AtomicU8 is not available on stable Rust yet.
