SeqLock, UB, and practical considerations

I am trying to implement a SeqLock-inspired data structure: a non-blocking writer that relies on the reader to discard the data if it was tampered with.

My understanding is that every SeqLock implementation is by definition UB in both Rust and C++, since it admits the possibility of a data race. Yet SeqLock is still used out there, and everyone just shrugs it off as long as it works in practice.

I took a look at multiple implementations in both Rust and C++, and it seems like everybody relies on some different magic to hold it all together: some use volatile reads, others acquire/release fences.

For example, let's look at the implementation in the seqlock crate:

    #[inline]
    pub fn read(&self) -> T {
        loop {
            // Load the first sequence number. The acquire ordering ensures that
            // this is done before reading the data.
            let seq1 = self.seq.load(Ordering::Acquire);

            // If the sequence number is odd then it means a writer is currently
            // modifying the value.
            if seq1 & 1 != 0 {
                // Yield to give the writer a chance to finish. Writing is
                // expected to be relatively rare anyways so this isn't too
                // performance critical.
                thread::yield_now();
                continue;
            }

            // We need to use a volatile read here because the data may be
            // concurrently modified by a writer. We also use MaybeUninit in
            // case we read the data in the middle of a modification.
            let result = unsafe { ptr::read_volatile(self.data.get() as *mut MaybeUninit<T>) };

            // Make sure the seq2 read occurs after reading the data. What we
            // ideally want is a load(Release), but the Release ordering is not
            // available on loads.
            fence(Ordering::Acquire);

            // If the sequence number is the same then the data wasn't modified
            // while we were reading it, and can be returned.
            let seq2 = self.seq.load(Ordering::Relaxed);
            if seq1 == seq2 {
                return unsafe { result.assume_init() };
            }
        }
    }
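For context, the acquire fence in read() is meant to pair with the write path, which is not quoted above. Here is a minimal, self-contained sketch of the whole protocol for a Copy type, following the shape of the quoted code and its comments; the type name SeqLockSketch and the exact write() sequence are my reconstruction, not the crate's code, and the UB caveat discussed in this thread applies to it just the same:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::ptr;
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

// Hypothetical minimal seqlock; single writer assumed.
pub struct SeqLockSketch<T: Copy> {
    seq: AtomicUsize,
    data: UnsafeCell<T>,
}

// Safety caveat: readers validate via `seq` and discard torn reads,
// but formally the racing accesses are still UB, as discussed above.
unsafe impl<T: Copy + Send> Sync for SeqLockSketch<T> {}

impl<T: Copy> SeqLockSketch<T> {
    pub fn new(val: T) -> Self {
        SeqLockSketch {
            seq: AtomicUsize::new(0),
            data: UnsafeCell::new(val),
        }
    }

    // Writer: make seq odd, write the data, make seq even again.
    pub fn write(&self, val: T) {
        let seq = self.seq.load(Ordering::Relaxed).wrapping_add(1);
        self.seq.store(seq, Ordering::Relaxed); // odd: writer in progress
        // Keep the data writes from being moved before the store above.
        fence(Ordering::Release);
        unsafe { ptr::write_volatile(self.data.get(), val) };
        // Release store: a reader that sees the new even seq sees the data.
        self.seq.store(seq.wrapping_add(1), Ordering::Release);
    }

    // Reader: same shape as the read() quoted above.
    pub fn read(&self) -> T {
        loop {
            let seq1 = self.seq.load(Ordering::Acquire);
            if seq1 & 1 != 0 {
                thread::yield_now();
                continue;
            }
            let result =
                unsafe { ptr::read_volatile(self.data.get() as *const MaybeUninit<T>) };
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == seq1 {
                return unsafe { result.assume_init() };
            }
        }
    }
}
```

A reader that overlaps a write either sees an odd seq (and spins) or sees seq change between the two loads (and retries), so torn data is never returned; the question below is whether the compiler is actually obliged to respect that.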

Is the read_volatile really necessary there? Wouldn't a normal read or copy_nonoverlapping work just as well? Sure, in theory all of this is UB, but it seems to me that the compiler has no way of "catching" me doing UB here, or does it? Here is my reasoning for why it should work:

  1. The compiler cannot reorder the ptr::read(self.data.get()...) before the seq1 load, because that load is Acquire and the read is conditional on the value of seq1, namely that it must be even.
  2. The compiler cannot reorder the ptr::read(self.data.get()...) past the seq2 load, because (according to Preshing) "An acquire fence prevents the memory reordering of any read which precedes it in program order with any read or write which follows it in program order."
  3. The compiler could potentially load the same data multiple times (which volatile would prevent), or in parts, but that doesn't concern us, as long as all those loads happen between the seq1 and seq2 loads.

What am I missing? Is there some other hidden way the compiler could exploit this UB and make this not work?

Also, is Preshing's explanation correct? I wasn't really able to make sense of the fence documentation in the current rustdocs.

Thanks!

This is my take (could be completely wrong, and not a direct answer to the OP's question).
You can only prevent memory reordering if an ordering relationship is actually established.

Here is my idea of the bug in the code:
Consider data that spans two cache lines, with the atomic also in the last one.
The reader CPU has the last line in its cache and can access it without going to main memory.
The writer CPU stores the "modification in progress" bit, which gets written out to main memory:

        self.seq.store(seq, Ordering::Relaxed);

        // Make sure any writes to the data happen after incrementing the
        // sequence number. What we ideally want is a store(Acquire), but the
        // Acquire ordering is not available on stores.
        fence(Ordering::Release);

and then writes the data, which also gets written from cache to main memory whenever the CPU decides to.
The reader CPU loads the first cache line (so the modified part of the data) and uses the unmodified last line from its own cache.
It goes through the fence; nothing changes.
It then loads the same (stale) atomic value from its cache and assumes the data is correct.

The problem is the comment: "Make sure any writes to the data happen after incrementing the sequence number". There is no such synchronisation capability; that ordering is only guaranteed within the same thread.

A fix (though not for the underlying UB problem) might be to change the seq2 load to

let seq2 = self.seq.fetch_add(0, Ordering::Relaxed);

I can't comment on volatile, but I will point out a suggestion I just looked at: Use atomic_element_unordered_copy_memory_nonoverlapping to read the data · Issue #4 · Amanieu/seqlock · GitHub

I tried reimplementing the read so that the data is an array of AtomicUsize (plus nightly features). It produced the same x86 machine code as the original, with the added benefit of not being UB (assuming the conversion to T does not add anything bad).
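For reference, here is a rough sketch of that idea on stable Rust, using per-word Relaxed loads instead of the nightly unordered copy (so the codegen may differ from what I got with the nightly intrinsic); the type AtomicWordSeqLock and its names are mine, purely for illustration:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

// Hypothetical seqlock whose payload is stored as words of AtomicUsize, so
// the reader's loads are atomic and race-free even if they overlap a write.
pub struct AtomicWordSeqLock<const N: usize> {
    pub seq: AtomicUsize,
    pub data: [AtomicUsize; N],
}

impl<const N: usize> AtomicWordSeqLock<N> {
    pub fn read_words(&self) -> [usize; N] {
        loop {
            let seq1 = self.seq.load(Ordering::Acquire);
            if seq1 & 1 != 0 {
                thread::yield_now();
                continue;
            }
            let mut out = [0usize; N];
            for (dst, word) in out.iter_mut().zip(self.data.iter()) {
                // Word-wise atomic load: the result may still be torn across
                // words, but each individual word is read without a data race.
                *dst = word.load(Ordering::Relaxed);
            }
            // Order the data loads before the seq re-check, as in the original.
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == seq1 {
                // A conversion from [usize; N] back to T would go here.
                return out;
            }
        }
    }
}
```

A torn multi-word result is still possible mid-write, but it is now just a stale/mixed value that the seq re-check discards, not UB.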

I think the strong cache coherency of the processor means it does not need to emit extras such as the mfence opcode I saw when using fetch_add.

Without atomic data, and changed to a plain ptr::read, a test showed that when I only used part of the result, only the needed part was actually read. I think (though I could be wrong and have not tried to reproduce it) that with a certain layout, code using ptr::read could in some cases use SIMD instructions/registers. There is potential for corruption.

