Generic atomic?

The "crossbeam-channel" crate uses atomic operations on pointers, with unsafe code. Unsafe lock-free code is not a good thing to find, and might be brittle across platforms. It looks like that code required a lot of debugging.

Should there be language support to allow atomic operations on any type which fits within the atomic limits of the platform? That would simplify this sort of thing. The compiler knows what operations the CPU can do, but crates do not.

That creates a portability problem, of course. Some CPUs can lock 4 bytes, some 8 bytes, some even 32 bytes if it doesn't cross a cache line boundary. Is there any Rust-supported CPU that can't lock "usize" and pointer sizes? If not, it's worth considering offering generic atomic up to that size.

Maybe have an "Atomic" trait for structs, which will force any needed alignment and will fail to compile if the struct is bigger than the maximum size for which the hardware has atomic operations.

It's probably too early for this right now. It'd want a where size_of::<T>() < MACHINE_ATOMIC_SIZE constraint, but that can't be written directly right now, and it's not obvious to me that making a special-purpose trait for it would be worthwhile.

1 Like

Ah, you want constants for generics fixed first. That's a good way to express this.

There's also an alignment issue.

struct atomic_pos {
    x: f32,
    y: f32
}

Ordinarily, that won't be aligned on a 8-byte boundary, I think. A u64 would be. The storage allocator has an "align" parameter. Something has to tell it that this structure needs to be 8-byte aligned on platforms where the atomic operations only work on 8-byte aligned objects. So at least alignment for atomic usage has to be a type attribute known to the compiler.

Would it be interesting to prototype this by implementing a transmute-based generic atomic wrapper using existing language functionality?

use bytemuck::Pod;
use std::marker::PhantomData;
use std::sync::atomic::{AtomicU32, Ordering};

/// `T` must have size = 4 (checked at run time).
struct Atomic4<T: Pod> {
    storage: AtomicU32,
    /// Not necessary given `Pod` but would be in general.
    _phantom: PhantomData<T>,
}

impl<T: Pod> Atomic4<T> {
    pub fn new(value: T) -> Self {
        assert_eq!(std::mem::size_of::<T>(), 4);
        Self {
            storage: AtomicU32::new(bytemuck::cast::<T, u32>(value)),
            _phantom: PhantomData,
        }
    }

    /// As per AtomicU32::fetch_update
    pub fn fetch_update<F>(
        &self,
        set_order: Ordering,
        fetch_order: Ordering,
        mut f: F,
    ) -> Result<u32, u32>
    where
        F: FnMut(T) -> Option<T>,
    {
        self.storage.fetch_update(set_order, fetch_order, |value| {
            Some(bytemuck::cast::<T, u32>(
                f(
                    bytemuck::cast::<u32, T>(value))?))
        })
    }
}

fn main() {
    let atomic_pair: Atomic4<[u16; 2]> = Atomic4::new([1, 2]);
    atomic_pair
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |[a, b]| Some([b, a]))
        .unwrap();
}

(Using Pod is an overly strong bound since it prohibits lots of things that would be okay here, but something like it is needed to make the generic transmute safe.)

1 Like

Right, "pod" is safe, but very restrictive.

One would like to do atomic "take" from an Option, even for Option<&foo>, where the content is a reference. "take" is a classic use for compare and swap. Classic lock-free handoff to another thread.
References aren't 'Pod", so that wouldn't work.

There is an "atomic_take", but it needs unsafe code..

Everything lock-free seems to involve unsafe code, which would be better encapsulated in simple atomic operations that correspond to what the hardware can do. Since this is a code generation issue, it needs compiler support. Crates have no clue when fence instructions are needed.

The key requirement is lack of padding bytes, so I'd go for ::zerocopy's AsBytes trait instead.

From there, one can currently hack post-monomorphization errors to get a compile error rather than a panic, should the API be misused:

pub
unsafe trait AsBytes : Sized {}
unsafe impl AsBytes for u64 {}

pub
trait Size4 : AsBytes {
    #[forbid(const_err)]
    const __: () = [()][(::core::mem::size_of::<Self>() != 4) as usize];
}

impl<T : AsBytes> Size4 for T {}

fn check<T : Size4> (_: T)
{
    let _ = T::__;
}

fn main ()
{
    check(42_u64);
}

That's progress. Can you write a generic AtomicObject4<T>, AtomicObject8<T> which will make any 4 or 8 byte object atomic? Coerce into U32 or U64 and use the atomics for those.
(As const params to generics get more powerful, that may become a generic Atomic, but not yet.)

If do-able, this should be a part of standard Atomic.

I feel like generics might actually be the wrong path to go down, here.

Traits and platform-specifics tend to not work too well together because it means code that compiles on one machine may break completely on another. Furthermore, because you tend to already be doing complex things with traits because you are trying to uphold certain invariants at compile time, the error messages you see can often be hard to understand.

The requirements of some compiler-implemented Atomic trait will also be quite complex and depend on internals only known by the compiler. That poses an issue where you want to provide a non-atomic fallback instead of failing to compile (e.g. use a mutex) because you now need to write some sort of #[cfg(supports_atomic_ops(path::to::my::Type)] guard. This is an issue because supports_atomic_ops relies on information from the type system, but when conditional compilation is applied (during parsing), that information hasn't yet been derived.

You almost need some sort of Atomic<T> type that is implemented in the compiler that transparently changes its internal representation and method implementations depending on the platform. I can see that getting a lot of push-back though.

Does anyone have experience with how C++ tackles this?

1 Like

In one word? Poorly. They have C++ memory model, but because it's poor match for the actual architecture of CPU performance-conscious projects have their own model instead.

I'm not sure how badly Rust even needs something like that: lock-free code is very hard to get right and it's not even clear if “safe lock-free code” is even possible (you need lock-free when you want to make concurrent access to data structure correct… but safe code prevents concurrent access in principle so how can it benefit from lock-free data structures?).

It would certainly be very tricky to implement and develop since you would need to develop Rust memory model first.

True. But it have no idea how to use these to keep invariants used by lock-free data structures unbroken.

Why don't they know it? Currently they deal with certain architectures because it's the only sane way to carry information to the compiler.

If we would have formal memory model and a way to transfer information to the compiler to generate proper code before 2030 — I would be really happy.

I'm not sure how that can work. Today abstract “lock-free” algorithms don't exist. They all hang on the capabilities of certain architectures: be it ARM, x86 or RISC-V. And depend on it.

If you look on the source of crossbeam you will see that there are lots of fences inserted in the appropriate (for the CPU architecture) places. Compiler couldn't insert these because it doesn't know what data structures these fences are protecting and thus have no idea which fences are appropriate.

The fact that you, then, need to use arch-dependent atomics is just a minor inconvenience.

2 Likes

That's a good point. Rust's atomic operations are supposed to fall back to a mutex if the hardware does not support atomic operations for the types involved.

Now, that seldom happens, because the types supported are all small. One, two, four, or eight bytes. Nothing larger. Atomic on arbitrary structs would trigger the fallback to a mutex more often. Or at least generate fence instructions on ARM targets.

That's perhaps a good argument for this. Fences are cheaper than mutexes, and you never get a full block and trigger a context switch. So there's a performance gain to be had here.

I'm pretty sure Rust will simply not let the AtomicU64 type exist if 64-bit atomics aren't supported.

1 Like

Everything 64-bit with Tier I Rust support seems to support 64-bit atomics.

  • ARM CORTEX-A Atomics to 64 bits.
  • AMD64 Some newer Intel CPUs support atomic operations out to 256 and even 512 bits, but that's not guaranteed.
  • AARCH64 Apple's CPU is basically ARM-like.

Not sure about the Tier 1 32-bit targets.

WASM has 64-bit atomic support.

  • WASM Atomics to 64 bits.

In Tokio we had to mock out AtomicU64 for platforms that don't have it. The mock is here and the list of platforms is here.

1 Like

Heh, I started writing some code down this path, only to end up stumbling upon:

I have been looking at their API, and they've made a choice which simplifies so many problems for the "sad" or subtle paths by virtue of incidentally being more flexible: rather than unsafely transmuteing back and forth between the generic type and its backing atomic type, they just rely on a user-provided Atom trait that shall perform the "pack"ing and "unpacking" operations for us.

  • This means that technically you could decide to back a [bool; 8] as an u8, etc., which are operations not supported by direct transmutation.

Whether this results in worse performance (e.g., slow {un,}packing operations could lead to a bigger race condition window within the typical speculative-load-update-and-apply-with-cas-or-rollback-loop approach in lock-free algorithms, greatly increasing the chances of such speculative load to be stale, leading to significantly worse performance under contention) is another question, one left to the implementor of the Atom trait.

But:

  • it will never be unsound,

  • and is way less confusing than the post-monomorphization errors approach I've showcased in my previous post.

Imho, that crate is really the best answer to the OP

3 Likes