Generic atomic?

John_Nagle · December 23, 2021, 7:41pm

The "crossbeam-channel" crate uses atomic operations on pointers, with unsafe code. Unsafe lock-free code is not a good thing to find, and might be brittle across platforms. It looks like that code required a lot of debugging.

Should there be language support to allow atomic operations on any type which fits within the atomic limits of the platform? That would simplify this sort of thing. The compiler knows what operations the CPU can do, but crates do not.

That creates a portability problem, of course. Some CPUs can lock 4 bytes, some 8 bytes, some even 32 bytes if it doesn't cross a cache line boundary. Is there any Rust-supported CPU that can't lock "usize" and pointer sizes? If not, it's worth considering offering generic atomic up to that size.

Maybe have an "Atomic" trait for structs, which will force any needed alignment and will fail to compile if the struct is bigger than the maximum size for which the hardware has atomic operations.

scottmcm · December 23, 2021, 9:14pm

It's probably too early for this right now. It'd want a where size_of::<T>() < MACHINE_ATOMIC_SIZE constraint, but that can't be written directly right now, and it's not obvious to me that making a special-purpose trait for it would be worthwhile.

John_Nagle · December 23, 2021, 9:33pm

Ah, you want constants for generics fixed first. That's a good way to express this.

There's also an alignment issue.

struct atomic_pos {
    x: f32,
    y: f32
}

Ordinarily, that won't be aligned on a 8-byte boundary, I think. A u64 would be. The storage allocator has an "align" parameter. Something has to tell it that this structure needs to be 8-byte aligned on platforms where the atomic operations only work on 8-byte aligned objects. So at least alignment for atomic usage has to be a type attribute known to the compiler.

kpreid · December 24, 2021, 12:05am

Would it be interesting to prototype this by implementing a transmute-based generic atomic wrapper using existing language functionality?

use bytemuck::Pod;
use std::marker::PhantomData;
use std::sync::atomic::{AtomicU32, Ordering};

/// `T` must have size = 4 (checked at run time).
struct Atomic4<T: Pod> {
    storage: AtomicU32,
    /// Not necessary given `Pod` but would be in general.
    _phantom: PhantomData<T>,
}

impl<T: Pod> Atomic4<T> {
    pub fn new(value: T) -> Self {
        assert_eq!(std::mem::size_of::<T>(), 4);
        Self {
            storage: AtomicU32::new(bytemuck::cast::<T, u32>(value)),
            _phantom: PhantomData,
        }
    }

    /// As per AtomicU32::fetch_update
    pub fn fetch_update<F>(
        &self,
        set_order: Ordering,
        fetch_order: Ordering,
        mut f: F,
    ) -> Result<u32, u32>
    where
        F: FnMut(T) -> Option<T>,
    {
        self.storage.fetch_update(set_order, fetch_order, |value| {
            Some(bytemuck::cast::<T, u32>(
                f(
                    bytemuck::cast::<u32, T>(value))?))
        })
    }
}

fn main() {
    let atomic_pair: Atomic4<[u16; 2]> = Atomic4::new([1, 2]);
    atomic_pair
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |[a, b]| Some([b, a]))
        .unwrap();
}

(Using Pod is an overly strong bound since it prohibits lots of things that would be okay here, but something like it is needed to make the generic transmute safe.)

John_Nagle · December 24, 2021, 11:30pm

Right, "pod" is safe, but very restrictive.

One would like to do atomic "take" from an Option, even for Option<&foo>, where the content is a reference. "take" is a classic use for compare and swap. Classic lock-free handoff to another thread.
References aren't 'Pod", so that wouldn't work.

There is an "atomic_take", but it needs unsafe code..

Everything lock-free seems to involve unsafe code, which would be better encapsulated in simple atomic operations that correspond to what the hardware can do. Since this is a code generation issue, it needs compiler support. Crates have no clue when fence instructions are needed.

Yandros · December 25, 2021, 12:48am

The key requirement is lack of padding bytes, so I'd go for ::zerocopy's AsBytes trait instead.

From there, one can currently hack post-monomorphization errors to get a compile error rather than a panic, should the API be misused:

pub
unsafe trait AsBytes : Sized {}
unsafe impl AsBytes for u64 {}

pub
trait Size4 : AsBytes {
    #[forbid(const_err)]
    const __: () = [()][(::core::mem::size_of::<Self>() != 4) as usize];
}

impl<T : AsBytes> Size4 for T {}

fn check<T : Size4> (_: T)
{
    let _ = T::__;
}

fn main ()
{
    check(42_u64);
}

Playground

John_Nagle · December 28, 2021, 2:39am

That's progress. Can you write a generic AtomicObject4<T>, AtomicObject8<T> which will make any 4 or 8 byte object atomic? Coerce into U32 or U64 and use the atomics for those.
(As const params to generics get more powerful, that may become a generic Atomic, but not yet.)

If do-able, this should be a part of standard Atomic.

Michael-F-Bryan · December 28, 2021, 8:35am

I feel like generics might actually be the wrong path to go down, here.

Traits and platform-specifics tend to not work too well together because it means code that compiles on one machine may break completely on another. Furthermore, because you tend to already be doing complex things with traits because you are trying to uphold certain invariants at compile time, the error messages you see can often be hard to understand.

The requirements of some compiler-implemented Atomic trait will also be quite complex and depend on internals only known by the compiler. That poses an issue where you want to provide a non-atomic fallback instead of failing to compile (e.g. use a mutex) because you now need to write some sort of #[cfg(supports_atomic_ops(path::to::my::Type)] guard. This is an issue because supports_atomic_ops relies on information from the type system, but when conditional compilation is applied (during parsing), that information hasn't yet been derived.

You almost need some sort of Atomic<T> type that is implemented in the compiler that transparently changes its internal representation and method implementations depending on the platform. I can see that getting a lot of push-back though.

Does anyone have experience with how C++ tackles this?

VorfeedCanal · December 28, 2021, 6:07pm

In one word? Poorly. They have C++ memory model, but because it's poor match for the actual architecture of CPU performance-conscious projects have their own model instead.

I'm not sure how badly Rust even needs something like that: lock-free code is very hard to get right and it's not even clear if “safe lock-free code” is even possible (you need lock-free when you want to make concurrent access to data structure correct… but safe code prevents concurrent access in principle so how can it benefit from lock-free data structures?).

It would certainly be very tricky to implement and develop since you would need to develop Rust memory model first.

True. But it have no idea how to use these to keep invariants used by lock-free data structures unbroken.

Why don't they know it? Currently they deal with certain architectures because it's the only sane way to carry information to the compiler.

If we would have formal memory model and a way to transfer information to the compiler to generate proper code before 2030 — I would be really happy.

I'm not sure how that can work. Today abstract “lock-free” algorithms don't exist. They all hang on the capabilities of certain architectures: be it ARM, x86 or RISC-V. And depend on it.

If you look on the source of crossbeam you will see that there are lots of fences inserted in the appropriate (for the CPU architecture) places. Compiler couldn't insert these because it doesn't know what data structures these fences are protecting and thus have no idea which fences are appropriate.

The fact that you, then, need to use arch-dependent atomics is just a minor inconvenience.

John_Nagle · December 28, 2021, 8:15pm

That's a good point. Rust's atomic operations are supposed to fall back to a mutex if the hardware does not support atomic operations for the types involved.

Now, that seldom happens, because the types supported are all small. One, two, four, or eight bytes. Nothing larger. Atomic on arbitrary structs would trigger the fallback to a mutex more often. Or at least generate fence instructions on ARM targets.

That's perhaps a good argument for this. Fences are cheaper than mutexes, and you never get a full block and trigger a context switch. So there's a performance gain to be had here.

alice · December 28, 2021, 9:00pm

I'm pretty sure Rust will simply not let the AtomicU64 type exist if 64-bit atomics aren't supported.

John_Nagle · December 30, 2021, 8:13pm

Everything 64-bit with Tier I Rust support seems to support 64-bit atomics.

ARM CORTEX-A Atomics to 64 bits.
AMD64 Some newer Intel CPUs support atomic operations out to 256 and even 512 bits, but that's not guaranteed.
AARCH64 Apple's CPU is basically ARM-like.

Not sure about the Tier 1 32-bit targets.

WASM has 64-bit atomic support.

WASM Atomics to 64 bits.

alice · December 30, 2021, 8:55pm

In Tokio we had to mock out AtomicU64 for platforms that don't have it. The mock is here and the list of platforms is here.

Yandros · January 1, 2022, 8:18pm

Heh, I started writing some code down this path, only to end up stumbling upon:

I have been looking at their API, and they've made a choice which simplifies so many problems for the "sad" or subtle paths by virtue of incidentally being more flexible: rather than unsafely transmuteing back and forth between the generic type and its backing atomic type, they just rely on a user-provided Atom trait that shall perform the "pack"ing and "unpacking" operations for us.

This means that technically you could decide to back a [bool; 8] as an u8, etc., which are operations not supported by direct transmutation.

Whether this results in worse performance (e.g., slow {un,}packing operations could lead to a bigger race condition window within the typical speculative-load-update-and-apply-with-cas-or-rollback-loop approach in lock-free algorithms, greatly increasing the chances of such speculative load to be stale, leading to significantly worse performance under contention) is another question, one left to the implementor of the Atom trait.

But:

it will never be unsound,
and is way less confusing than the post-monomorphization errors approach I've showcased in my previous post.

Imho, that crate is really the best answer to the OP

system · April 1, 2022, 8:19pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Should I use an crossbeam::AtomicCell in this case? help	6	3542	July 18, 2019
Does a (standard) atomic wide pointer implementation exist? help	5	521	January 31, 2024
Single thread :: cell = multi thread ::?	9	720	May 8, 2022
Examples/tutorials for (the new) crossbeam help	2	3305	November 10, 2019
Can I implement seqlocks safely in rust? help	7	434	January 23, 2024

Generic atomic?

Related topics