Ergonomic SyncUnsafeSlice

There are several posts about writing unsafely from multiple threads to a single slice assuming that only one thread can write at a given location, as in that case even without synchronization there is not data race. This SO post is quoted several time as a solution. We are trying to obtain something similar, but a bit more ergonomic, and we ended up with the following idea:

pub struct SyncUnsafeSlice<T>(UnsafeCell<*mut [T]>);
unsafe impl<'a, T: Send> Sync for SyncUnsafeSlice<T> {}

impl<T> SyncUnsafeSlice<T> {
    pub fn new(slice: &mut [T]) -> Self {
        Self(UnsafeCell::new(slice as _))
    }

    #[inline(always)]
    pub unsafe fn get_mut_unchecked(&self, index: usize) -> &mut T {
        &mut (**self.0.get())[index]
    }
}

There seem to be no UB here as we never create a mutable reference to the slice (I'm a bit doubtful about that happens with (**self.0.get())[index], but that can be replaced with an add).

The advantage with respect to the original solution is that instead of a method write here you get an actual mutable reference, which might be useful if the thread is updating multiple time the element; also, you can use compound assignment operatiors.

Does this solution look sensible?

No, absolutely not. You just allowed shared mutability if the index remains the same across multiple calls:

let r1 = slice.get_mut_unchecked(0);
let r2 = slice.get_mut_unchecked(0); // BOOM, UB

Yes, this is the point. You have UB if you obtain the same index twice. If your code never obtains the same index twice, there is no UB. Or there should be no UB.

The other solution sports a write method. But if multiple threads write to the same location, it's a data race, and a data race is UB.

So for what I can understand, either both solutions are good, or neither is good.

I'm no unsafe expert, but I'd be more comfortable with something like this. If you really need an &mut T, you can use unsafe { &mut *slice.get_unchecked(id).as_ptr() }

use std::cell::Cell;
pub struct SyncUnsafeSlice<'a, T>(&'a [Cell<T>]);
unsafe impl<'a, T: Send> Sync for SyncUnsafeSlice<'a, T> {}

impl<'a, T> SyncUnsafeSlice<'a, T> {
    pub fn new(slice: &'a mut [T]) -> Self {
        Self(Cell::as_slice_of_cells(Cell::from_mut(slice)))
    }
    
    /// Safety: Access to any given index from multiple threads
    ///         must be extenally synchronized
    pub unsafe fn get_unchecked(&self, idx: usize)->&Cell<T> {
        &self.0[idx]
    }
}

In fact, we just modified our design in this way (which is similar):

use std::cell::UnsafeCell;

pub struct SyncUnsafeSlice<'a, T>(&'a [UnsafeCell<T>]);
unsafe impl<'a, T: Send> Sync for SyncUnsafeSlice<'a, T> {}

impl<'a, T> SyncUnsafeSlice<'a, T> {
    pub fn new(slice: &'a mut [T]) -> Self {
        let ptr = slice as *mut [T] as *const [UnsafeCell<T>];
        Self(unsafe { &*ptr })
    }

    pub unsafe fn get_mut_unchecked(&self, index: usize) -> &mut T {
        &mut *self.0.get_unchecked(index).get()
    }

    #[inline(always)]
    pub unsafe fn get_mut(&self, index: usize) -> &mut T {
        &mut *self.0[index].get()
    }
}

But note that we need a mutable reference.

Your new method should probably be taking &'a mut [T]. As currently written, it can promote a temporary slice to an internal &'static ... accidentally.

1 Like

Without the new generic lifetime you allowed dangling pointers:

1 Like

Yes, we noticed and fixed that—thanks!

Actually, inspired by this post from @alice we found a cleaner solution which is slightly less ergonomic (has set/get like Cell) but has a single unsafe (that implementing Sync): webgraph-rs/src/utils/sync_slice.rs at main · vigna/webgraph-rs · GitHub.

I was surprised that the get/set elemet methods are not marked unsafe, since they cause UB if used incorrectly. In the prior version they were marked unsafe, and using Cell doesn't make them safe, right?

Cell is entirely safe in a sequential environment. Contrarily to UnsafeCell, you have no way to obtain two exclusive references to the content (you can obtain two pointers, but that's safe as you'd need unsafe to dereference them).

The unsafe part is the fact that we make it Sync, whereas the standard library makes it !Sync. So undefined behavior can only be caused by data races.

Or, at least, that's what I understood.:point_up:t2:

This is my understanding and how I have implemented unsafe abstractions: For the get/set to be safe, something must ensure each Cell is used by exactly one thread, to prevent data races (UB) as you said. So it is unsafe to call get/set directly without that protection. The way this is normally expressed is by marking get/set unsafe to require some sort of wrapper that guarantees/enforces this protection, and exposes a safe interface to the cells. The goal is to make it impossible to cause UB using safe code.

1 Like

Is the link broken?

Yes. Thinking back, now the new code looks exactly like @alice 's original code, but using Cell instead of SafeCell (and implementing variants of the other Cell method).

So using Cell is hiding the fact that you might cause UB if you have a data race, as this explanation is on the documentation of the structure, rather than on the write method (as it is declared safe).

This has some sense because some of the methods cannot cause UB in isolation (i.e., get). It is the whole thing that, used improperly, can cause UB. I guess it is at least partially a matter of taste wether to mark all methods as unsafe or none.

From the compiler perspective, underlying to this there is a slice of UnsafeCell, so it won't do optimizations that it shouldn't do. Making everything unsafe would also make using this structure very cumbersome.

One intermediate possibility would be making the from_mut method and as_sync_slice method of the extension trait unsafe. That would mark that something potentially UB is happening, without making the whole interface to the thing too cumbersome.