Announcing shared_cell, An Additional Cell API With Zero-Cost Concurrent Data Sharing In Single-Threaded Asynchronous Code

https://crates.io/crates/shared_cell
https://github.com/AldaronLau/shared_cell
https://docs.rs/shared_cell/0.1.1/shared_cell/

The shared_cell crate is my newest project, in trying to discover an API for Cell that doesn't rely so heavily on Copy/Clone. My original attempt at making a similar API, requires a macro polyfill for const expressions, and that crate is called pure_cell, originally trying to be an alternative in the same space as ghost-cell (not my crate). What I'm trying to accomplish now is a bit different, though - to be able to have multiple asynchronous tasks running together on the same thread, and sharing data. This isn't a new problem-space for me, and I had solved the "inverse" of the problem with 100% safe code in pasts, by separating the data mutation into synchronous functions that are called upon receiving asynchronous notifications. But, shared_cell allows mutating the data from within (a synchronous context within) the asynchronous context, which makes some new things possible. It does require one line of unsafe, which I believe to have proven the soundness of to the best of my ability.

Ok, so how does it work? The API is inspired by LocalKey::with() in the standard library, with the additional constraint that the closure is Send + Sync (bounds which are crucial to proving the API's safety). Except, instead of getting a shared reference in the closure, you get a unique (&mut) reference, the lifetime of which can't escape the closure. So, now by having a shared reference on multiple tasks, you can mutably modify the data without copying it from each task, and without "locking" / runtime-borrow-checking an Rc<RefCell<T>> or similar (which would be extra susceptible to deadlocks).

One thing this crate enables is "Concurrent Actors". An actor that can process multiple requests at the same time, each modifying the actor's owned data without locking, using the shared_cell::TaskGroup API very loosely inspired by tokio::task::LocalSet, except lower-level and more flexible (doesn't require 'static lifetime, and not tied to a specific async runtime). The example in the shared_cell crate TaskGroup documentation only has one possible command to send to the actor, but it can be switched out for an enum.

What do you think?

I don't think the closure being Send + Sync is enough to prevent it from accessing the Cell reentrantly. For example I think you can perform such reentrant access by using a thread local, since the closure will still have access to it while being Send + Sync. Other examples involve crates like fragile and send-cell, which allow you to make non-Send values Send by allowing to access them only in the thread they were created in (effectively runtime checking sendness), again breaking the assumption that the closure can't capture a reference to the cell.

3 Likes
// cited from: https://docs.rs/shared_cell/0.1.1/src/shared_cell/cell.rs.html

#[extend(Cell<T>)]
pub trait CellExt<T> {
    /// Acquires a mutable reference to the cell's interior value.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R + Send + Sync) -> R {
        unsafe { f(&mut *self.as_ptr()) }
    }
}

If only it was this simple, Cell would’ve had such extra API for ages :smile:


I love building concrete unsoundness exploitations though, anyways, let me work on one quickly… :wink:

Edit: Very straightforward indeed with thread locals, as @SkiFire13 already suspected:

use shared_cell::CellExt;
use std::cell::Cell;

fn main() {
    thread_local! {
        static CELL: Cell<String> = Cell::new(String::from("Hello World"));
    }
    CELL.with(|c| c.with(|c1: &mut String| {
        CELL.with(|c| c.with(|c2: &mut String| {
            let hello: &str = c1;
            println!("now it exists: {hello}");
            drop(std::mem::take(c2));
            println!("now it's gone: {hello}");
        }))
    }));
}
now it exists: Hello World
now it's gone: C�xX�
5 Likes

I thought I might have missed something when trying to make the proof. Having other people look at it always helps! Now, I'm trying to think how an additional unsoundness exploit with the fragile or send-cell crates might work, but I think it wouldn't because of Sync (I'm probably missing something). I'm also thinking pure_cell isn't affected by this, but let me know if I'm wrong there. I'll keep thinking on this.

Sure, let see if I can make a demo for that, too.

Edit: Well… that took about 60s to write:

use fragile::Fragile;
use shared_cell::CellExt;
use std::cell::Cell;

fn main() {
    let cell = Fragile::new(Cell::new(String::from("Hello World")));
    cell.get().with(|c1: &mut String| {
        cell.get().with(|c2: &mut String| {
            let hello: &str = c1;
            println!("now it exists: {hello}");
            drop(std::mem::take(c2));
            println!("now it's gone: {hello}");
        })
    });
}
now it exists: Hello World
now it's gone: ��@a`�
1 Like

Just figured it out, but you beat me to it. I realize now it's because Fragile is Sync (didn't see that at first glance). I don't think it's possible to exploit with SendCell, though, since it's not sync.

While we’re at it… For good measure, I’ve now soundness-reviewed the fragile crate a little bit, too, and found some issues :sweat_smile:

4 Likes

Alright… pure_cell. It’s quite a bit harder to break, but possible. In general, I would not recommend relying to much on any properties of const. The purpose of const is only allowing compile-time evaluation; no guarantees of purity are actually made if a const fn is executed at run-time. And the capabilities of const will only grow in the future – it’s essentially impossible to reason about negatively (i.e. “if this code runs in const fn then it does not do XYZ kind of thing”) if you care about forward compatibility with future Rust versions.

Enabling the nightly-only ability to use mutable references in const, together with the (exciting! :sweat_smile:) property that panic! in const fn behaves different between compile-time and run-time, the latter being able to execute any custom panic hook code, is enough to break pure_cell:

#![feature(const_mut_refs)]

use pure_cell::{pure_cell, PureCell};
use std::{
    cell::OnceCell,
    mem::ManuallyDrop,
    panic::{catch_unwind, set_hook, AssertUnwindSafe},
};

fn main() {
    thread_local! {
        static CELL: PureCell<ManuallyDrop<String>> = PureCell::new(ManuallyDrop::new(String::from("Hello World")));
        static STRING: OnceCell<String> = OnceCell::new();
    }
    CELL.with(|c| {
        let s = Box::leak(Box::new(ManuallyDrop::new(String::new())));
        set_hook(Box::new(|_| {
            let replacement = ManuallyDrop::new(String::new());
            CELL.with(|c| {
                let s = pure_cell!(c, replacement, |v: ManuallyDrop<String>, r: ManuallyDrop<String>| -> ManuallyDrop<String>{
                    let old = v;
                    v = r;
                    old
                });
                let _ = STRING.with(|string| string.set(ManuallyDrop::into_inner(s)));
            });
        }));
        let _ = catch_unwind(AssertUnwindSafe(|| {
            let s = &mut *s;
            #[allow(unreachable_code)]
            {
                pure_cell!(
                    c,
                    s,
                    |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                        *s = v;
                        panic!();
                    }
                );
            }
        }));
        STRING.with(|string| {
            let hello = string.get().unwrap();
            println!("now it exists: {hello}");
            drop(ManuallyDrop::into_inner(std::mem::take(s)));
            println!("now it's gone: {hello}");
        });
    });
}
now it exists: Hello World
now it's gone: P�▒��U^4V
free(): double free detected in tcache 2
[1]    1698836 IOT instruction (core dumped)

Admitted, I wouldn’t immediately worry about anyone accidentally running into such soundness issues with pure_cell … but as I mentioned, with future additions to const fn capabilities it might very well become a lot easier to break at some point in the future.

5 Likes

On second thought, maybe the main issue here is simply that the thing isn’t really panic safe. Here, this works without any custom panic hooks or thread-locals:

#![feature(const_mut_refs)]

use pure_cell::{pure_cell, PureCell};
use std::{
    mem::ManuallyDrop,
    panic::{catch_unwind, AssertUnwindSafe},
};

fn main() {
    let c = PureCell::new(ManuallyDrop::new(String::from("Hello World")));
    let s1 = Box::leak(Box::new(ManuallyDrop::new(String::new())));
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let s = &mut *s1;
        #[allow(unreachable_code)]
        {
            pure_cell!(
                c,
                s,
                |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                    *s = v;
                    panic!();
                }
            );
        }
    }));
    let s2 = Box::leak(Box::new(ManuallyDrop::new(String::new())));
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let s = &mut *s2;
        #[allow(unreachable_code)]
        {
            pure_cell!(
                c,
                s,
                |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                    *s = v;
                    panic!();
                }
            );
        }
    }));
    let hello: &str = s1;
    println!("now it exists: {hello}");
    drop(ManuallyDrop::into_inner(std::mem::take(s2)));
    println!("now it's gone: {hello}");
}
thread 'main' panicked at src/main.rs:21:21:
explicit panic
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:36:21:
explicit panic
now it exists: Hello World
now it's gone: �mЇ�U�fu

I kind of figured that const wouldn't be reliable enough for "pure" functions, but it was an interesting experiment. Do you mind if I use your examples and cite you in the README for the way to break it, before archiving pure_cell?

I'm still working on brainstorming the right fix for shared_cell.

I think I figured out how to fix the exploit (by providing a completely different API :smile:).

use core::{cell::UnsafeCell, pin::Pin};

/// Construct a `SharedCell<'_, T>`, by pinning a `value: T` locally.
#[macro_export]
macro_rules! shared_cell {
    ($value:ident $(,)?) => {
        let $value = core::pin::pin!($value);
        let $value = unsafe { $crate::SharedCell::new($value) };
    };
}

/// Shared cell type
pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>);

impl<'a, T> SharedCell<'a, T>
where
    T: Unpin,
{
    /// Create a new [`SharedCell`]
    ///
    /// # Safety
    ///
    /// `SharedCell::new()` must be passed a locally pinned value.
    pub unsafe fn new(value: Pin<&'a mut T>) -> Self {
        // https://github.com/rust-lang/rust/issues/111645
        let value: *mut T = value.get_mut();
        let value: *mut UnsafeCell<T> = value.cast();
        // SAFETY: `UnsafeCell<T>` has the same memory layout as `T` due to
        // #[repr(transparent)].
        let value = &mut *value;

        Self(value)
    }

    /// Acquires a mutable reference to the cell's interior value.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R + 'static) -> R {
        // SAFETY: Locally-pinned values are guaranteed to be not `'static`, by
        // requiring `'static` we prevent reëntrant calls to `with()`.
        unsafe { f(&mut *self.0.get()) }
    }
}

Actually, I don't think the "pinning" needs to be as strict as this, so the Unpin requirement should be able to go away.

I have fixed the soundness issues in the crate to the best of my knowledge, and released a new version of the crate without sacrificing the original goals (noting that there were some major API changes).

https://docs.rs/shared_cell/0.2.0/shared_cell/index.html

https://github.com/AldaronLau/shared_cell/blob/stable/CHANGELOG.md#020---2023-08-12

Interesting idea. Though I think this might interact badly with the popular crate scoped_tls - Rust. I'll test my assumption shortly.


Edit: Actually, that combination specifically doesn’t seem to break it, as you cannot put the SharedCell<'a, T> in there without keeping its lifetime visible.


Edit2: Looks like a generalization for non-&T types actually exists. I’ll explore exploitations once again.


Edit3: With scoped-tls-hkt, it can be broken. Such beautiful nesting though… :see_no_evil:

use scoped_tls_hkt::scoped_thread_local;
use shared_cell::{shared_cell, SharedCell};

fn main() {
    scoped_thread_local! (static CELL: for<'a> &'a SharedCell<'a, String>);
    scoped_thread_local! (static mut MUT_STRING: String);

    let cell = String::from("Hello World");
    shared_cell!(cell);

    CELL.set(&cell, || {
        cell.with(|c1: &mut String| {
            CELL.with(|c| {
                MUT_STRING.set(c1, || {
                    c.with(|c2: &mut String| {
                        MUT_STRING.with(|c1| {
                            let hello: &str = c1;
                            println!("now it exists: {hello}");
                            drop(std::mem::take(c2));
                            println!("now it's gone: {hello}");
                        })
                    })
                })
            })
        })
    });
}
now it exists: Hello World
now it's gone: }�saߖ�

Edit4: While this particular usage relies on the SharedCell’s lifetime parameter being covariant, I think even an invariant lifetime could be worked around using a trait object.

Edit5: Indeed, my assumption is correct.

Click to show more…

Even with a small patch to make the lifetime invariant

diff --git a/src/shared_cell.rs b/src/shared_cell.rs
index 66bc0d7..fdbcfe2 100644
--- a/src/shared_cell.rs
+++ b/src/shared_cell.rs
@@ -1,6 +1,6 @@
 use core::{
     cell::UnsafeCell,
-    fmt::{Debug, Formatter, Result},
+    fmt::{Debug, Formatter, Result}, marker::PhantomData,
 };
 
 /// Construct a `SharedCell<'_, T>`, by locally constructing a `value: T`.
@@ -25,7 +25,7 @@ macro_rules! shared_cell {
 /// ```
 #[doc = include_str!("../examples/pasts.rs")]
 /// ```
-pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>);
+pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>, PhantomData<fn(&'a ()) -> &'a ()>);
 
 impl<T> Debug for SharedCell<'_, T>
 where
@@ -54,7 +54,7 @@ impl<'a, T> SharedCell<'a, T> {
         // `#[repr(transparent)]`.
         let value = &mut *value;
 
-        Self(value)
+        Self(value, PhantomData)
     }
 
     /// Acquires a mutable reference to the cell's interior value.

the following code, hiding the invariant lifetime in a trait object, still runs “successfully”

use scoped_tls_hkt::scoped_thread_local;
use shared_cell::{shared_cell, SharedCell};

trait SomeSharedCell<T> {
    fn with(&self, f: fn(&mut T));
}

impl<'a, T: 'static> SomeSharedCell<T> for SharedCell<'a, T> {
    fn with(&self, f: fn(&mut T)) {
        self.with(f)
    }
}

fn main() {
    scoped_thread_local! (static CELL: for<'a> &'a dyn SomeSharedCell<String>);
    scoped_thread_local! (static mut MUT_STRING: String);

    let cell = String::from("Hello World");
    shared_cell!(cell);

    CELL.set(cell, || {
        cell.with(|c1: &mut String| {
            CELL.with(|c| {
                MUT_STRING.set(c1, || {
                    c.with(|c2: &mut String| {
                        MUT_STRING.with(|c1| {
                            let hello: &str = c1;
                            println!("now it exists: {hello}");
                            drop(std::mem::take(c2));
                            println!("now it's gone: {hello}");
                        })
                    })
                })
            })
        })
    });
}
2 Likes

For good measure, I’ve also found and reported a soundness issue in scoped-tls-hkt (this issue is not relevant for the demonstration above on shared_cell 0.2):

1 Like

It has been

0

days since someone tried and failed to use unsafe to circumvent the lifetime system.

25 Likes

Oh no :sweat_smile:

Thank you so much for reviewing, I feel like I'm learning a lot about edge cases I never would have thought existed.

Trying to think how to fix it, my current thought is having with() take ShareCell by value, and make copying the cell unsafe (changing the inner &mut to & since copying that would be unsound). Since, I no longer think that there is a way to prevent the reëntrant case at compile time without taking ownership of the cell during the call to with(). Then, it should be sound for TaskGroup::spawn() to clone the SharedCell and pass it to the closure which returns a local future. Unless, you prove me wrong again :smile:

Hard to get a full picture from the description in words alone. I’m happy to look at some concrete type signatures and/or macros though ^^

I tried to write up some code, but I realized a fundamental issue with all the possible solutions I've come up with so far.

By using pasts::Executor recursive block_on(), there is no way to prevent reëntry at compile time, because any closure can gain access to the executor, which could yield from within a call to with().

If I want to prevent it at compile time, then I would need to also control the async runtime.

I tried to implement the API I described above while also avoiding the fundamental issue with my original idea on how to implement it.

https://github.com/AldaronLau/shared_cell/tree/da404c1c92e910a084847d298e27add46f80d019

Thanks for sharing. I’m not sure yet whether I’ll be able to take a look today, or only tomorrow.