Announcing shared_cell, An Additional Cell API With Zero-Cost Concurrent Data Sharing In Single-Threaded Asynchronous Code

Just figured it out, but you beat me to it. I realize now it's because Fragile is Sync (didn't see that at first glance). I don't think it's possible to exploit with SendCell, though, since it's not sync.

While we’re at it… For good measure, I’ve now soundness-reviewed the fragile crate a little bit, too, and found some issues :sweat_smile:

4 Likes

Alright… pure_cell. It’s quite a bit harder to break, but possible. In general, I would not recommend relying to much on any properties of const. The purpose of const is only allowing compile-time evaluation; no guarantees of purity are actually made if a const fn is executed at run-time. And the capabilities of const will only grow in the future – it’s essentially impossible to reason about negatively (i.e. “if this code runs in const fn then it does not do XYZ kind of thing”) if you care about forward compatibility with future Rust versions.

Enabling the nightly-only ability to use mutable references in const, together with the (exciting! :sweat_smile:) property that panic! in const fn behaves different between compile-time and run-time, the latter being able to execute any custom panic hook code, is enough to break pure_cell:

#![feature(const_mut_refs)]

use pure_cell::{pure_cell, PureCell};
use std::{
    cell::OnceCell,
    mem::ManuallyDrop,
    panic::{catch_unwind, set_hook, AssertUnwindSafe},
};

fn main() {
    thread_local! {
        static CELL: PureCell<ManuallyDrop<String>> = PureCell::new(ManuallyDrop::new(String::from("Hello World")));
        static STRING: OnceCell<String> = OnceCell::new();
    }
    CELL.with(|c| {
        let s = Box::leak(Box::new(ManuallyDrop::new(String::new())));
        set_hook(Box::new(|_| {
            let replacement = ManuallyDrop::new(String::new());
            CELL.with(|c| {
                let s = pure_cell!(c, replacement, |v: ManuallyDrop<String>, r: ManuallyDrop<String>| -> ManuallyDrop<String>{
                    let old = v;
                    v = r;
                    old
                });
                let _ = STRING.with(|string| string.set(ManuallyDrop::into_inner(s)));
            });
        }));
        let _ = catch_unwind(AssertUnwindSafe(|| {
            let s = &mut *s;
            #[allow(unreachable_code)]
            {
                pure_cell!(
                    c,
                    s,
                    |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                        *s = v;
                        panic!();
                    }
                );
            }
        }));
        STRING.with(|string| {
            let hello = string.get().unwrap();
            println!("now it exists: {hello}");
            drop(ManuallyDrop::into_inner(std::mem::take(s)));
            println!("now it's gone: {hello}");
        });
    });
}
now it exists: Hello World
now it's gone: P�▒��U^4V
free(): double free detected in tcache 2
[1]    1698836 IOT instruction (core dumped)

Admitted, I wouldn’t immediately worry about anyone accidentally running into such soundness issues with pure_cell … but as I mentioned, with future additions to const fn capabilities it might very well become a lot easier to break at some point in the future.

5 Likes

On second thought, maybe the main issue here is simply that the thing isn’t really panic safe. Here, this works without any custom panic hooks or thread-locals:

#![feature(const_mut_refs)]

use pure_cell::{pure_cell, PureCell};
use std::{
    mem::ManuallyDrop,
    panic::{catch_unwind, AssertUnwindSafe},
};

fn main() {
    let c = PureCell::new(ManuallyDrop::new(String::from("Hello World")));
    let s1 = Box::leak(Box::new(ManuallyDrop::new(String::new())));
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let s = &mut *s1;
        #[allow(unreachable_code)]
        {
            pure_cell!(
                c,
                s,
                |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                    *s = v;
                    panic!();
                }
            );
        }
    }));
    let s2 = Box::leak(Box::new(ManuallyDrop::new(String::new())));
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let s = &mut *s2;
        #[allow(unreachable_code)]
        {
            pure_cell!(
                c,
                s,
                |v: ManuallyDrop<String>, s: &mut ManuallyDrop<String>| {
                    *s = v;
                    panic!();
                }
            );
        }
    }));
    let hello: &str = s1;
    println!("now it exists: {hello}");
    drop(ManuallyDrop::into_inner(std::mem::take(s2)));
    println!("now it's gone: {hello}");
}
thread 'main' panicked at src/main.rs:21:21:
explicit panic
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at src/main.rs:36:21:
explicit panic
now it exists: Hello World
now it's gone: �mЇ�U�fu

I kind of figured that const wouldn't be reliable enough for "pure" functions, but it was an interesting experiment. Do you mind if I use your examples and cite you in the README for the way to break it, before archiving pure_cell?

I'm still working on brainstorming the right fix for shared_cell.

I think I figured out how to fix the exploit (by providing a completely different API :smile:).

use core::{cell::UnsafeCell, pin::Pin};

/// Construct a `SharedCell<'_, T>`, by pinning a `value: T` locally.
#[macro_export]
macro_rules! shared_cell {
    ($value:ident $(,)?) => {
        let $value = core::pin::pin!($value);
        let $value = unsafe { $crate::SharedCell::new($value) };
    };
}

/// Shared cell type
pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>);

impl<'a, T> SharedCell<'a, T>
where
    T: Unpin,
{
    /// Create a new [`SharedCell`]
    ///
    /// # Safety
    ///
    /// `SharedCell::new()` must be passed a locally pinned value.
    pub unsafe fn new(value: Pin<&'a mut T>) -> Self {
        // https://github.com/rust-lang/rust/issues/111645
        let value: *mut T = value.get_mut();
        let value: *mut UnsafeCell<T> = value.cast();
        // SAFETY: `UnsafeCell<T>` has the same memory layout as `T` due to
        // #[repr(transparent)].
        let value = &mut *value;

        Self(value)
    }

    /// Acquires a mutable reference to the cell's interior value.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R + 'static) -> R {
        // SAFETY: Locally-pinned values are guaranteed to be not `'static`, by
        // requiring `'static` we prevent reëntrant calls to `with()`.
        unsafe { f(&mut *self.0.get()) }
    }
}

Actually, I don't think the "pinning" needs to be as strict as this, so the Unpin requirement should be able to go away.

I have fixed the soundness issues in the crate to the best of my knowledge, and released a new version of the crate without sacrificing the original goals (noting that there were some major API changes).

https://docs.rs/shared_cell/0.2.0/shared_cell/index.html

https://github.com/AldaronLau/shared_cell/blob/stable/CHANGELOG.md#020---2023-08-12

Interesting idea. Though I think this might interact badly with the popular crate scoped_tls - Rust. I'll test my assumption shortly.


Edit: Actually, that combination specifically doesn’t seem to break it, as you cannot put the SharedCell<'a, T> in there without keeping its lifetime visible.


Edit2: Looks like a generalization for non-&T types actually exists. I’ll explore exploitations once again.


Edit3: With scoped-tls-hkt, it can be broken. Such beautiful nesting though… :see_no_evil:

use scoped_tls_hkt::scoped_thread_local;
use shared_cell::{shared_cell, SharedCell};

fn main() {
    scoped_thread_local! (static CELL: for<'a> &'a SharedCell<'a, String>);
    scoped_thread_local! (static mut MUT_STRING: String);

    let cell = String::from("Hello World");
    shared_cell!(cell);

    CELL.set(&cell, || {
        cell.with(|c1: &mut String| {
            CELL.with(|c| {
                MUT_STRING.set(c1, || {
                    c.with(|c2: &mut String| {
                        MUT_STRING.with(|c1| {
                            let hello: &str = c1;
                            println!("now it exists: {hello}");
                            drop(std::mem::take(c2));
                            println!("now it's gone: {hello}");
                        })
                    })
                })
            })
        })
    });
}
now it exists: Hello World
now it's gone: }�saߖ�

Edit4: While this particular usage relies on the SharedCell’s lifetime parameter being covariant, I think even an invariant lifetime could be worked around using a trait object.

Edit5: Indeed, my assumption is correct.

Click to show more…

Even with a small patch to make the lifetime invariant

diff --git a/src/shared_cell.rs b/src/shared_cell.rs
index 66bc0d7..fdbcfe2 100644
--- a/src/shared_cell.rs
+++ b/src/shared_cell.rs
@@ -1,6 +1,6 @@
 use core::{
     cell::UnsafeCell,
-    fmt::{Debug, Formatter, Result},
+    fmt::{Debug, Formatter, Result}, marker::PhantomData,
 };
 
 /// Construct a `SharedCell<'_, T>`, by locally constructing a `value: T`.
@@ -25,7 +25,7 @@ macro_rules! shared_cell {
 /// ```
 #[doc = include_str!("../examples/pasts.rs")]
 /// ```
-pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>);
+pub struct SharedCell<'a, T: ?Sized>(&'a mut UnsafeCell<T>, PhantomData<fn(&'a ()) -> &'a ()>);
 
 impl<T> Debug for SharedCell<'_, T>
 where
@@ -54,7 +54,7 @@ impl<'a, T> SharedCell<'a, T> {
         // `#[repr(transparent)]`.
         let value = &mut *value;
 
-        Self(value)
+        Self(value, PhantomData)
     }
 
     /// Acquires a mutable reference to the cell's interior value.

the following code, hiding the invariant lifetime in a trait object, still runs “successfully”

use scoped_tls_hkt::scoped_thread_local;
use shared_cell::{shared_cell, SharedCell};

trait SomeSharedCell<T> {
    fn with(&self, f: fn(&mut T));
}

impl<'a, T: 'static> SomeSharedCell<T> for SharedCell<'a, T> {
    fn with(&self, f: fn(&mut T)) {
        self.with(f)
    }
}

fn main() {
    scoped_thread_local! (static CELL: for<'a> &'a dyn SomeSharedCell<String>);
    scoped_thread_local! (static mut MUT_STRING: String);

    let cell = String::from("Hello World");
    shared_cell!(cell);

    CELL.set(cell, || {
        cell.with(|c1: &mut String| {
            CELL.with(|c| {
                MUT_STRING.set(c1, || {
                    c.with(|c2: &mut String| {
                        MUT_STRING.with(|c1| {
                            let hello: &str = c1;
                            println!("now it exists: {hello}");
                            drop(std::mem::take(c2));
                            println!("now it's gone: {hello}");
                        })
                    })
                })
            })
        })
    });
}
2 Likes

For good measure, I’ve also found and reported a soundness issue in scoped-tls-hkt (this issue is not relevant for the demonstration above on shared_cell 0.2):

1 Like

It has been

0

days since someone tried and failed to use unsafe to circumvent the lifetime system.

25 Likes

Oh no :sweat_smile:

Thank you so much for reviewing, I feel like I'm learning a lot about edge cases I never would have thought existed.

Trying to think how to fix it, my current thought is having with() take ShareCell by value, and make copying the cell unsafe (changing the inner &mut to & since copying that would be unsound). Since, I no longer think that there is a way to prevent the reëntrant case at compile time without taking ownership of the cell during the call to with(). Then, it should be sound for TaskGroup::spawn() to clone the SharedCell and pass it to the closure which returns a local future. Unless, you prove me wrong again :smile:

Hard to get a full picture from the description in words alone. I’m happy to look at some concrete type signatures and/or macros though ^^

I tried to write up some code, but I realized a fundamental issue with all the possible solutions I've come up with so far.

By using pasts::Executor recursive block_on(), there is no way to prevent reëntry at compile time, because any closure can gain access to the executor, which could yield from within a call to with().

If I want to prevent it at compile time, then I would need to also control the async runtime.

I tried to implement the API I described above while also avoiding the fundamental issue with my original idea on how to implement it.

https://github.com/AldaronLau/shared_cell/tree/da404c1c92e910a084847d298e27add46f80d019

Thanks for sharing. I’m not sure yet whether I’ll be able to take a look today, or only tomorrow.

I think I just found another soundness issue

1 Like

Just released a new version of shared_cell. I have some (possibly false) confidence that I have fixed all the soundness holes. It is a lot more restrictive now, but still accomplishes the original goal.

https://crates.io/crates/shared_cell/0.4.0
https://github.com/AldaronLau/shared_cell/tree/v0.4.0
https://docs.rs/shared_cell/0.4.0/shared_cell/index.html (link may not work yet)

A couple of observations about the documentation:

  • You have a non-compiling example commented out in the middle of a compiling one. It would be better to add a separate, explicitly compile_fail example, in which case rustdoc could ensure that it indeed fails to compile as it should. This would result in higher confidence in the soundness of the API.
  • The description of with() and into_inner() is basically the same, unless I'm oblivious of some fundamental difference between "get" and "acquire". It would be better to clarify that with() hands out a temporary reference only.
  • You have a typo, "refernce", in the docs for into_inner().
  • Does into_inner() need to be unsafe? The duplication of a mutating structure already sounds pretty unsafe by itself, so it'd perhaps be better to add the constraint of not dereferencing the cell mutably to the list of safety requirements of duplicate(), and make into_inner() safe.
1 Like

Thank you for the feedback!

I based with() documentation off of https://doc.rust-lang.org/std/thread/struct.LocalKey.html#method.with documentation, but I didn't realize I had made into_inner() docs so similar. It looks like within the std library, "Returns" is used for returning a reference rather than "Get", but "Acquires" wording on that API doesn't include anything about the reference being temporary (presumably because it can be inferred).

I suppose there is no reason to keep into_inner() unsafe anymore, so long as spawn!() calls the async fn with a mutable reference (which it does now).

1 Like

Thanks again for the useful feedback, I just released a new version with the suggested changes.

https://crates.io/crates/shared_cell/0.5.0
https://github.com/AldaronLau/shared_cell/tree/v0.5.0
https://docs.rs/shared_cell/0.5.0/shared_cell/index.html (link may not work yet)