Unsafety with advanced version of RefCell

For reasons I have outlined at reasonable length in Mapping `Ref` to a value containing a reference · Issue #54776 · rust-lang/rust · GitHub as well as the README (cell/README.md at master · d-e-s-o/cell · GitHub), I've created a crate that "enhances" std::cell::RefCell with a map_val method that can be used to map actual value objects. It does so by providing a replacement for RefCell, for lack of better alternatives. It should be clearer reading through the issue/readme, but in a nutshell we can now implement something like:

struct RefStrings(RefCell<Vec<String>>);

impl RefStrings {
    fn iter<'t, 's: 't>(&'s self) -> RefVal<'t, Iter<String>> {
        Ref::map_val(self.0.borrow(), |x| x.iter())
    }
}

Because we map to an iterator, which is an actual value (with an embedded reference), this is not possible (from what I can tell) out of the box.

Now, unfortunately, this is seemingly memory unsafe, as @stepancheg showed in issue #5 (Ref::map_val allows pointer leak · Issue #5 · d-e-s-o/cell · GitHub), the reasons of which presumably lie in violation of some invariant BorrowRef, which is something I use (I don't make use of unsafe directly).

I have tried to understand the exact details of the problem but I can't piece it together. I am writing here in the hope that somebody can clarify a two questions:

  1. Can somebody make sense of the problem? :slight_smile:
  2. Where does the observed difference (https://github.com/d-e-s-o/cell/issues/5#issuecomment-493714437) in behavior between mut and non-mut versions come from?

A trimmed down test case reproducing the problem is available here https://github.com/d-e-s-o/cell/files/3293614/cell-issue-5.tar.gz.

2 Likes

If you don't put links in backticks we can actually click them :slight_smile:

Calling it RefCell is confusing.

    let x: RefValMut<&String> = RefMut::map_val(b, |s| s as &String);
    x.deref()

The closure needs to be called at deref so it would use life from borrowing x. (Easier said than done; Can't use Deref trait to have Target inherit lifetime from function.)

Example of problematic code:

use ::cell::{
    Ref, RefCell, // implementation copied from `::core`, with an added `map_val`
    RefVal,
};


/// This should not pass borrowck
fn main ()
{
    let s = RefCell::new(String::new());
    let y: &String = {
        let x: Ref<String> = RefCell::borrow(&s);
        let x: RefVal<&String> = Ref::map_val(
            x,
            |s: &String| -> &String {
                s
            },
        );
        *RefVal::deref(&x)
    };

    // assert not borrowed
    RefCell::borrow_mut(&s);

    // but pointer to data is still accessible
    y.len();
}

Ok, this one was an interesting mental exercise to test understanding lifetimes.

The core problem, as @jonh said, lies in the Deref function.

To make it even clearer, let's just write it as an associated function:

// 'a stands for the lifetime of the borrow of the RefCell
// > lifetime 'b' of the `RefVal` instance

struct RefVal<'a, T : 'a> {
    value: T,
    borrow: ValidFor<'a>,
}

impl<'a, T : 'a> RefVal<'a, T> {
    fn deref<'sub_b> (orig: &'sub_b RefVal<'a, T>) -> &'sub_b T
    {
        &orig.value
    }
}

This, at first glance, does not look bad: we have a value T valid for the lifetime 'b of the initial borrow on the RefCell, so if we try to borrow with a smaller lifetime 'sub_b, we should have a valid borrow of T of lifetime 'sub_b.

This looks fine, except for the fact that lifetimes on references with generics type may not be as upper-bounded as we may think; one of the main reasons for that is that shared references (i.e. the type &'x T) are Copy:

fn max<'small, 'big : 'small> (x: &'small &'big String) -> &'big String
{
    return *x; // because `&'big String` is `Copy`;
    // or, equivalently
    return <&'big String as Clone>::clone(x);
}

// vs.
fn min<'small, 'big : 'small> (x: &'small &'big String) -> &'small String
{
    &**x; // reborrow of Copy
}

And this means that in our case, T = &'a String, where 'a is the lifetime of the first (compile-time) borrow of s: RefCell<String>. This borrow will actually last up until the final y.len() call, whereas the lifetime 'b of the runtime-checked Ref/RefVal variable ends at the end of the scope:

fn main ()
{
    let s = RefCell::new(String::new());
    let y: &'a String = { //                                 'b 'a
        let x: Ref<'a, String> = RefCell::borrow(&'a s);  // -+--+
        let x: RefVal<&'a String> = Ref::map_val(         //  !  |
            x,                                            //  |  |
            |s: &'a String| -> &'a String {               //  |  |
                s                                         //  |  |
            },                                            //  |  |
        );                                                //  |  |
        *RefVal::deref(&'sub_b x) // <------------------------+  |
    };                                                    //     |
                                                          //     |
    // assert not borrowed                                // 'b2 |
    match RefCell::borrow_mut(&'c s) { _ => {             // -+  |
    }} // <---------------------------------------------------+  |
                                                          //     |
    // but pointer to data is still accessible            //     |
    y.len(); // <------------------------------------------------+
}
  • Recap on important lifetimes: 'a > 'b > 'sub_b.

The runtime checks ensure that 'b and 'b2 should be disjoint; as you can see, so are they, hence the lack of a problem. 'a and 'c may overlap, since they are just (compile-time) shared borrows of the RefCell.

So the issue is within the *RefVal::deref(&'sub_b x) line: there is a borrow &'sub_b x that cannot outlive x (it cannot outlive 'b), and yet the y binding ends up borrowing the String for the whole 'a lifetime.

This is possible thanks to the mechanics shown by max above: we end up with &'sub_b &'a String, and we need a &'a String, so a simple Copy with the * operator sufficies to reach 'a, thus crossing the expected maximum 'b !

The usual RefCell's Ref deref prevents this since it internally handles the double reference and uses the smaller one instead (i.e., it uses min before lending a reference, thus preventing the max case).

One soution to this, although I don't know how useful it may be, is to have the following Deref implementation instead:

impl<'b, T : Deref + 'b> Deref for RefVal<'b, T> {
    type Target = T::Target;

    // if, for instance, T = &'a T::Target,
    // give `&'_ T::Target` instead of, `&'_ &'a T::Target` (i.e. &'_ T)
    // to prevent people from getting a `&'a T::Target`
    #[inline]
    fn deref(self: &'_ RefVal<'b, T>) -> &'_ T::Target {
        &*self.value
    }
}

impl<'b, T : DerefMut + 'b> DerefMut for RefVal<'b, T> {
    #[inline]
    fn deref_mut (self: &'_ mut RefVal<'b, T>) -> &'_ mut T::Target
    {
        &mut *self.value
    }
}
6 Likes

And that's what I had. And then the forum software insisted that "new users can only embed two links in a post".

Thank you @Yandros for this extensive answer!

I guess what you describe may also explain why behavior changes when making certain bindings mut (it's no longer a shared reference). I am still wondering whether there is a way to exploit this fact to make the code behave correctly, but haven't spent much time on that so far.

1 Like

Thanks for your response @jonh . Why do you say that, and do you have a better suggestion?

Oh damn, I never remember that rule. It's somewhat stupid; after all, does it really matter how many spam links there are in a spam post...

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.