Self-borrowing struct and RefCell

zrk · September 17, 2018, 7:53pm

Hello Rustaceans,

Let's say I have a type called Locker with the following definition:

struct Marker;
struct Locker<'auto> {
    marker : Marker,
    autoref : Option<&'auto Marker>, 
}

and the following methods:

impl<'auto> Locker<'auto> {
    pub fn new() -> Self {
        Self { marker : Marker, autoref : None }
    }
    
    pub fn lock(&'auto mut self) {
        self.autoref = Some(&self.marker);
    }
}

If I use this struct directly, I can call Locker::lock and it will compile:

fn main() {
    let mut locker = Locker::new();
    locker.lock();
}

I can even put this struct in another struct:

fn main() {
    struct A<'auto> {
        locker: Locker<'auto>,
    }
    
    impl<'auto> A<'auto> {
        fn borrow_mut(&mut self) -> &mut Locker<'auto> {
            &mut self.locker
        }
    }
    let mut locker = A { locker : Locker::new() };
    let locker_borrow = locker.borrow_mut();
    locker_borrow.lock();
}

But if I put that struct in a RefCell, then I cannot get the following to compile:

fn main() {
    use std::cell::RefCell;
    let locker = RefCell::new(Locker::new());
    let mut locker_borrow = locker.borrow_mut();
    locker_borrow.lock();
}

   Compiling playground v0.0.1 (file:///playground)
error[E0597]: `locker_borrow` does not live long enough
  --> src/main.rs:22:5
   |
22 |     locker_borrow.lock();
   |     ^^^^^^^^^^^^^ borrowed value does not live long enough
23 | }
   | - `locker_borrow` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

error: aborting due to previous error

You can try the example on the playground.

My questions are the following:

Is there a way to modify this example so that it compiles? I'd like to call Locker::lock on a Locker that lives in a RefCell. I don't want to change the lifetime of the perma-borrow, though.
Is there a chance that this behavior changes in future versions of Rust, or is it by design? I already tried with the 2018 edition on the playground, which I believe includes NLL. The error message is phrased a little bit differently but I believe it means the same?

   Compiling playground v0.0.1 (/playground)
error[E0597]: `locker_borrow` does not live long enough
  --> src/main.rs:22:5
   |
22 |     locker_borrow.lock();
   |     ^^^^^^^^^^^^^ borrowed value does not live long enough
23 | }
   | -
   | |
   | `locker_borrow` dropped here while still borrowed
   | borrow later used here, when `locker_borrow` is dropped
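One way to read this second diagnostic is that the perma-borrow itself is accepted and the guard's destructor is what conflicts with it. The sketch below illustrates that reading and is not taken from the playground link: `Guard` is a hypothetical stand-in for `RefMut` that forwards through `DerefMut` but has no `Drop` impl. Under that assumption the same call sequence is expected to compile, because nothing uses the guard after `lock`.

```rust
use std::cell::RefMut;
use std::mem::needs_drop;
use std::ops::{Deref, DerefMut};

struct Marker;

struct Locker<'auto> {
    marker: Marker,
    autoref: Option<&'auto Marker>,
}

impl<'auto> Locker<'auto> {
    pub fn new() -> Self {
        Self { marker: Marker, autoref: None }
    }

    pub fn lock(&'auto mut self) {
        self.autoref = Some(&self.marker);
    }
}

// Hypothetical stand-in for `RefMut`: same deref behaviour, but no destructor.
struct Guard<'b, T>(&'b mut T);

impl<'b, T> Deref for Guard<'b, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.0
    }
}

impl<'b, T> DerefMut for Guard<'b, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.0
    }
}

fn main() {
    let mut locker = Locker::new();
    let mut guard = Guard(&mut locker);
    // The guard is now mutably borrowed for 'auto, but it is never touched again.
    guard.lock();

    // `RefMut` has drop glue, so it is "used" once more at the end of its scope.
    assert!(needs_drop::<RefMut<'static, Locker<'static>>>());
    // The hypothetical `Guard` only holds a reference and has nothing to drop.
    assert!(!needs_drop::<Guard<'static, Locker<'static>>>());
}
```

This mirrors the struct `A` example above, where `borrow_mut` also hands out a plain `&mut Locker<'auto>` with no destructor attached.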

To give a bit of context to my questions, I am working on a reference type that relies on a self-borrow.
If the answer to both my questions is negative, then that means that this type cannot be leaked through a reference cycle by putting it in a Rc. That would mean that my reference type is safe unless it is put in ManuallyDrop. Knowing this would allow me to proceed to the next part of my plan.

More generally, I'd find that it would be an interesting property of self-borrowing structs that they statically cannot be put in a cycle (if the method that triggers the self-borrow is called).

Thank you for reading this post!

skysch · September 17, 2018, 8:39pm

I don't think that what you want is possible or even desirable. RefCell doesn't allow references into its contents to outlive the borrow guard, because that is how it enforces its borrowing rules. RefCell panics if it is already borrowed, and if a reference escapes, then it cannot know if it is already borrowed.
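As a small illustration of that bookkeeping (a sketch, not tied to the Locker type), the guard returned by `borrow_mut` is what records the borrow, and dropping the guard is what releases it. `try_borrow_mut` is used here so that the conflict shows up as an error value instead of a panic.

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(0u32);

    let mut first = cell.borrow_mut();
    *first += 1;

    // While `first` is alive, a second mutable borrow is refused.
    // `borrow_mut` would panic here; `try_borrow_mut` reports the same condition.
    assert!(cell.try_borrow_mut().is_err());

    // Dropping the guard is what tells the cell that the borrow has ended.
    drop(first);
    assert!(cell.try_borrow_mut().is_ok());
    assert_eq!(*cell.borrow(), 1);
}
```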

that means that this type cannot be leaked through a reference cycle by putting it in a Rc. That would mean that my reference type is safe unless it is put in ManuallyDrop.

Rc cycles are not the only way to leak. In general, one would expect there to be arbitrary ways to leak data in a Turing complete language, because the language must allow simulating any language which allows leaks. (Though that may only be true in a technical sense... within a virtualization framework.) But even if you can't construct an Rc cycle, your reference isn't safe if you rely on destructor calls for safety.

Another problem I see is that you want to modify the autoref value while you have a shared reference to the containing type. So you'd need interior mutability inside the autoref field to do that. This is a problem with all self-borrows: you can't safely create a shared reference to a struct and then modify it to put that reference inside it because you're not allowed to mutate if a shared reference exists.

If you use an external RefCell, you can't get a reference to the interior. If you use an internal RefCell, you can't enforce anything statically using the borrow checker. If you use unsafe and no RefCell, you can't modify the value to lock it. I'm not sure what else could be tried. (In any case, the Marker struct is completely redundant as far as I can tell. You get the same results by using an &'auto Locker<'auto>.)
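To make the "internal" option concrete, here is a minimal sketch of a self-borrow taken through a shared reference, with a `Cell` around the autoref field. This is an assumed variant for illustration only, not the Locker from the first post, and `is_locked` is a hypothetical helper. It compiles because the type has no destructor, but since the borrow is shared, the struct stays freely readable afterwards and the borrow checker no longer locks anything.

```rust
use std::cell::Cell;

struct Marker;

struct Locker<'auto> {
    marker: Marker,
    // Interior mutability lets `lock` take `&self` instead of `&mut self`.
    autoref: Cell<Option<&'auto Marker>>,
}

impl<'auto> Locker<'auto> {
    fn new() -> Self {
        Self { marker: Marker, autoref: Cell::new(None) }
    }

    // Shared self-borrow for the whole lifetime 'auto.
    fn lock(&'auto self) {
        self.autoref.set(Some(&self.marker));
    }

    // Hypothetical helper, only here so the effect can be observed.
    fn is_locked(&self) -> bool {
        self.autoref.get().is_some()
    }
}

fn main() {
    let locker = Locker::new();
    assert!(!locker.is_locked());

    locker.lock();
    // Other shared borrows are still allowed after the self-borrow.
    assert!(locker.is_locked());
}
```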

More generally, I’d find that it would be an interesting property of self-borrowing structs that they statically cannot be put in a cycle

It's worth bearing in mind that self-borrowing structs are already in a cycle, so I don't see how it would follow that they cannot be put into a cycle... unless you can't do anything with them at all.

zrk · September 18, 2018, 11:48am

Thank you for your answer!

Meaning, the answer to both questions is "no", then? That's actually great!

I am aware that Rc is not the only way to leak (in the linked thread, I built an example using Box::leak()). I am trying to list all the ways one can leak the destructor in a way that is dangerous for safety. In particular, "leaking" as a consequence of a thread deadlock is not a problem, because the referenced value cannot be accessed due to the deadlock anyway.

At the moment, I identified only three "ways" the value could leak:

Because of a "cycle". What I mean by cycle here is "a sequence of objects owning each other so that it will remain allocated even if all external owners of the cycle disappear". In my understanding, in safe Rust, the only way to build such cycles is using (A)Rc. It appears that Locker types cannot be put in a Rc however (well, at least not if you want to call lock).
Because of a "pure leak", such as Box::leak(), mem::forget(), ManuallyDrop, union. In my understanding, in safe Rust they all boil down to ManuallyDrop or union somewhere.
Because of the use of unsafe code. I'm not even sure of what kind of leak could be done using unsafe, but I guess it is possible. However I think if unsafe is used, the burden is on the user of unsafe to guarantee that the resulting code is memory safe.
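For the "pure leak" item, a minimal sketch of what skipping a destructor looks like in safe code. `CountsDrops` is a hypothetical helper type that bumps a shared counter when its destructor runs, so that a skipped drop can be observed.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};
use std::rc::Rc;

// Hypothetical helper type: bumps a shared counter when its destructor runs.
struct CountsDrops(Rc<Cell<u32>>);

impl Drop for CountsDrops {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn main() {
    let drops = Rc::new(Cell::new(0));

    // Normal scope exit: the destructor runs.
    {
        let _value = CountsDrops(Rc::clone(&drops));
    }
    assert_eq!(drops.get(), 1);

    // `mem::forget` takes ownership and never runs the destructor.
    mem::forget(CountsDrops(Rc::clone(&drops)));
    assert_eq!(drops.get(), 1);

    // A `ManuallyDrop` wrapper going out of scope does not drop its contents.
    {
        let _wrapped = ManuallyDrop::new(CountsDrops(Rc::clone(&drops)));
    }
    assert_eq!(drops.get(), 1);
}
```

Both calls take the value by move, which is the step a perma-borrowed Locker is meant to rule out.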

I know that my reference type isn't safe, but I prefer to document its safety precisely as "Don't put it in ManuallyDrop" rather than "It's unsafe anyway, don't ever use that". But to do that, I need to know whether what I'm claiming is true.

I'm not sure I understand these paragraphs. I take it that I definitely cannot put a Locker inside of a RefCell? If so, great! That's what I want.

I'm not sure I understand this? I should have mentioned earlier what I meant, but I'm speaking of cycles in the context of shared ownership. AFAICT the self-borrowing struct has a non-owning reference to itself?

Thank you again for your answer

trentj · September 18, 2018, 1:02pm

I'm not sure that is a correct way to think about unsafe. If I write some code using unsafe, but I definitely uphold all of Rust's invariants and don't violate memory safety, my code is still "safe". I'm not expected to go through every crate on crates.io and ask "Does this type have an additional requirement that I might accidentally be breaking?" If I'm not breaking Rust's rules, my use of unsafe is correct.

In addition to other crates, the language or the standard library could change. Box::leak is an example. In Rust before 1.26, there was no way to leak a Box<T> and get a &'static T without using unsafe. So if I wrote code in Rust 1.25 that maintained memory safety by the assumption that &'static references can't be to the heap, I might have been right in saying "somebody else would have to use unsafe to violate my assumption", but I was still wrong in saying "therefore my code is safe". The assumption I made wasn't an intended guarantee; it was an accidental consequence of the standard library not being quite as expressive as it could have been.
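For concreteness, this is roughly what that looks like since Box::leak was stabilized; `leak_to_static` is a hypothetical helper name. No unsafe is involved, yet the returned 'static reference points into a heap allocation that is never freed.

```rust
// Hypothetical helper: turns an owned vector into a `'static` slice by leaking it.
fn leak_to_static(values: Vec<u32>) -> &'static [u32] {
    // `Box::leak` consumes the box and returns a mutable reference with a
    // caller-chosen lifetime; here it is coerced to a shared `'static` slice.
    Box::leak(values.into_boxed_slice())
}

fn main() {
    let leaked = leak_to_static(vec![1, 2, 3]);
    assert_eq!(leaked.len(), 3);
    assert_eq!(leaked[0], 1);
}
```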

So, I haven't given up on your approach entirely, but I don't think you can simply list all the ways it's currently possible to leak a value and rule them all out one by one; you have to argue from first principles, so to speak, that your API makes leaking impossible.

smarnach · September 18, 2018, 2:20pm

It's rather common for types to have additional invariants that need to be upheld by unsafe code, in addition to Rust's general memory safety rules. As an example, the bytes buffer of a String must always be valid UTF-8. You can use unsafe code to get a &mut Vec<u8> pointing to the string buffer, but you are still responsible for upholding the UTF-8 requirement. I think it's perfectly reasonable for a type to document the invariants that unsafe code dealing with the type needs to uphold.
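A short sketch of the String case, using the standard `String::as_mut_vec`; the function name `shout_ascii` is made up for the example. The unsafe block is sound only because the operation performed on the bytes cannot break the UTF-8 invariant.

```rust
// Hypothetical helper: upper-cases the ASCII letters of a string in place.
fn shout_ascii(text: &mut String) {
    // SAFETY: `make_ascii_uppercase` only rewrites ASCII bytes to other ASCII
    // bytes, so the buffer is still valid UTF-8 afterwards.
    let bytes = unsafe { text.as_mut_vec() };
    bytes.make_ascii_uppercase();
}

fn main() {
    let mut greeting = String::from("héllo");
    shout_ascii(&mut greeting);
    // The two-byte "é" is not ASCII and is left untouched.
    assert_eq!(greeting, "HéLLO");
}
```

Pushing an arbitrary byte such as 0xFF through the same reference would compile just as well, which is why the responsibility sits with the author of the unsafe block.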

Of course there are limits to what you can reasonably ask your users to do before it gets silly, but I don't think there is a precisely defined line that you can't cross. I personally don't have an opinion on this specific case, since I'm not familiar with the details.

skysch · September 18, 2018, 2:43pm

I guess I don't really understand the broader context of what you're trying to do.

I am trying to list all the ways one can leak the destructor in a way that is dangerous for safety.

It sounds like you're using a specific definition of 'leak' or 'safety' that is tied to some API you're looking at? (Leaking isn't unsafe in Rust.) In general, there is no way to determine what a leak even is, let alone how many ways there are to do it. To some programs, sticking something into a Vec and never taking it out is a leak. One could think of any data structure as a special purpose memory allocator, and anything that owns data for any nonzero length of time could be leaking if looked at through the right lens.

I know that my reference type isn’t safe, but I prefer to document its safety precisely as "Don’t put it in ManuallyDrop" rather than “It’s unsafe anyway, don’t ever use that”. But to do that, I need to know whether what I’m claiming is true.

For my curiosity (and because I am confused...) if you have a generic struct with a type parameter T, why would it matter that I don't want you to drop T? I can see the opposite clearly: if you said you won't call drop on T, you can tell me not to give you anything that relies on drop calls for correctness. But if I have a T that doesn't care if drop is called or not, why would it bother you to take one in? If I give you a ManuallyDrop, how is that any more dangerous than if I give you an i32?

Disregard that... I misread this bit.

zrk · September 18, 2018, 7:41pm

That's natural, I explained the broader context in a different thread, that got quite long.
To give the gist of it, I propose a new reference type Sc<T>, where the lifetime has been "erased". A struct can contain a Sc without declaring a lifetime.
To achieve this, a Sc always starts empty, and the user can never directly access the reference it contains: rather, the user can use a Sc::map method to pass a closure to the Sc that will execute only if the Sc actually contains a valid reference.
To bind a reference to a Sc, one must create a Locker instance. Long story short, this locker is responsible for handling the lifetime of the reference that is passed to the Sc. The locker contains a Dropper that, upon being dropped, notifies the Sc that its reference is now invalid. You can refer to this message from the other thread for more information about the usage of Sc.

Now, the thing is, for this to be safe, I need drop to be called for all Locker instances on which the Locker::lock method has been called. That's why I'm interested in a specific kind of leaks: leaks where the drop method is not called, even after the lifetimes to which Locker is tied ended. Unfortunately, in general, this kind of leak is possible in Rust:

One can call mem::forget or put the Locker in a ManuallyDrop wrapper, or call Box::leak to explicitly inhibit drop.
One can put the Locker in a Rc cycle.
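For reference, the Rc cycle in the second item is usually built along these lines. This is a generic sketch with hypothetical names (`Node`, `destructor_ran_after_cycle`), not a Locker: the node owns an Rc to itself, so its strong count never reaches zero and its destructor never runs.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical node type that records whether its destructor ran.
struct Node {
    dropped: Rc<Cell<bool>>,
    next: RefCell<Option<Rc<Node>>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Builds a one-node cycle, releases the only external handle,
// and reports whether the destructor ran.
fn destructor_ran_after_cycle() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let node = Rc::new(Node {
        dropped: Rc::clone(&dropped),
        next: RefCell::new(None),
    });

    // Close the cycle: the node now holds a strong reference to itself.
    *node.next.borrow_mut() = Some(Rc::clone(&node));

    // The strong count goes from 2 to 1, never to 0.
    drop(node);
    dropped.get()
}

fn main() {
    assert!(!destructor_ran_after_cycle());
}
```

Note that closing the cycle goes through `RefCell::borrow_mut`, which is the same guard that rejects the `lock` call in the first post.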

I believe the latter is not possible, due to the fact that Locker is a self-borrowing struct (a design decision that was initially motivated by the desire to defeat mem::forget). This thread is an attempt to verify that claim, and also generalize it to all self-borrowing structs in a RefCell because I find this property interesting.

@trentj:
I believe I worded this poorly. I didn't mean to say "if you use unsafe around my type, then you're on your own", but rather that it cannot be expected to defeat all possible unsoundness that could be introduced by unsafe code. That being said, after giving it some thought, I'm not sure that, even by using unsafe, someone could create additional ways of leaking a Locker, at least not without violating existing rules about non-aliasing of mutable references? Hmm, I need to try to do this with unsafe.

I wonder? Like @smarnach said, I believe a type can expose unsafe methods with a documentation indicating under which conditions it is actually safe to call these methods? For instance, what if I changed my Locker::lock method to look like this:

/// # Safety
///   To prevent any possible access to a dangling reference with the `Sc::map` method,
///   this `Locker` instance must verify the following preconditions before calling `Locker::lock`:
///     * It must not be wrapped in a `ManuallyDrop` wrapper. Beware that some functions, like `Box::leak()`, use `ManuallyDrop` under the hood.
///     * It must not be put in a cycle (like a `Rc` cycle). Fortunately, this is impossible to do using safe Rust.
///     * (once Drop objects in unions land in stable) It must not be a member of a `union`
pub unsafe fn lock(&'auto mut self, t: &'auto T, sc: &'sc Sc<T>) { /* impl omitted */ }

Would that seem reasonable? Of course, having to make Locker::lock an unsafe fn is a bummer, and I'd really like to find another way. But at this point, I just don't believe it is possible in the current Rust. I'd love to be proven wrong on this though, so don't hesitate to share if you find anything (maybe related to Pin?)!

That being said, I may have an idea of a small change to the standard library (nothing major like adding an UnsafeDrop trait) that would make my type safe. I just need to make a strong case for it before proposing that change, since a library change for what would appear as very niche ("ensuring that destructors of types that cannot be put in cycles after a certain method is called will run in any case where the application can continue running") must be motivated.

That's why I'd like to be sure that I have a good grasp of the situation: a Locker can only "leak its destructor" from a ManuallyDrop (or a union, it's not clear to me if today a union is implemented in terms of a ManuallyDrop or if it is the opposite)? This is what I currently believe, given that Box::leak and mem::forget rely on ManuallyDrop under the hood.

It'd be also interesting to reuse a scheme similar to Sc (with a Locker type) to, for instance, rewrite an "almost safe" ScopedThread API. That would also allow a direct comparison to the currently existing closure-based implementation, in terms of ergonomics, etc.

I'm sorry for this quite long message, but I believe such a reference type would be useful in Rust (the use case was initially motivated by the need to store observers of an object with a shorter lifetime than that object) and I find the Locker pattern interesting, so I'd just like to see if it can lead us somewhere. That is, unless someone comes up with something that would definitely kill the idea.

skysch · September 18, 2018, 8:19pm

How are you constructing a self-borrowing struct? Doesn't that have all the same problems introduced by using RefCell?

trentj · September 19, 2018, 1:06am

This is not quite the same thing as I was talking about, since in order to break a String's invariants, you have to know it's a String and willfully write unsafe code using either its public API or some knowledge about its internals. Leaking a T from a Box<T> must be safe regardless of what T is.

I think there's a finite list of things you must do when using unsafe, and if you do all those things your code can be called sound. That includes "don't violate the documented invariants of other types" as well as "don't create null references".

So to rephrase, I guess, "this code is sound because Locker can't be leaked by any of the methods currently in the standard library" isn't a strong enough guarantee, because leaking is safe, and someone could hypothetically write generic code that safely leaks a generic T using unsafe. Then instantiating T with Locker would be unsound despite the fact that the person who wrote the generic code followed all the rules.

If somebody uses unsafe and doesn't follow all the rules, and breaks one of your invariants negligently or on purpose, that's definitely on them.

(Aside: making lock an unsafe fn sidesteps this whole problem by shifting it onto your API's consumer. An unsafe API can require its consumer to do practically anything, including "ensure this is dropped before X happens", "only pass odd numbers here", etc. and it becomes the consumer's responsibility to uphold it. The argument I give above only applies to code that exposes a safe API.)
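A generic illustration of that aside, unrelated to Locker: `element_at` is a hypothetical function whose documented precondition the caller has to uphold, and each call site states why it does.

```rust
/// Returns the element at `index` without a bounds check.
///
/// # Safety
/// The caller must guarantee that `index < values.len()`.
unsafe fn element_at(values: &[u32], index: usize) -> u32 {
    // SAFETY: the caller promised that `index` is in bounds.
    unsafe { *values.get_unchecked(index) }
}

fn main() {
    let values = [10, 20, 30];
    // SAFETY: 1 is less than `values.len()`, which is 3.
    let second = unsafe { element_at(&values, 1) };
    assert_eq!(second, 20);
}
```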
