How unsafe is my reference type?

zrk · September 8, 2018, 5:30pm

Hello, I've been inspired by this thread to write a "new" (as in "I don't know of prior art, but I didn't look that hard either and I'm fairly new to rust so may be missing the obvious") kind of reference type that can be stored in a struct without being parameterized by a lifetime and without allocating its referent (unlike Rc, which is shared ownership).

In short, the idea behind it is that when a Sc is set to point to an object, it returns an intermediate Dropper type that retain the lifetime of the reference, while the Sc itself contains a raw pointer to the object.
When the Dropper instance is dropped, it sets the pointer in Sc to null.
When the user of an Sc wants to query its reference, they call Sc::get() which returns an Option<&T> that is either a reference to the object if the pointer is not null (the dropper is still alive), or None if the pointer is null (the dropper is dead).

let sc = Sc::new();
assert_eq!(sc.get(), None); // Nothing in Sc, initially
{
    let s = String::from("foo");
    let _dropper = sc.set(&s); // set a reference to s in Sc
    assert_eq!(sc.get(), Some(&s)); // access the reference
}
assert_eq!(sc.get(), None); // After s is dead, nothing in Sc again

I set up a small repository with some code for Sc, and a small example of use.

However, I know that using this type is unsafe : if the user mem::forget the dropper instance, then the pointer inside Sc will never be invalidated as it should. Furthermore, the naive design with a public Sc::get() method allows code like the following:

let s_ref : &str;
let sc = Sc::new();
{
    let s = String::from("foo");
    let _dropper = sc.set(&s);
    s_ref = sc.get()?;
}
println!("Happily prints a dead reference: {}!", s_ref);

For now, I tried to prevent this kind of misuse by replacing the Sc::get() method with a Sc::visit() method that accepts a closure taking a &T as parameter, and that calls the closure if reference is Sc is still alive (e.g. Dropper is still alive).

impl<T> for Sc<T> {
    pub fn visit<'a, U, F: Fn(&'a T) -> U>(&'a self, f: F) -> Option<U> {
        unsafe {
            match self.get() {
                Some(x) => Some(f(x)),
                None => None,
            }
        }
    }
}

So, I guess, my questions are:

1° Is there any way to force a reference returned by get() to be attached to the inner scope, in the example given above, or am I stuck with a visit() method as the sole means of enforcing this? Passing _dropper to get() is a no-go, because then the caller would have to keep _dropper and the Sc around, which kind of defeat the purpose (see the example for a sense of what that would be annoying).
2° Are there still ways to invoke UB even by having a visit() method rather than a get() method (other than by calling mem::forget(_dropper))?
3° Is there any way to design _dropper so that mem::forget(_dropper) isn't unsafe to call? If there isn't (which I fear), how serious are the consequences? I believe the user cannot easily create cycles containing Droppers, because they are structs with lifetimes, but is this true?

Thank you for the answers. I hope I'm not doing something too silly .

gbutler69 · September 11, 2018, 11:18am

I wonder if a new annotation could be added to the language that would mark a type as not permitting safe mem::forget?

For example, something like this:

[#no_safe_forget]
pub struct ScDropper { ... }

A struct so annotated would not be permitted to be "mem::forgotten" in a "safe" context and would require unsafe. I wonder if something like this could be created without too much hassle? Perhaps a Pre-RFC to discuss this option if there is no other work-around?

EDIT: Perhaps a better solution would be the introduction of a new "Marker Trait":

pub Trait UnsafeForget;

Then, mem::forget could change to:

pub fn forget<T>( t : T ) -> () where T : !UnsafeForget {...}

The compiler would never auto-derive "UnsafeForget" for any type, but, you could make a type implement this marker trait to make it "Unforgettable" using the "safe" mem::forget as follows:

impl UnsafeForget for ScDropper;

All this solution requires is the support for negative trait bounds which I believe are already available on nightly, but, not yet stabilized.

gbutler69 · September 11, 2018, 11:18am

I think you mean "isn't safe to call", right?

zrk · September 11, 2018, 11:53am

I meant either one of those, actually. Either I can specify that my type is unsafe to mem::forget, or I can think of another design where mem::forget doesn't cause any unsafety. Unfortunately at the moment I don't have such other design in mind.

I actually like the idea of an annotation (and either better the trait idea)! I wonder if such discussions took place when it was decided to make mem::forget safe in the first place. In particular, like I said on the other thread, my Dropper type (which isn't safe to mem::forget) also has specific constraints, in that it is not 'static. I wonder if alternatively, we could make mem::forget unsafe for non 'static types? I wonder if that would break a lot of code or not?

Regarding backcompat, I imagine changing the definition of forget to rely on !UnsafeForget is not too bad, since currently no type implement this trait (since it doesn't exit :-)). But I wonder, would that mean that any code calling forget in a generic context would have to add that (negative) bound as well?

gbutler69 · September 11, 2018, 12:01pm

Yes, it seems like that could be a problem. I wonder how prevalent that is? Perhaps using a positive, auto-derived trait instead?

pub Trait SafeForget;

That is by default auto-derived for all types. Then mem::forget becomes:

pub fn forget<T>( t : T ) -> () where T : SafeForget {...}

And your ScDropper type becomes:

pub Trait ScDropper : !SafeForget { ... }

~~I think that would be 100% backwards compatible.~~

Whoops, this still requires updates when mem::forget is called in a generic context (I think) so, it solves nothing.

Hyeonu · September 11, 2018, 12:05pm

This can be solved with some advanced lifetime trick. The key point is make your dropper logically borrow the setted reference, so dropper cannot outlive it.

struct Dropper<'a, T> {
    ptr: *mut *const T, // replace it with your actual dropper code
    _marker: PhantomData<&'a T>,
}

impl<T> Sc<T> {
    pub fn set<'a>(&self, v: &'a T) -> Dropper<'a, T> {
        self.data.set(v);
        Dropper { ptr: self.get_ptr(), _marker: PhantomData }
    }
}

PhantomData is zero-sized type so in runtime it doesn't have any value. But during compile time, borrow checker treat it as its type parameter, in this case &'a T, so it perform lifetime checking as usual. So if the dropper is leaked then the compiler throws error that the dropper is outlived from its original value, which is exactly what you want. Awesome?

gbutler69 · September 11, 2018, 12:09pm

~~How does this prevent "mem::forget" being called on Dropper? Or, will this somehow still complain even though "mem::forget" does it's work in an unsafe block? I'm not understanding how this works.~~

~~Actually, yes, it does make sense. That's a neat trick! Sounds like "Problem Solved"?~~

As pointed out below, this doesn't actually work. mem::forget consumes its value so, from the perspective of the compiler, there is no reference still alive that needs to end.

Hyeonu · September 11, 2018, 12:14pm

mem::forget means the argument will not be dropped, so that value will be live until this process is killed. So compiler will throw reference outlives the original value error.

TechnoMancer · September 11, 2018, 12:28pm

You also have to forbid putting it in Rcs or Arcs transitively then as well and also forbid putting it in a ManuallyDrop as well I guess

mem::forget consumes it's argument and just doesn't run any drop code when it dies, so once you have passed the dropper to mem::forget it is gone and your program cannot access it anymore but its drop was never run so the Sc will never be emptied and the dangling reference will still be available to use.

Importantly mem::forget does not mean the value will live until the process is killed, semantically whatever you passed to it is gone just as any other value to move into a function in rust, the only difference is that the drop code is never called.

juleskers · September 11, 2018, 1:29pm

I'm afraid that is going to be very difficult to implement.. Since the great 'Leakpocalyse' of 20162015, Leaking memory (i.e. never running Drop) is considered memory safe.
This was the least-bad way out of a very difficult situation. It proved unsoundness in the rust core, because there are simply too many ways to (accidentally or not) leak memory: In the very least, mem::forget itself, and cycles of RC/ARC came up. (edit: upon re-reading Huonw: thread deadlock too)

It basically taught us that it's a bad idea to rely on Drop-impls for memory safety, as Drop's guarantees aren't as strong as we'd like.
Java has the same problem with it's destructors, which promise a "best effort" to run, but keep caveats like Out-of-Memory, JVM shutdowns, and never being garbage-collected. It's the prime reason interface Closeable exists.

See also:

RFC-1066: safe mem::forget
huonw's blog: memory leaks are memory safe
the nomicon on leaking
Book 2nd ed: reference cycles can leak memory
Alexis Beingessner: LeakPocalypse overview and workarounds/solutions- 'Pre-Pit your Peaches' (thanks @trentj)

From the different github discussions I've seen that suggest new OIBIT/auto-traits, there is a massive reluctance to add more auto-traits to the rust core, because (as your edit about generic contexts shows) each new auto-trait complicates all generic downstream code.
Since std has a quite a lot of downstream, the threshold for adding new ones seems to be right about at "maybe, if its the only way, after years of design work, to remedy pre-existing unsoundness"

Exactly; this is a lifetime problem through and through (tying the lifetime of the reference to the lifetime of the dropper), and this is exactly the kind of scenario for which PhantomData was designed (be sure to check that doc-page for excellent showcase of full Unicode emoji support in Rust sourcefiles + RustDoc).

edit: oh man, 2015? has it really been that long already! Also: added link after reminder from @trentj

Hyeonu · September 11, 2018, 1:47pm

After a quick experiment, I figured out that my assumption was wrong. When value is mem::forget()ed borrow checker thinks this value is consumed, not leaked.

Now I think that Sc::get() cannot be implemented in safe way without the guarantee that the dropper always be dropped. What you're trying to do is dynamically bound external value's lifetime into Sc, which is not supported by rust.

Experiment: Rust Playground
println!() macros are commented out, otherwise this will attempts to print invalid utf8 sequence and playground sandbox will be terminated.

trentj · September 11, 2018, 1:54pm

I think this is the test you need to pass:

let sc = Sc::new();
{
    let s = String::from("foo");
    let dropper = sc.set(&s);
    mem::forget(dropper);
}
assert_eq!(sc.get(), None); // or alternative with `sc.visit(...)`

i.e., you can't rely on mem::forget being unsafe to call. Sc::visit doesn't fix this problem.

In addition to @juleskers's links, this blog post is a good overview of Leakpocalypse 2015. Many solutions were proposed, along with some that looked very similar to UnsafeForget. There are no types that today are UnsafeForget, but there are a bunch of types that would have to have !UnsafeForget bounds applied to them. You can leak something by putting it in an Rc cycle, so Rc<T> would presumably have a T: !UnsafeForget bound. And that means anything that deals with Rc generically has to also add that bound. Box::leak is safe, so it would also need a where T: !UnsafeForget bound, and so on.

The API you've outlined can't be implemented soundly, but it's possible you could make an API that would work by passing closures, similar to the way thread::scoped solved its problem. I think you would probably need to allocate something, though (not necessarily the whole T). The same blog post I linked above might give you some ideas.

juleskers · September 11, 2018, 2:10pm

Ah yes, how could I forget the PPYP-solution! Even the name itself caused a minor outcry!
(I was a fan of his "pre-pit your peaches" compromise, but it seems he's kept to the poo-version too)

I've added it to my list, for completeness sake. Thanks!

zrk · September 11, 2018, 9:13pm

Thank you for all the very informative links! So, reading the replies here, my thoughts are the following:

The Sc::get() is definitely unsafe, since it returns a reference that is only bound to the lifetime of Sc and not of dropper. Using the Sc::visit method (that I renamed to map because it looks exactly like Option::map) should be safe in that regard, so I consider that that problem solved.
The leakpocalypse is real and Sc will be unsafe forever if it relies on the destructor of the Dropper instance to be run. I don't have a solution to that problem.
- I don't know if a "PPYP" scheme could work, because I'm don't know what I could leak to mitigate the problem if Dropper is leaked. Well, I do know, that would be the pointed to object, but then I would probably have to allocate it, which kind of defeats the purpose.
- I also don't know if a "scope-based" solution would work, because it would require to open a new scope when calling Sc::set(), so basically having Sc::set() taking a closure, which I feel would limit the use case where I call Sc::set() and Sc::get() from different parts of the program (eg set when registering an observer, get when notifying the observer of a change). Or maybe there's something I didn't understand here?

As a side note, I'm really surprised that Rc is allowed to leak cycles of structs that are not 'static. This really "feels" like a bug to me, and certainly something that compromises the mental model I had of non 'static structs before.

juleskers · September 12, 2018, 3:45pm

Callbacks are a specific case of the whole 'cyclic references are hard in Rust' topic.
It really messes up some paradigms we're used to, and there are multiple people looking into other paradigms, for example for GUI's (where callback button actions are pervasive).

The PPYP idea is to pre-leak while still in a known good state, and then do the 'risky' thing, and then clean up.
Unfortunately, in your case the risky thing is the very first step:Sc::get. The only safe "pre-pitted" state would be to always return None

I hope you're not discouraged by the information I've given you, it is a truly hard problem!

I personally think that the scoping solutions are the only sound way forward.
Just thinking about it outside-in:

we have a variable, and this variable intrinsically has a scope
we wish to give away a reference to this variable
the only places this reference is valid, is inside its scope
the only way to guarantee it is never given away is either:
- statically verify it isn't, during compilation (i.e. lifetimes)
- 'somehow' make the scope an explicit thing during runtime

the only 'somehow' I'm currently aware of, are closures..

zrk · September 12, 2018, 8:34pm

You're articulating very well why I can't use the PPYP pattern.

I've been thinking, could my problem be solved by making my Dropper struct immovable after it is returned from the Sc::set() method? That way I wouldn't be able to move it in a Rc nor in mem::forget?

I believe the Pin RFC covers how I could do that, but I'm not sure to understand how that would work exactly?

cuviper · September 12, 2018, 9:20pm

Note that mem::forget itself is nothing special, just:

pub fn forget<T>(t: T) {
    ManuallyDrop::new(t);
}

I think you're right that being immovable would ensure a drop, but I'm not sure how you could enforce that after a return, since even the return itself involves moving.

The way stuff like thread locals and scoped threads have been solved is by forcing an inner calling scope. You never give the thing-that-must-be-dropped to an untrusted caller. Instead, you might have something like a with accessor:

let sc = Sc::new();
assert_eq!(sc.get(), None); // Nothing in Sc, initially
{
    let s = String::from("foo");
    // set a reference to s in Sc, for the duration of the closure
    sc.with(&s, || {
        assert_eq!(sc.get(), Some(&s)); // access the reference
    });
    assert_eq!(sc.get(), None); // After `with` returns, nothing in Sc again
}
assert_eq!(sc.get(), None); // After s is dead, still nothing in Sc

This is somewhat like what you were doing with visit, but without exposing any dropper.

juleskers · September 13, 2018, 5:25am

Does it? If I understand correctly, Pin is supposed to prevent the memory address from changing.
As far as I understood it, this doesn't prevent you from 'moving' ownership. (They're not talking about the same form of moving).

If we can still give ownership away, we can still pass it to mem::forget, and we're still at square one. No?

Or do you mean a reference to a Pin has a longer lifetime? SC would have to be !Unpin in that case, promoting it to 'static lifetime, and solving the problem that way, at the cost of allocating.

zrk · September 13, 2018, 9:31pm

OK, it took me some time, but I believe I may have something that would work and be safe without using closures (save for the visit/map method of Sc).

The solution I found involves three objects, not two like the previous solution.

The first object is Sc, and it is completely unchanged:

pub struct Sc<T>(Cell<Option<*const T>>);

The second object is our dropper, which is simpler than ever:

struct Dropper<'sc, T: 'sc> {
    sc: &'sc Sc<T>,
}

impl<'sc, T> Drop for Dropper<'sc, T> {
    fn drop(&mut self) {
        self.sc.0.set(None)
    }
}

(Dropper is not public anymore in this design)

Then, I introduce a third object that I call Locker:

pub struct Locker<'sc, 'auto, T: 'sc + 'auto> {
    sc: Option<Dropper<'sc, T>>,
    marker: Marker,
    autoref: Option<&'auto Marker>,
}

This object is self-referential, Locker::autoref will eventually reference Locker::marker.

Then, I replace the Sc::set method with the following Locker::lock method:

impl<'sc, 'auto, T> Locker<'sc, 'auto, T> {
    pub fn new() -> Self {
        Self {
            sc: None,
            marker: Marker,
            autoref: None,
        }
    }

    pub fn lock(&'auto mut self, val: &'auto T, sc: &'sc Sc<T>) {
        let ptr = val as *const T;
        sc.0.set(Some(ptr));
        self.sc = Some(Dropper { sc });
        self.autoref = Some(&self.marker);
    }
}

Because Locker becomes a self-borrowing struct after a call to Locker::lock, it is not possible to call mem::forget() on it (it is borrowed).

(for clarity, here is Sc's full impl block):


impl<T> Sc<T> {
    pub fn new() -> Self {
        Self { 0: Cell::new(None) }
    }

    unsafe fn get(&self) -> Option<&T> {
        self.0.get().map(|ptr| mem::transmute(ptr))
    }

    pub fn map<U, F: Fn(&T) -> U>(&self, f: F) -> Option<U> {
        unsafe { self.get().map(f) }
    }

    pub fn is_none(&self) -> bool {
        unsafe { self.get().is_none() }
    }
}

You can then use Locker in the following way:

fn it_works() {
    let sc = Sc::new();
    assert!(sc.is_none());
    {
        let s = String::from("foo");
        {
            let mut locker = Locker::new();
            locker.lock(&s, &sc);
            // cannot mem::forget(locker) here, because locker borrows itself
            // cannot mem::forget(s) either, because it is borrowed by locker
            assert!(!sc.is_none());
        }
        assert!(sc.is_none());
        {
            let mut locker = Locker::new();
            locker.lock(&s, &sc);
            assert!(!sc.is_none());
        }
        assert!(sc.is_none());
    }
    assert!(sc.is_none());
}

For convenience, I also provide a Wrapper type:

pub struct Wrapper<'sc, 'auto, T: 'sc + 'auto> {
    locker: Locker<'sc, 'auto, T>,
    data: T,
}

impl<'sc, 'auto, T> Wrapper<'sc, 'auto, T> {
    pub fn new(data: T) -> Self {
        Self {
            locker: Locker::new(),
            data,
        }
    }

    pub fn lock(this: &'auto mut Self, sc: &'sc Sc<T>) {
        this.locker.lock(&this.data, sc);
    }
}

// also impl Deref<Target=T> and DerefMut for Wrapper<T>

This enables the following:

#[test]
fn with_wrapper() {
    let sc = Sc::new();
    assert!(sc.is_none());
    {
        let mut s = Wrapper::new(String::from("bar"));
        {
            Wrapper::lock(&mut s, &sc);
            assert!(!sc.is_none());
        }
        // actually here we are still locked because a wrapper automatically locks for its whole lifetime
        assert!(!sc.is_none());
    }
    assert!(sc.is_none());
}

I pushed this as a branch on my repository (I only updated the Observer example, because I broke Sc with trait objects with this release).

What do you think?

Hyeonu · September 14, 2018, 12:10am

Please use reborrow to convert pointer to reference, as document suggests.

Anyway, does it really work? For me it looks like Locker cannot be dropped after all as it's potential borrowed, and struct's fields are live longer than the outer struct. But if it works, awesome! It really is an interesting approach, and definitely worth trying.

Topic		Replies	Views
Self-borrowing struct and RefCell help	9	2206	January 12, 2023
TrashPool to retain temporaries past the end of a scope code review	31	1599	August 9, 2020
Unsafe code review: semi-owning Weak<RwLock<T>> guard code review	18	1025	September 18, 2023
Experimental safe-to-use proc-macro-free self-referential structs in stable Rust? code review	33	3482	June 3, 2021
Ephemeral (or "weak") closures code review	31	2817	March 14, 2022

How unsafe is my reference type?

Related topics