OwnRef: a smart pointer that owns the data it is referencing

I wanted to crate a way to pass data on stack to functions by reference, but be able to take ownership of that data in one of the inner functions. Let's make a OwnRef<T> type for this.

How will it work? At runtime, it will be similar to &T, you can pass it to functions, they can read it, store it, or pass it to other functions, but it cannot outlive the original function that created the data that it is referencing. However, you can consume OwnRef<T> to get ownership of T by moving the original data to the caller.

Here is the Playground link with a minimal reproducible demo, and here is the full Github crate.

First, is OwnRef even worth using?

  • Why not just use T instead?
    Size of T might be large, so moving it by flat copy from function to function could be costly, and it could increate the stack size of "intermediary" functions that just pass it along.
  • Why not use Box<T> instead?
    That requires a heap allocation. OwnRef<T> avoids that.
  • Why not use &mut T?
    You could do that, and then take the value with something like std::mem::replace(your_reference, Default::default()), but
    1. Not every type has a default value to replace it with
    2. And even for those that do, this allows taking the value multiple times. Every "take" after the first one will see "useless" default data. With OwnRef<T>, there is a compile-time check that you are taking the value only once.
  • Why not clone the data?
    1. Not every type implements clone
    2. And of those that do, cloning might not me what you want. Like a struct that contains a Vec: you don't want to clone the contents of the vec, you just want to move the struct.

With that out of the way, let's get to the actual code review. I personally have a few questions:

  1. Should I implement AsRef, Borrow, or both?
    I of course implement Deref, so that using the OwnRef can be a little bit easier. But I do not understand what the difference between AsRef and Borrow. I think that Borrow is for when you borrow data that you own, but AsRef if for when you don't own the data?
  2. Is mutating an immutable value in the destructor safe?
    To construct the raw pointer, I call NonNull::from(&T). However, on the top of that same page it says ... it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation. as_mut is fine, but as_ptr is used in the Drop implementation, and the result will be passed to T's Drop implementations, which takes self: &mut T, and can mutate the value! Is that a violation of that rule, or is there an exception for the destructor?
  3. Is storing NonNull<T> better than *const T or *mut T?
    I chose NonNull<T> because I know that it will not be null, but *const T gives better understanding that the value will not be mutated in OwnRef, while *mut T would signal that mutation is possible in OwnRefMut. And that is only the idiomatic choice, what about performace? Is there even any difference?
  4. What other traits should I implement?
    I definitely want to implement Send (if the inner type implements Send of course). Sync doesn't really make sense, because you never want &OwnRef, but it is safe so why not (some generic bound somewhere might need it).
    Debug also seems like a good idea, but what about traits like Eq, Cmp or Hash? Seems unlikely that someone might need to sort OwnRefs or use them as keys in a hashmap or something, but again, it is safe, so why not?
  5. Is the code safe?
    I am using miri with several test scenarios, as well as trybuild to check that lifetimes will properly prevent the OwnRef from outliving the data it is referencing, so I would be confident to put this code in one of my hobby projects, but I'm not sure if it is safe in every corner case scenario.
  6. Can you get rid of lifetime_of?
    I need a way to ensure that the OwnRef doesn't outlive the data it is referencing. I could just store a normal reference in it (instead of the raw pointers and the phantom nonsence) and let Rust deal with it, but then the data doesn't die after OwnRef dies, which is a huge problem. This approach is the only one that worked the way I wanted to, but the lifetime_of hack feels really silly. I have to pray that the compiler will optimize it into a no-op.

I have more questions, including the comparisons to the ownref crate (which seems to be doing something different), so I will probably change the name to avoid confusion, but I would appreciate if someone more experienced with unsafe Rust could take a look and offer some feedback.

To answer your questions:

Both, but at least Borrow. Not implementing Borrow will cause grief for users when they try to use your type as the key in a map, for example.

No, it absolutely isn't allowed to mutate stuff behind a &T shared reference, unless it's wrapped in an interior mutability primitive (such as UnsafeCell or Cell, etc.).

Luckily, you never mutate anything through the pointer derived from the immutable reference. Drop::drop() takes &mut self because in reality it has ownership of the value – drop is only called when there are no more owners around. With ptr::read() (or drop_in_place(), see my suggestions below), you effectively move out of the place pointed to by the reference.

Yes, because 1. it communicates intent better, and 2. it has a niche for the null pointer, so it's subject to niche optimizations (giving you size_of::<Option<OwnRef<T>>>() == size_of::<OwnRef<T>>(), for example).

As a first approximation, "everything". You should strive for implementing as many of the standard traits as possible (or makes sense). Not implementing Eq, Ord, Hash if you could means that your type takes away from the capabilities of the underlying type. Remember that your OwnRef is really equivalent with a T, so why shouldn't it have all the capabilities of T?

The only exceptions I can think of are:

  • Clone. Your type shouldn't be clone unless T: Copy because it would then cause a double-drop (there is no way to construct an OwnRef for an owned value that doesn't already exist, so it's not possible to return an OwnRef that owns a clone of *self).
  • Default, for the same reason (except that instead of a clone of another value, you'd have a Default-constructed local owned value that you also can't wrap with OwnRef due to its scope).

But otherwise, Eq, Ord (and their partial counterparts), Hash, Debug, Display, Borrow, AsRef, etc. should all be implemented.

Safe: definitely not, by definition, as it uses unsafe. What you are really asking is whether it's sound. I don't think it is, completely – see my further comments below.

You can, but you don't need to, it's fine.

It isn't silly, it's clever (albeit admittedly pretty standard in the community at this point), and it's basically guaranteed to be optimized away, because it doesn't deal with any data. Everything value-level involved in that function is zero-sized, so I would go so far as saying that even a naΓ―ve compiler couldn't emit any actual machine code from it – there's no data to even copy around. (And an empty function is basically guaranteed to be inlined and DCE'd.)

A couple other comments:

  1. How is this crate better than existing ones providing the same functionality?
  2. How is it sound to keep and dereference a pointer to a value that has been moved (into mem::forget())? It doesn't seem quite right. You should probably use ManuallyDrop instead. In general, using ManuallyDrop is safer than mem::forget() (because if you wrap in ManuallyDrop first and then drop, then you can't double-free due to a panic, which however is possible if you drop first and mem::forget() later.)
  3. The constructor doesn't relate 'a and 'b in any way. You might want to declare that both references have the same lifetime, 'a instead, or at least relate them by bounding one by the other (likely 'b: 'a)
  4. The drop impl would be less complicated using ptr::drop_in_place().
  5. The macro would also need #[allow(unsafe_code)] to compile in crates that apply #[deny(unsafe_code)].

Here's the updated code with my suggestions implemented.

7 Likes

And I'd add that it probably will be good to implement some non-standard traits, such as serde::Serialize (although not Deserialize, obviously), but behind a feature-flag, so that the crate won't pull the extra dependency if it isn't used.

In the playground you've removed the call to it, however - was that intentional?

1 Like

Is there an existing crate for this? I look at owning_ref and ownref, but both seem to put the owned data on the heap in one way or another. If I wanted that, I would just use a normal Box.

Oh my gosh yes, ManuallyDrop solves so many problems. When I started this, I just had a &T in OwnRef, but that created few problems: you could continue using the value after OwnRef it was gone, which is not sound of course (because the value might have been moved).

So, I tried to forget the value immidietely. But then the reference lifetime ends immidietely too, and the OwnRef is unusable. You would have to forget the value manually after you stop using the OwnRef, which would force the user to deal with unsafe code.

But ManuallyDrop seems to do exactly what I want: it owns the value so that it lives and doesn't die, and it doesn't run the destructor for it. Perfect!

Yes, that's way better. The way I got it working, the value was moved to the drop function, only to be dropped there. Dropping it in it's original place is much better. But I have to ask again: Is that safe (sound) to do when you have a shared reference?

I made an updated demo with the ManuallyDrop variant: Rust Playground and again, It will call drop which takes &mut self, so an & reference will be turned into a &mut reference.

My understanding was that this (converting & to &mut) is not safe to do, even when you have guaranteed elsewhere that there are no other references (and there will not be any more while the &mut is alive), like in a RefCell for example, and that you have to use UnsafeCell (or something that uses it).

But that kind of doesn't make sense when it comes to Drop. drop takes &mut self so it can easily mutate the value event if you declare it as immutable. This is of course sound since noone can read it while it is mutates, but the RefCell should also be sound but apparently it isn't.

Never seen [deny(unsafe_code)] before. If the idea is to prevent unsafe code in your crate, but allow unsafe code in external crates, then I guess the macro could generate the #[allow(unsafe_code)] flag, since the code in the macro is coming from another crate.

But I really don't see the point of this flag TBH. The unsafe keyword is the "allow unsafe function calls in this block" flag, why would you want another flag to allow the flag that allows unsafe functions to be called? If you don't want unsafe blocks in your code, just don't use unsafe.

But if it makes some people feel better that they have to write 2 lines (#[allow(unsafe_code)] unsafe { ... }) to call an unsafe method instead of 1 (unsafe { ... }) then hey, why not I guess.

No, that's not what is happening at all. When you are reading the value out of the place behind a reference, then has nothing to do with the reference anymore.

Indeed, turning &T into &mut T is unsound and never allowed.

RefCell is perfectly sound.

If there is #![deny(unsafe)] (or #![forbid(unsafe)], which can't be overridden), then you can't write unsafe { … }, even accidentally. This might be helpful for example when you are switching between crates (e.g. I allow unsafe in FFI wrappers, but basically never elsewhere), or exactly when you call 3rd-party macros that might make you involuntarily use unsafe.

1 Like

For this use case, I'd probably write a wrapper around &mut Option<T>, which can be implemented in entirely safe code. The biggest potential downside is that it moves T once within the original stack frame; I expect the optimizer will be able to skip this in most cases, but I haven't checked.

pub struct OwnRef<'a, T> (&'a mut Option<T>);

impl<'a, T> OwnRef<'a, T> {
    pub fn new<'b>(opt: &'a mut Option<T>) -> Self {
        Self ( opt )
    }

    pub fn get(&self) -> &T {
        self.0.as_ref().unwrap()
    }

    pub fn take(self) -> T {
        self.0.take().unwrap()
    }
}

impl<'a, T> Deref for OwnRef<'a, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.get()
    }
}

macro_rules! new_own_ref {
    ($ref_name:ident, $data_variable:ident) => {
        let mut $ref_name = Some($data_variable);
        let $ref_name = OwnRef::new(&mut $ref_name);
    };
}
4 Likes

EDIT: The answer is fine, I just misunderstood the code.

Original: So this just panics if you take the value twice. And there is no compile-time check that you are not doing that.

You can't take the value twice, because this method takes self by value: There's no OwnRef instance left after this operation. In fact, none of the unwrap() calls in my code can panic if &mut Some(...) is passed to OwnRef::new.

If you construct the OwnRef via the provided macro, this is guaranteed to be the case. Also, the Option variable is shadowed, so there's no way for the constructing function to access the value either.

3 Likes

When it comes to the lifetime_of function ...

What do you mean? The function is called in the macro at line 82.

I mean you can probably use owned_ref.borrow() (or deref), since &Serialize surely autoimplements Serialize. But I guess, again, some (potentially badly designed) generic bound somewhere might need it, so why not.

You can't do that if you store the OwnedRef in another type: it will prevent #[derive(Serialize)] from just working.

1 Like

That was to @H2CO3, sorry for the confusion.

Yes, I intentionally removed that call, because OP asked for that. I wanted to demonstrate that it's not necessary for ensuring equal lifetimes.

Oh sorry, I confused it with one of the options I didn't mention in my original post: why not use &mut Option<T> instead?. But that's not what you are doing, you are using a newtype for &mut Option<T>, so can actually idiomatically take the value at most once. Very nice solution with minimal overhead.

That makes me wonder ... if you just have &mut T (or just &T to be honest), and then shadow the value, would that work as well? The goal is to get rid of the Option overhead. I guess that wouldn't work, because there could be existing references witch would lead to unsoundness.

There are ways around that, but ManuallyDrop seems like the best one so far.

Yes, with the ManuallyDrop, you can get rid of the lifetime hack and:

  1. Move the referenced value into the ManuallyDrop struct. This will ensure that there and no existing references.
  2. Shadow the ManuallyDrop struct, making it unusable for the user.
  3. Do not run the destructor, the data will be destructed by the OwnRef instead.

All of these were done by the std::mem::forget before, but this so much better.

It likely has zero overhead. If you literally wrap a value in Some and then unwrap it, the compiler will probably notice that it's a no-op and won't emit code for storing or checking the discriminant or panicking.

If you use &mut T, there can be no other references.

1 Like

There's really not much overhead to get rid of:

  • The discriminant will need to be stored somewhere, but that will be usually be taken care of by niche optimization (for sufficiently complex T).
  • It performs one explicit move of the value, which you also do when constructing the ManuallyDrop instance
  • In most cases, the None check inside unwrap() will be optimized out, and is generally quite cheap when it isn't. If you're really concerned, it's sound to use unwrap_unchecked() here because the entire point of the newtype is to ensure that you're pointing to Some(...):
/// A Smart poiner backed by Some(T)
pub struct SomeRef<'a, T> (&'a mut Option<T>);

impl<'a, T> SomeRef<'a, T> {
    /* Safety invariant: Any operation that can store None into *(self.0) MUST
       destroy the SomeRef */
       
    pub fn new<'b>(opt: &'a mut Option<T>) -> Option<Self> {
        if opt.is_some() { Some(Self(opt)) }
        else { None }
    }

    pub fn get(&self) -> &T {
        unsafe { self.0.as_ref().unwrap_unchecked() }
    }
    
    pub fn get_mut(&mut self) -> &mut T {
        unsafe { self.0.as_mut().unwrap_unchecked() }
    }

    pub fn take(self) -> T {
        unsafe { self.0.take().unwrap_unchecked() }
    }
}

impl<'a, T> Deref for SomeRef<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.get()
    }
}

impl<'a, T> DerefMut for SomeRef<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.get_mut()
    }
}


macro_rules! new_own_ref {
    ($ref_name:ident, $init:expr) => {
        let mut $ref_name = Some($init);
        let $ref_name = SomeRef::new(&mut $ref_name).unwrap();
    };
}
3 Likes

One problem we sometimes encounter in Tokio is that when spawning tasks, the future might be pretty large and get moved through several stack frames, which can cause stack overflows. I've considered using a solution like this to combat that, though I would probably implement it closer to this:

use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;
use std::mem::MaybeUninit;

pub struct OwnRef<'a, T> {
    value: NonNull<T>,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> OwnRef<'a, T> {
    /// Safety: The `MaybeUninit` must be initialized when this method is
    /// called.
    pub unsafe fn new(value: &'a mut MaybeUninit<T>) -> OwnRef<'a, T> {
        Self {
            value: NonNull::new_unchecked(value as *mut MaybeUninit<T> as *mut T),
            _marker: PhantomData,
        }
    }
    
    /// Safety: The pointer must be valid for writing.
    pub unsafe fn write_to(self, target: *mut T) {
        let from = self.value.as_ptr();
        std::mem::forget(self);
        std::ptr::copy(from, target, 1);
    }
}

impl<'a, T> Deref for OwnRef<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe {
            self.value.as_ref()
        }
    }
}

impl<'a, T> DerefMut for OwnRef<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        unsafe {
            self.value.as_mut()
        }
    }
}

impl<'a, T> Drop for OwnRef<'a, T> {
    fn drop(&mut self) {
        unsafe {
            std::ptr::drop_in_place(self.value.as_ptr());
        }
    }
}

playground

3 Likes

With the current implementation on the GitHub repo, the existence of an OwnRef on a variable does not appear to prevent that variable from being modified. For example, Miri produces an error with this program:

/*
[dependencies.own_ref]
git = "https://github.com/kajacx/OwnRef"
rev = "81e1e98d43521e1ffb79d2d397ea45370eb624cb"
*/

use own_ref::new_own_ref;

#[allow(unused_assignments)]
fn main() {
    let mut s = "Hello, world!";
    new_own_ref!(s_ref, s);
    s = "Hello, Rust!";
    println!("{}", *s_ref);
}
error: Undefined Behavior: trying to retag from <2841> for SharedReadOnly permission at alloc1496[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:385:18
    |
385 |         unsafe { &*self.as_ptr() }
    |                  ^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <2841> for SharedReadOnly permission at alloc1496[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc1496[0x0..0x10]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2841> was created by a SharedReadOnly retag at offsets [0x0..0x10]
   --> src/main.rs:12:5
    |
12  |     new_own_ref!(s_ref, s);
    |     ^^^^^^^^^^^^^^^^^^^^^^
help: <2841> was later invalidated at offsets [0x0..0x10] by a write access
   --> src/main.rs:13:5
    |
13  |     s = "Hello, Rust!";
    |     ^^^^^^^^^^^^^^^^^^
    = note: BACKTRACE:
    = note: inside `std::ptr::NonNull::<&str>::as_ref::<'_>` at /home/lm978/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:385:18
    = note: inside `own_ref::OwnRef::<'_, &str>::get` at /home/lm978/.cargo/git/checkouts/ownref-aff9b63a86d5c4db/81e1e98/own_ref/src/structs/own_ref.rs:32:18
    = note: inside `<own_ref::OwnRef<'_, &str> as std::ops::Deref>::deref` at /home/lm978/.cargo/git/checkouts/ownref-aff9b63a86d5c4db/81e1e98/own_ref/src/structs/own_ref.rs:46:9
note: inside `main` at src/main.rs:14:20
   --> src/main.rs:14:20
    |
14  |     println!("{}", *s_ref);
    |                    ^^^^^^
    = note: this error originates in the macro `new_own_ref` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error
2 Likes

FWIW, this sounds like

but with a better name :upside_down_face:

6 Likes

I also want to mention

which also defines a MoveRef type. moveit is primarily concerned with providing C++-style address-aware constructors, though, so it's a bit more complicated.

Notably, if you want to support Pinning, you can't exclusively store *mut T, because your owning reference could be itself leaked β€” you need to also have a drop flag such that the owner can drop the value (or just abort) if you forget to.

It's for this reason that I'd honestly recommend using (a wrapper of) &mut Option<T> instead of writing any unsafe. That extra bit does serve a deliberate purpose.

2 Likes