Need help designing a safe Handle API for a C void pointer

Hello all,

I’m struggling with an API design issue that is slightly above my level of experience. I would be so grateful for your help.

The problem is very simple: provide an easy to use Rusty handle to a piece of data managed as a void pointer in a C library. A little graphic of my current design:

      Bindings API  |    "DataHandle"    |     C library
                    |                    |
               provides  +---------+  manages
      Ref<T> <------+----| RefCell |-----+---> *mut c_void
   RefMut<T> <------+----| *mut T  |     |    
           T        |    +---------+     |
         ...        |                    |

struct DataHandle<T> {
    data_ptr: RefCell<Option<NonNull<T>>>,
}

impl<T> DataHandle<T> {
    fn replace(&self, data: T) -> Result<Option<T>>;
    fn take(&self) -> Result<Option<T>>;
    fn borrow(&self) -> Result<Option<Ref<T>>>;
    fn borrow_mut(&self) -> Result<Option<RefMut<T>>>;
}

This is my DataHandle. It works, it’s usable. Fair enough!

I do have a feeling that it could be better than this. Couldn’t I make an API that uses actual Rust references (&/&mut) instead of the RefCell? So that I get the usual compile-time aliasing/safety guarantees? But then I need a different kind of interior mutability – UnsafeCell, probably.

I have a first sketch of what that would look like, but it uses a lot of unsafe and there’s even a worrying clippy error (mutable borrow from immutable input(s)). I’ve also made it lazy, not at all sure it’s done right.

impl<T> DataHandle<T> {
    fn replace(&mut self, data: T) -> Result<Option<T>>;
    fn take(&mut self) -> Result<Option<T>>;
    fn borrow(&self) -> Option<&T>;
    fn borrow_mut(&mut self) -> Option<&mut T>;
}

Here is the sketch.

My most pressing question is whether this contains UB.

More generally: any thoughts on this design problem? Are there other approaches to building a DataHandle (just access to data behind a raw pointer!), and how is this usually done?

I know it’s a lot to ask to review this, but this is definitely one of those cases where a fresh (and more capable) pair of eyes can see better … thank you!

I don't see a reason to ever use RefCell for such a thing.

If C can independently (on another thread or via calls you don't control) mutate data behind the pointer, then you can't expose that data in Rust, not even behind RefCell, because it breaks fundamental aliasing rules.

If C doesn't mess with that data, then it's fine to expose it as & or &mut. You don't need to do anything besides holding *mut T pointer and calling as_ref() or as_mut() on it in a getter with matching mutability of self.
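As a rough illustration of that suggestion, here is a minimal sketch. The `from_raw` constructor and the `demo_raw_pointer_handle` function are hypothetical names invented for this example, and a `Box` allocation stands in for memory owned by the C library. It assumes C never reads or writes the data while the handle exists.

```rust
/// Hypothetical handle around a pointer obtained from a C library.
/// Assumption: C never touches the pointee while the handle exists.
pub struct DataHandle<T> {
    ptr: *mut T,
}

impl<T> DataHandle<T> {
    /// Safety: `ptr` must be null or point to a valid, aligned `T` that
    /// nothing else reads or writes for as long as the handle is in use.
    pub unsafe fn from_raw(ptr: *mut T) -> Self {
        DataHandle { ptr }
    }

    /// Shared access: `&self` in, `&T` out.
    pub fn borrow(&self) -> Option<&T> {
        // SAFETY: relies on the contract of `from_raw`.
        unsafe { self.ptr.as_ref() }
    }

    /// Exclusive access: `&mut self` in, `&mut T` out.
    pub fn borrow_mut(&mut self) -> Option<&mut T> {
        // SAFETY: relies on the contract of `from_raw`; `&mut self`
        // guarantees no other borrow obtained through this handle is alive.
        unsafe { self.ptr.as_mut() }
    }
}

/// A `Box` stands in for the C side so the sketch runs without a C library.
pub fn demo_raw_pointer_handle() -> (i32, i32) {
    let raw = Box::into_raw(Box::new(1_i32));
    let mut handle = unsafe { DataHandle::from_raw(raw) };

    let before = *handle.borrow().unwrap();
    *handle.borrow_mut().unwrap() += 41;
    let after = *handle.borrow().unwrap();

    // Give the allocation back to its owner once the handle is gone.
    drop(handle);
    unsafe { drop(Box::from_raw(raw)) };
    (before, after)
}
```

The borrow checker then enforces the usual rules on the handle itself: any number of `borrow()` results can coexist, but `borrow_mut()` needs the handle exclusively.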

All right, but there’s one place where I can’t do this – borrow.

borrow has the signature fn borrow(&self) -> Option<&T>, but it needs interior mutability, because I’m lazily looking up and caching the raw pointer from the C library.

This is the crucial bit that made me introduce the UnsafeCell (aliased as Lazy).

type Lazy<T> = UnsafeCell<Option<T>>;

pub struct DataHandle<T> {
    data_ptr: Lazy<Option<NonNull<T>>>,
}

impl<T> DataHandle<T> {
    // ...

    pub fn borrow(&self) -> Option<&T> {  // needs interior mutability!
        unsafe { self.data_ptr().as_ref().map(|x| x.as_ref()) }
    }

    pub fn borrow_mut(&mut self) -> Option<&mut T> {
        unsafe { self.data_ptr().as_mut().map(|x| x.as_mut()) }
    }

    // This provides the interior mutability, but might be UB?
    unsafe fn data_ptr(&self) -> &mut Option<NonNull<T>> {
        (*self.data_ptr.get()).get_or_insert_with(|| NonNull::new(get_context() as _))
    }
}

Note: in this C library the raw pointer does not change underneath us, so once we have it we can cache it, i.e., no repeated calls to get_context are necessary.

Perhaps I should explain why I’m worried about UB. UnsafeCell::get has this to say about casting to &/&mut:

Ensure that the access is unique (no active references, mutable or not) when casting to &mut T, and ensure that there are no mutations or mutable aliases going on when casting to &T

In borrow I only have a shared &self available. But when I access the UnsafeCell I briefly have to use a &mut just to obtain and cache the value from the C library. Then I immediately drop from &mut back to &. It can’t be seen from outside, but I do seem to violate the requirement above. That’s what worries me.

The data_ptr method is also what clippy complains about – again I don’t know if this is fatal, or if clippy just cannot see that what I’m doing is safe, as it’s strictly a private method.

error: mutable borrow from immutable input(s)
  --> src/lib.rs:63:34
   |
63 |     unsafe fn data_ptr(&self) -> &mut Option<NonNull<T>> {
   |                                  ^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[deny(clippy::mut_from_ref)]` on by default
note: immutable borrow here
  --> src/lib.rs:63:24
   |
63 |     unsafe fn data_ptr(&self) -> &mut Option<NonNull<T>> {
   |                        ^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#mut_from_ref

error: aborting due to previous error

For caches like that you can use once_cell. It will provide you with locking necessary to make shared access correct.
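A minimal sketch of that idea follows. It uses `std::cell::OnceCell` from the standard library (stable since Rust 1.70), which is the single-threaded counterpart of the once_cell crate's cell type, so the example has no dependencies. The `get_context` function here is a hypothetical stand-in for the C call and `demo_once_cell` is made up for the example. It assumes the pointer really refers to a valid `T` that C leaves alone.

```rust
use std::cell::OnceCell;
use std::ffi::c_void;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts lookups so the demo can show that the pointer is cached.
static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the C library's `get_context()`.
/// It deliberately leaks an `i32` so the pointer stays valid.
fn get_context() -> *mut c_void {
    LOOKUPS.fetch_add(1, Ordering::Relaxed);
    Box::into_raw(Box::new(5_i32)) as *mut c_void
}

pub struct DataHandle<T> {
    // Outer layer: "looked up yet?"; inner Option: "was the pointer non-null?"
    data_ptr: OnceCell<Option<NonNull<T>>>,
}

impl<T> DataHandle<T> {
    pub fn new() -> Self {
        DataHandle { data_ptr: OnceCell::new() }
    }

    /// Lazily fetches and caches the pointer. No `&mut` is handed out,
    /// only a copy of the cached pointer value.
    fn data_ptr(&self) -> Option<NonNull<T>> {
        *self.data_ptr.get_or_init(|| NonNull::new(get_context() as *mut T))
    }

    pub fn borrow(&self) -> Option<&T> {
        // SAFETY (assumed): the pointer refers to a valid `T` that C does not mutate.
        self.data_ptr().map(|p| unsafe { p.as_ref() })
    }

    pub fn borrow_mut(&mut self) -> Option<&mut T> {
        // SAFETY (assumed): as above; `&mut self` rules out other borrows via this handle.
        self.data_ptr().map(|mut p| unsafe { p.as_mut() })
    }
}

pub fn demo_once_cell() -> (i32, i32, usize) {
    let before = LOOKUPS.load(Ordering::Relaxed);
    let mut handle: DataHandle<i32> = DataHandle::new();

    let first = *handle.borrow().unwrap();
    *handle.borrow_mut().unwrap() += 1;
    let second = *handle.borrow().unwrap();

    (first, second, LOOKUPS.load(Ordering::Relaxed) - before)
}
```

`std::cell::OnceCell` does no locking and is not `Sync`; if the handle has to be shared across threads, `std::sync::OnceLock` (or the sync variant in the once_cell crate) would be the cell to look at, together with a careful look at whether the pointee itself may be shared.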

fn foo(&self) -> &mut T is guaranteed UB. Never do this. It can trivially be used to break the uniqueness of &mut.
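To make "breaking uniqueness" concrete without actually triggering UB, here is a small sketch that uses RefCell, which checks at runtime the same rule that a `&self -> &mut T` signature lets callers bypass. The function name `demo_refcell_check` is made up for this example.

```rust
use std::cell::RefCell;

/// Tries to take two mutable borrows of the same value through a shared
/// reference. RefCell refuses the second one while the first is alive.
pub fn demo_refcell_check() -> (bool, bool) {
    let cell = RefCell::new(0_i32);

    let first = cell.borrow_mut();
    // A second exclusive borrow while `first` is alive would alias it.
    let second_while_first_alive = cell.try_borrow_mut().is_ok();

    drop(first);
    // Once the first borrow is gone, a new exclusive borrow is fine.
    let second_after_drop = cell.try_borrow_mut().is_ok();

    (second_while_first_alive, second_after_drop)
}
```

With a method of the shape `fn foo(&self) -> &mut T` built on UnsafeCell, nothing performs this check: two calls on the same `&self` hand back two live `&mut T` to the same data.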

Ah, this is very helpful and interesting code to read, thank you.

In the meantime though I ran into some other trouble with the proposed design based on &/&mut … I noticed that the RefCell design is much more flexible because it allows some usage patterns that are obviously correct to the human writer, but that the compiler won’t allow when written as &/&mut. So might stick with RefCell after all.

Of course, the thing I wasn’t sure about was if it is still UB if no one outside can ever observe it (if a tree falls in the wood etc.)

The compiler observes it; the compiler observes everything. UB is a situation where you have violated the constraints of the LLVM backend of the compiler. Once you violate those constraints LLVM (i.e., the compiler toolchain) can "optimize" your code into arbitrary behavior, and do so differently every time you or someone else tweaks any code in the program, its dependencies, or the compiler itself.

Thank you Tom for explaining this so clearly.

Apart from the optimizer ruining things, in Rust there's a contract that if your function is marked as safe, it must be impossible to misuse it for unsafe things, even if you give your API to an infinite number of monkeys with keyboards.

