FFI: NonNull<T> vs &UnsafeCell<T> and what the hell is covariance

So I have a crate that does some FFI, currently my wrapper structs use *mut T to store
the pointers to c values. These pointers are never mutated from the rust side, only passed
to FFI functions that do mutate.

For optimization sake I would like to change these to NonNull<T>. I know the pointers are never
null, however I don't really understand the parts about covariance.

Also the wrapper structs hold a lifetime to a handle so there's a PhantomData<&'a T> to the handle struct.

So basically my questions are:

  • Is it safe to swap the raw pointer for NonNull in this context?
  • Do I need to use PhantomData<&'a Cell<T>> instead of PhantomData<&'a T>?
  • Can/should I just use &'a UnsafeCell<T> instead?
3 Likes

Basically, if your struct is covariant, then YourStruct<'a, &'long T> can be automatically turned into YourStruct<'a, &'short T>. If that conversion is incorrect, then it should be invariant. If you don't know what to pick, then invariant is always the safer option as it lets you do strictly less things.

As for your phantom data, you can use PhantomData<&'a mut T> to make it invariant in T.

4 Likes

If you are doing FFI with a T pointee, then chances are that T : 'static already (otherwise it's easy to get use-after-frees in some patterns). In that case, there is no variance to talk about in the T parameter, since you won't have lifetime-based subtyping in there.

Moreover, assuming you didn't have that 'static parameter, since the FFI is able to mutate it, then you're effectively having a mutable handle to a T, such as &Cell<T> or &mut T. Both of these cases are to be invariant (the rule of thumb is that a borrow able to mutate a T must be invariant in T for soundness).

Remember the rule of thumb:

  • (Regarding 'a, the lifetime of the borrow itself, since shrinking that "knowledge" about the exact duration of the borrow can't cause unsoundness, you can venture into being covariant in that parameter)

But again, try to go for a conceptual level, and you'll rarely be wrong: if you have 'a-borrowed shared (&) mutable (Cell; very typical semantics in the FFI world) access to a T, then do add a PhantomData<&'a Cell<T>>. While you could also use PhantomData<&'a mut T>, many FFI APIs do not deal with the very strong and strict unique reference semantics of Rust, so I find those ill-suited for FFI: &'a Cell<T> would be more honest.

5 Likes

So are you saying this is how it should be?

struct Handle {
    handle: NonNull<c_handle>,
}

// the c pointer references the c handle
struct Foo<'a> {
    foo: NonNull<c_foo>,
    _marker: PhantomData<&'a Cell<()>>,
}

Additionally is this better/sound?

// the c pointer references the c handle
struct Foo<'a> {
    foo: &'a UnsafeCell<c_foo>,
}

Then for a third solution I'm considering making the rust type like str and only holdable by reference:

struct Foo {
    foo: UnsafeCell<c_foo>,
}

impl Foo {
    // ptr must point to a valid c_foo
    unsafe fn from_ptr<'a>(ptr: *mut c_foo) -> &'a Foo {
         Foo { foo: transmute(ptr) }
    }
}

As for the third solution: transmuting pointers is very dangerous and usually not the correct thing to do. The formulation with an unconstrained lifetime parameter seems wrong, too: it allows the caller to chose any lifetime 'a whatsoever, which is unsound if the backing pointer is invalidated (e.g. because it is freed) while someone is still holding the returned reference.

Do the c_handle and c_foo exhibit ownership semantics? Handles tend to do so, in which case it's weird to ascribe a lifetime to them.

As for the third solution: transmuting pointers is very dangerous and usually not the correct thing to do. The formulation with an unconstrained lifetime parameter seems wrong, too: it allows the caller to chose any lifetime 'a whatsoever, which is unsound if the backing pointer is invalidated (e.g. because it is freed) while someone is still holding the returned reference.

Maybe it should be foo: &*(ptr as *mut UnsafeCell<c_foo>) instead but same gist.

The caller also needs to make sure the lifetime is right, part of why the function is unsafe. Something like:

Impl Handle {
    fn foo<'a>(&'a self) -> &'a Foo {
        unsafe { Foo::from_ptr(self.handle.as_ptr()) }
    }
}

Do the c_handle and c_foo exhibit ownership semantics? Handles tend to do so, in which case it's weird to ascribe a lifetime to them.

Not sure what you mean here. The handle doesn't have a lifetime only Foo does.

Just go for a raw pointer. They are easier to reason about.

1 Like

I want the niche optimisation.

Yes. NonNull is a raw pointer.

Not exactly, the docs go on about variance and that there's differences and that you must ensure your type is invariant and that's what I'm confused about.

Well, what I meant was, just go for NonNull with a PhantomData. That's easier to reason about than using a real reference. Besides, variance also applies to ordinary raw pointers. The *const T type is covariant too and it's just as important for them.

1 Like

The current implementation uses a *mut pointer anyway with a PhantomData<&'a ()>. So it's equivalent if I change it to NonNull and PhantomData<&'a mut ()> right?

But If I kept the PhantomData to a not mut reference it would then be unsound right?

That's fine.

As for whether it would be unsound with covariance, it's hard to say without more details, but invariant is definitely ok.

1 Like

Well less about if it's sound or not, just if I didn't also change the PhantomData it would change from invariant to covariant right?

Just trying to make sure I understand it all.

Actually, if you want to be controlling the variance, you need to do PhantomData<&'a mut T> rather than PhantomData<&'a mut ()>.

1 Like

The struct is not generic over a type and c_handle is 'static.

I'm starting to think maybe it doesn't matter at all in the non generic case...

If your struct isn't generic, then variance doesn't matter.

3 Likes

Right, thanks that makes things easier.

1 Like

Yeah, the proper expansion of the T-using variants is PhantomData<&'a Cell<c_foo>>.

However, I suggest that your Foo struct should actually have a PhantomData<&'a Handle> because that's where the lifetime is coming from, right?