Opaque pointers in FFI

Hello

I'm playing with some FFI with C++. Basically, I want to put some rust in an existing application. I own both sides, so I can provide C-like APIs from either side. I also call functions both ways.

Now, let's say I create an object of some kind in one language and want to store it in the other for a while, as an opaque pointer (to create a wrapper on the other side, or pass it back in a callback or something).

I have something that works, but I have a feeling there just must be a better way.

When I have a C++ object, I can represent the pointer to it as a pointer to an empty enum, as the book recommends (FFI - The Rust Programming Language). But this feels a bit fishy, and I've heard somewhere that it isn't exactly correct: such a type doesn't have any valid representation. Because it can't exist, there's no valid address it could exist at, and therefore any pointer to it is invalid. Or are pointers allowed to point to invalid data? Isn't that undefined behaviour?

Now, the other direction. I have something in Rust (let's say a struct). That thing isn't #[repr(C)] (because it contains other things that are not, and those come from other crates). But if I declare an extern "C" function that takes a pointer to it, I get a lint: "found struct without foreign-function-safe representation annotation in foreign module, consider adding a #[repr(C)] attribute to the type". Adding the attribute doesn't help. How severe is the warning? Does it only mean C doesn't know the correct layout of the struct at the end of the pointer, or does it mean that even the pointer itself could be "incompatible" with C (e.g. a different size, or something)? Am I OK just placing #[allow(improper_ctypes)] before the function declaration if I never dereference the pointer on the other side, just store it and pass it back later on, or do I have to cast it and pass it as *mut c_void?

There's no ideal solution yet:

https://github.com/rust-lang/rust/issues/43467

Currently bindgen uses

#[repr(C)]
pub struct Foo { _unused: [u8; 0] }

to implement opaque types.
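For reference, a sketch of how such an opaque type can be declared on the Rust side. The extra `PhantomData` marker follows the form the Rustonomicon currently suggests (it keeps the type from being automatically `Send`, `Sync` and `Unpin`); the names `Foo` and `Bar` are only illustrative. Unlike an empty enum, this is an ordinary, inhabited zero-sized struct, so a pointer to it is just a pointer to a ZST.

```rust
use std::marker::{PhantomData, PhantomPinned};

// Opaque handle for an object owned by foreign (C++) code. Rust never
// reads a `Foo`; it only stores and passes `*mut Foo` values around.
#[repr(C)]
pub struct Foo {
    // Zero-sized: Rust assumes nothing about the real layout.
    _data: [u8; 0],
    // Opts out of Send, Sync and Unpin, which Rust cannot know to be
    // safe for an object managed by foreign code.
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

// A second, distinct opaque type: a `*mut Foo` cannot be passed where a
// `*mut Bar` is expected, unlike two interchangeable `*mut c_void`s.
#[repr(C)]
pub struct Bar {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}
```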


I don’t think there’s anything special about empty enums - they’re just another kind of ZST, to which it’s totally fine to form references or raw pointers. They happen to have no variants (in general, some could be added in the future; the compiler doesn’t know that the type is meant to model opaque FFI pointers), so nothing can be instantiated, but that doesn’t preclude forming references or pointers to them.

It just means that the struct isn’t following C ABI and FFI code may not be able to read/write the data properly. But roundtripping a pointer should be fine.
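If you want to avoid the lint altogether, the other option from the question works too: erase the type to `*mut c_void` at the boundary and cast back on the Rust side. A minimal sketch, with hypothetical names (`Session`, `session_new`, `session_push`, `session_free`); in a real library these functions would also be exported with `#[no_mangle]` (`#[unsafe(no_mangle)]` in the 2024 edition) so the C++ side can link against them.

```rust
use std::ffi::c_void;

// Rust-side state that is *not* #[repr(C)]; the C++ side never looks inside.
pub struct Session {
    items: Vec<String>,
}

// Hand ownership to the foreign side as an untyped handle.
pub extern "C" fn session_new() -> *mut c_void {
    Box::into_raw(Box::new(Session { items: Vec::new() })).cast::<c_void>()
}

// Called by the foreign side with the handle it stored earlier.
pub unsafe extern "C" fn session_push(handle: *mut c_void, value: u32) -> usize {
    // SAFETY: the caller promises `handle` came from `session_new`
    // and has not been freed yet.
    let session = unsafe { &mut *handle.cast::<Session>() };
    session.items.push(value.to_string());
    session.items.len()
}

// Take ownership back and drop the Rust object.
pub unsafe extern "C" fn session_free(handle: *mut c_void) {
    if !handle.is_null() {
        // SAFETY: the caller promises `handle` came from `session_new`
        // and is freed exactly once.
        drop(unsafe { Box::from_raw(handle.cast::<Session>()) });
    }
}
```

Keeping `Box::into_raw` and `Box::from_raw` next to each other makes the ownership rule explicit: the foreign side only holds the handle and must call `session_free` exactly once.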

Empty enums are different in that they can never actually be instantiated. As far as I'm aware it's not even accurate to describe them as zero-sized, but rather as uninhabited. Another way to put it is that empty enums are isomorphic to ! while ZSTs are isomorphic to (), and having a pointer that claims to point to ! is a different thing than having one that claims to point to ().

It's true that an empty enum is uninhabited, but I've not heard that this implies you cannot form a reference (or a raw pointer) to it. The following compiles today:

#[derive(Debug, Copy, Clone)]
enum Empty {}

fn foo(x: &Empty) {
    let x = *x;
    println!("{:?}", x);
}

Of course we can't actually get a &Empty value to pass there (without unsafe code), but the formulation above compiles without a warning.

I think this issue raises the point of what it means to have a reference (or raw pointer) to an uninhabited type. If you take the meaning of a reference as "points to a valid value of the underlying type", then even the above is wrong since the type has no values. Raw pointers are probably a bit more lax because they can always, at the least, be null.

I'm not sure if the Rust memory/unsafe model is going to be revised to somehow answer this definitively. But I've definitely seen a bunch of code using empty enums as opaque "type safe" FFI pointers. And of course the book suggests it as a technique as well. This might be tricky given Rust's backwards compatibility promise. The never type (i.e. !) may force this issue to some resolution on its way to becoming stable (unless that's already the case and I missed the discussion).

Can you point me to the second edition book parts where this is still suggested?

The empty-array-struct is currently the best version, until we have proper extern types. Empty enums are now discouraged (yes, they used to be recommended), after people found out that they have nasty behavior even in compiling code and some operations on them are not properly defined. This includes dereferences and especially match:


I didn't even find a chapter about FFI in the second edition. I think it was considered too advanced and fell off, left for the nomicon, where it looks like a copy of the first-edition FFI chapter.

Ah, sorry, I didn't check if the PR was merged already. The nomicon is wrong there.

https://github.com/rust-lang-nursery/nomicon/pull/44

@skade, thanks for the links. Ok, so this is discouraged now. The discussion in those links centers around reading and matching on empty enums which I can see as being dubious to begin with. I didn’t see anything that says raw pointers to these types are invalid.

I think this stance needs to be more pronounced, to the point where the 1st version of the book needs to be updated to have a note saying this practice is discouraged. Just my $.02

EDIT: a clippy lint against this might be useful as well

Note that the never_type feature at least adds a warning to your example code (playground link) that implies it's impossible to call foo even with an FFI generated value.

Also, attempting to replicate what I believe is a valid C-FFI-like generation of a reference to the type gives a runtime crash: either SIGILL on nightly or an abort on stable.

I see an unreachable pattern warning on the println! there, which is good, but nothing about foo being uncallable. If I comment out the println! and make Empty and foo pub, then there are zero warnings: Playground

I don't think that's valid - it's transmuting a bogus value into a reference to the enum. The FFI usage would be to form a *const Empty (or *mut Empty) and receive from/pass that to FFI code. This is essentially a "type safe" *const/mut c_void. No Rust code would be matching or reading through that pointer - it's just a handle, so to speak.

I don't think that the overall code is valid, but if having a reference to an Empty is valid, then that code is equivalent to linking against some C code that returns an opaque pointer to an int32_t. Since it crashes, that seems to imply that &Empty should not be a valid inhabitable type.

I guess the actual point of invalidity in that code is the return type of &'static Empty; if it was *const Empty, then converting that to an &Empty would require an unsafe block, and it's the programmer's fault for doing that.

Right, so part of the question (at least in my mind) is whether having &Empty is valid to begin with. And if so, what does that mean given it's uninhabited? To get such a type you'd need unsafe code but no matter what you'll need to fabricate something that is invalid since Empty has no valid values.

But, this is really outside of FFI usecases that I'm aware of since the FFI usage just roundtrips a *const Empty through FFI, but otherwise does not read/write this value in Rust, nor form any references to it. FFI code returns a *const Empty (or *mut, but that's not material to the discussion) representing a raw pointer to some data it allocated and manages. Rust code stores that raw pointer, and then passes it into other FFI functions that expect to receive a pointer back to the opaque type - the foreign code then reads/writes the value as it sees fit.
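A sketch of that roundtrip pattern, with the "foreign" functions written in Rust only so it runs without a C++ library. All names (`Widget`, `widget_new`, `widget_click`, `widget_free`, `WidgetHandle`) are hypothetical, and the handle type uses the zero-sized-array struct rather than an empty enum; the shape of the Rust code would be the same either way, since it never reads through the pointer.

```rust
use std::marker::{PhantomData, PhantomPinned};

// Opaque handle type as seen from Rust.
#[repr(C)]
pub struct Widget {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

// --- Stand-in for the foreign side (hypothetical API). ---
// The real layout is private to the "foreign" code.
struct WidgetImpl {
    clicks: u32,
}

extern "C" fn widget_new() -> *mut Widget {
    Box::into_raw(Box::new(WidgetImpl { clicks: 0 })).cast::<Widget>()
}

unsafe extern "C" fn widget_click(w: *mut Widget) -> u32 {
    let w = unsafe { &mut *w.cast::<WidgetImpl>() };
    w.clicks += 1;
    w.clicks
}

unsafe extern "C" fn widget_free(w: *mut Widget) {
    drop(unsafe { Box::from_raw(w.cast::<WidgetImpl>()) });
}

// --- Rust side: stores the raw pointer and only ever passes it back. ---
pub struct WidgetHandle {
    raw: *mut Widget,
}

impl WidgetHandle {
    pub fn new() -> Self {
        WidgetHandle { raw: widget_new() }
    }

    pub fn click(&mut self) -> u32 {
        // SAFETY: `raw` came from `widget_new` and is freed only in `drop`.
        unsafe { widget_click(self.raw) }
    }
}

impl Drop for WidgetHandle {
    fn drop(&mut self) {
        // SAFETY: `raw` is still live and this is the only place it is freed.
        unsafe { widget_free(self.raw) }
    }
}
```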


That's how you would have it if the compiler didn't optimise. However, I could see following compiler's reasoning (for example):

  • There must never be any instance of Empty in memory.
  • A *const Empty must either point to an instance of Empty or be null.
  • Combining these two, the *const Empty parameter I got has only one legal value: null. Therefore, it'll always be null. Good, I can do some crazy optimisation, like throwing half of the code away and not storing the pointer into that struct at all, because when I read it out again and pass it to some other FFI function, it'll have to be null again.

So, I think the question boils down to whether (unlike with references) having or passing (not dereferencing) a pointer to invalid data is OK or not.

Raw pointers don't guarantee that they point to a valid instance of a type when non-null (see the Unsafe Rust chapter in the second book). It's already considered safe to conjure up a raw pointer from the ether to any type at all and pass it around. The only unsafe operation is dereferencing it. So having a raw pointer to an empty enum seems "fine" in the same sense that it's "fine" to have a raw pointer to anything else. The only difference is that pointers to other types can be valid sometimes.
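To make that concrete, a small sketch: all three pointers below are created in safe code, even though none of them points to a valid value. Dereferencing any of them would need `unsafe`, and for the `Empty` ones no dereference could ever be sound. `Empty` and `make_pointers` are illustrative names, and the address `0x1000` is arbitrary.

```rust
use std::ptr::NonNull;

// An uninhabited type: no value of it can ever exist.
enum Empty {}

// Forming raw pointers needs no `unsafe`, whatever they point to.
fn make_pointers() -> (*const i32, *const Empty, NonNull<Empty>) {
    let bogus_int = 0x1000 as *const i32; // almost certainly not a valid i32
    let bogus_empty = 0x1000 as *const Empty; // can never point to a valid Empty
    let dangling = NonNull::<Empty>::dangling(); // non-null, but points to nothing
    (bogus_int, bogus_empty, dangling)
}
```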

Meanwhile references guarantee that they'll both be non-null and referring to a valid instance of a type. But since there can never be a valid instance of an empty enum to refer to, the only way to get a reference to one is to lie to the type system via unsafe code.

Looking at the links that were posted earlier, it seems there's still some debate over whether creating a reference to an uninhabited type is itself UB, or if it's only UB to try using it after having created it. I personally am not sure how much the distinction matters, given that creating a reference to such a type involves the same unsafe processes by which one could create an invalid reference to any other type.


My $.02 is that in an ideal world, a reference to an uninhabited type wouldn’t be a valid type - compile time error; same for raw pointers. I think part of the issue is that references/pointers to them are a type and a value, whereas only the type part makes sense. The FFI usecase is more of a type thing - it’s just a "type"-ier version of *const c_void but is otherwise opaque to Rust and no value is read/written to it from Rust code; it’s there for the type system only - to make sure you don’t mix up different c_void pointers when calling into FFI.

When it comes to raw pointers, I don't see why that one specific case of invalidity would be privileged over any other kind. Raw pointers can already be whatever they want to be. Having an invalid *const i32 is already perfectly "fine" in the sense that you can create it, carry it around, pass it to other functions, etc. It's only when you deref it that you could get into Trouble. The same is true for any other *const T that you can think of. Why would an invalid *const ! be treated any differently?

As for references, I suppose there is a stronger case for making it a compilation error. But at the same time, it's already understood that safe code fundamentally can't guard against unsafe code failing to uphold required invariants (such as that a reference must be non-null and must point to a valid instance of a type).

For example, it's not a compile error to create an invalid *const i32 and then unsafely cast that pointer to a &i32. Safe code has to trust that the unsafe code really did create a valid reference no matter what, and it's not the safe code that's at fault when you try to dereference it and things go wrong.

Basically, the only real difference between unsafely creating a &i32 and a &! is that it is possible for unsafe code to manually uphold the required invariants for validly creating the former, but it can never do the same when creating the latter. So with that in mind, maybe it would make sense to forbid it statically, but IIRC that might end up stomping over certain cases of generic code generation.


The difference is that *const i32 has a chance of being valid, whereas *const Uninhabited has zero chance of being valid. I don't see the benefit of allowing fabrication of nonsensical types that leave you open to UB later on - let's just prevent that at the root. We cannot prevent all UB, but when we know we're dealing with a type-system-only concept (i.e. no data), it'd be nice to take advantage of that.

My thoughts are the same as with respect to raw pointers above - &Uninhabited cannot possibly point at any valid value.

Yup, exactly.

Given you cannot form a &Uninhabited without unsafe code, then generic code would need to do the same thing to end up with one of those. I don't think generic code makes this any different. But perhaps there's some subtlety I'm missing.

You don't want this to be a compile-time error because of generics and macros. A Vec<!> mentions &! all over the place, and you want it to compile in generic contexts so that, for example, error handling that happens to get instantiated with a Result<Foo, !> is fine.
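Since `!` is not a stable type, `std::convert::Infallible` (an empty enum in the standard library) can stand in for it to illustrate the point: ordinary generic code gets instantiated with an uninhabited type, mentions `&Infallible` along the way (iterating a `Vec<Infallible>` yields `&Infallible` items), and all of it has to compile. The helper names are illustrative.

```rust
use std::convert::Infallible;

// Generic code that never thinks about uninhabited types.
fn total_len<T>(items: &[T]) -> usize {
    items.iter().count() // each item is a `&T`, i.e. `&Infallible` when T = Infallible
}

fn value_or_zero<E>(r: Result<u32, E>) -> u32 {
    r.unwrap_or(0)
}

// With an uninhabited error type, the error arm is discharged by an empty match.
fn into_ok(r: Result<u32, Infallible>) -> u32 {
    match r {
        Ok(v) => v,
        Err(never) => match never {},
    }
}
```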


! is a special magical type to the compiler already, isn't it? As in, it can stand-in for any other type, implement traits automatically (right?), and so on. To that end, you can special case it further for &! or *const !.

Relatedly, what happens if generic code, after monomorphization, attempts to deref a &!? Is that considered dead code? Something else?