Meddling with lifetime annotation and PhantomData

Hi, I'm writing a wrapper for a C API, and I'm trying to make a safe wrapper for a type that contains pointers by adding lifetime constraints.
For example, I have the following type (generated by bindgen from the C header):

#[repr(C)]
#[derive(Copy, Clone)]
pub struct ast_statement {
    pub type_: ast_statement_type_t,
    pub __bindgen_anon_1: ast_statement__bindgen_ty_1,
}
#[repr(C)]
#[derive(Copy, Clone)]
pub union ast_statement__bindgen_ty_1 {
    pub rule: *const ast_rule_t,
    pub definition: *const ast_definition_t,
    _bindgen_union_align: u64,
}

My wrapper type should have the lifetime constraint that the object cannot outlive the rule to which it points.

#[derive(Clone)]
pub struct AstStatement<'a> {
    data: ast_statement_t,
    phantom: PhantomData<&'a u32>,
}

An AstStatement can be created from a Rule using the following function:

pub struct Rule(ast_rule);
impl Rule {
    pub fn ast_statement(&self) -> AstStatement {
        AstStatement {
            data: ast_statement_t {
                type_: ast_statement_type_ast_statement_type_rule,
                __bindgen_anon_1: ast_statement__bindgen_ty_1 {
                    rule: &self.0 as *const ast_rule,
                },
            },
            phantom: PhantomData,
        }
    }
}

I'm not sure whether I'm doing it right. Please correct me if I'm doing something stupid.
I would also like to know how I can test whether the lifetime behavior is as expected.

Write a function that tries to use the AstStatement after you drop(rule). The compiler should complain. As far as automated tests for this go, a quick Google search led me to compiletest-rs, but I don't know if that's what you would want to use today.
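A minimal sketch of such a check, using hypothetical stand-in types (RawRule, Rule, Statement) rather than the real bindgen output. The order shown compiles and runs; the comment in main marks the reordering that the borrow checker should reject:

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the bindgen-generated rule type.
struct RawRule {
    id: u32,
}

// Owning wrapper; deliberately NOT Copy, so `drop(rule)` is a real move.
pub struct Rule(RawRule);

// Borrowing wrapper: a raw pointer tied to the borrow of `Rule` by `'a`.
pub struct Statement<'a> {
    rule: *const RawRule,
    _borrow: PhantomData<&'a Rule>,
}

impl Rule {
    pub fn new(id: u32) -> Rule {
        Rule(RawRule { id })
    }

    pub fn statement(&self) -> Statement<'_> {
        Statement {
            rule: &self.0 as *const RawRule,
            _borrow: PhantomData,
        }
    }
}

impl Statement<'_> {
    pub fn rule_id(&self) -> u32 {
        // SAFETY: the lifetime on `Statement` keeps the `Rule` borrowed,
        // so the pointee is still alive and has not been moved.
        unsafe { (*self.rule).id }
    }
}

fn main() {
    let rule = Rule::new(7);
    let stmt = rule.statement();
    println!("{}", stmt.rule_id());
    // Moving this `drop` above the `println!` should be rejected with
    // error[E0505]: cannot move out of `rule` because it is borrowed.
    drop(rule);
}
```

If the reordered version compiles anyway, the lifetime is not actually connected to the borrow of the owner.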

As far as correctness goes, you've probably gone through the nomicon to figure out what type to use in PhantomData. Perhaps you need to specify what is "correct" for your library in order for anyone to comment on that. I'm far from an expert.

Here is an example of the general pattern to use:

pub
struct SomeDerivedStruct<'opaque> {
    ffi: ptr::NonNull<ffi::some_derived_c_struct_t>,
    // represent the abstract property: it borrows SomeOpaqueType
    _lifetime: PhantomData<&'opaque SomeOpaqueType>,
}

impl SomeOpaqueType {
    pub
    fn derive_struct<'opaque> (
        self: &'opaque Self,
    ) -> Option< SomeDerivedStruct<'opaque> >
    {
        Some(SomeDerivedStruct {
            ffi: ptr::NonNull::new(
                unsafe {
                    ffi::mk_some_derived_c_struct_t(self.ffi.as_ptr())
                }
            )?,
            _lifetime: PhantomData,
        })
    }
}
  • This works, since the following program

    fn main ()
    {
        let opaque =
            SomeOpaqueType::new()
                .expect("Failed to create opaque type")
        ;
        let derived =
            opaque
                .derive_struct()
                .expect("Failed to create derived struct")
        ;
        drop(opaque);
        println!("{}", derived.a_field());
    }
    

    fails with:

    error[E0505]: cannot move out of `opaque` because it is borrowed
       --> src/main.rs:115:10
        |
    111 |         opaque
        |         ------ borrow of `opaque` occurs here
    ...
    115 |     drop(opaque);
        |          ^^^^^^ move out of `opaque` occurs here
    116 |     println!("{}", derived.a_field());
        |                    ------- borrow later used here
    

Indeed the key lies within the derive_struct function: it lends / returns a SomeDerivedStruct that is at most valid for the lifetime 'opaque, where 'opaque is the name given to the lifetime of the borrow of Self = SomeOpaqueType:

impl SomeOpaqueType {
    pub
    fn derive_struct<'opaque> (
        self: &'opaque Self,
    ) -> Option< SomeDerivedStruct<'opaque> >

This connection between the input and output lifetime, achieved by using an explicitly named generic lifetime parameter, can also be achieved in a more concise way using lifetime elision. That is, the above code is equivalent to:

impl SomeOpaqueType {
    pub
    fn derive_struct (
        self: &'_ Self,
    ) -> Option< SomeDerivedStruct<'_> >

which, in turn, is equivalent to:

impl SomeOpaqueType {
    pub
    fn derive_struct (&self) -> Option<SomeDerivedStruct>
  • (I am not fond of this last syntax, since it is too implicit w.r.t. lifetimes and borrows).

Thus, all the borrowing magic involves "just" the signature of the function creating the "borrowing" struct, thanks to it "having a generic lifetime parameter".
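To make the equivalence concrete, here is a small sketch with hypothetical types (Owner, View) in which the explicitly named form and the '_ form sit side by side. Both tie the returned struct to the borrow of self; the fully implicit third form is left out of the sketch:

```rust
use std::marker::PhantomData;

// Hypothetical owner type, standing in for SomeOpaqueType.
pub struct Owner {
    value: i32,
}

// Hypothetical borrowing struct, standing in for SomeDerivedStruct.
pub struct View<'a> {
    ptr: *const i32,
    _lifetime: PhantomData<&'a Owner>,
}

impl Owner {
    // Explicitly named lifetime connecting input and output.
    pub fn view_named<'a>(self: &'a Self) -> View<'a> {
        View { ptr: &self.value, _lifetime: PhantomData }
    }

    // Same connection, expressed through lifetime elision.
    pub fn view_elided(self: &'_ Self) -> View<'_> {
        View { ptr: &self.value, _lifetime: PhantomData }
    }
}

impl View<'_> {
    pub fn get(&self) -> i32 {
        // SAFETY: the `Owner` is still borrowed for as long as this
        // `View` exists, so the pointer is valid.
        unsafe { *self.ptr }
    }
}

fn main() {
    let owner = Owner { value: 3 };
    let a = owner.view_named();
    let b = owner.view_elided();
    println!("{} {}", a.get(), b.get());
}
```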

And one way to achieve this as a zero-cost abstraction is by using PhantomData<type using the generic lifetime parameter>, e.g., PhantomData<&'lifetime SomeOtherType>. You can use SomeOtherType = (), but if the symbolic borrow involves a specific type, such as SomeOpaqueType in my example, I suggest using it in the PhantomData:

pub
struct SomeDerivedStruct<'lifetime_parameter> {
    ...
    // represent the abstract property: it borrows SomeOpaqueType
    _lifetime: PhantomData<&'lifetime_parameter SomeOpaqueType>,
}
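The "zero-cost" part can be checked directly. A small sketch, with hypothetical Owner and Derived types, comparing sizes with std::mem::size_of: the PhantomData marker itself is zero-sized, and on current compilers the struct ends up no larger than the pointer it wraps:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical owner type, standing in for SomeOpaqueType.
#[allow(dead_code)]
pub struct Owner {
    value: i32,
}

// Borrowing struct: one raw pointer plus a zero-sized lifetime marker.
#[allow(dead_code)]
pub struct Derived<'a> {
    ptr: *const i32,
    _lifetime: PhantomData<&'a Owner>,
}

fn main() {
    // The marker itself occupies no space.
    assert_eq!(size_of::<PhantomData<&'static Owner>>(), 0);
    // So the struct is as large as the raw pointer alone.
    assert_eq!(size_of::<Derived<'static>>(), size_of::<*const i32>());
    println!("Derived is {} bytes", size_of::<Derived<'static>>());
}
```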

Having automated tests for code that should be rejected is indeed very good practice; nowadays I recommend using ::trybuild: I find it easy to use and well designed.


Thank you for your answer. It took me some time to grasp, and I still have some questions.
I think you assume some C function mk_opaque_type that creates the some_opaque_type_t on the heap and returns a pointer.
My C API does not provide such a function.
Must I create an equivalent function in Rust with Box?
And does this mean I cannot have the data on the stack?

No, you can perfectly well have inline structs / data on the stack! I just wanted to show a more general pattern with opaque structs and pointers, since they may be harder to use correctly. Here is the pattern adapted to inline structs:

pub
struct SomeDerivedStruct<'opaque> {
    ffi: ffi::some_derived_c_struct_t,
    // represent the abstract property: it borrows SomeConcreteInlineType
    _lifetime: PhantomData<&'opaque SomeConcreteInlineType>,
}

impl SomeDerivedStruct<'_> {
    pub
    fn a_field (self: &'_ Self)
      -> ::libc::c_int
    {
        self.ffi
            .a_field
    }
}

impl SomeConcreteInlineType {
    pub
    fn derive_struct<'opaque> (
        self: &'opaque Self,
    ) -> SomeDerivedStruct<'opaque>
    {
        SomeDerivedStruct {
            ffi: unsafe {
                ffi::mk_some_derived_c_struct_t(&self.ffi)
            },
            _lifetime: PhantomData,
        }
    }
}

OK, now I see what my problem was.
Bindgen derived Copy for my C type, and hence there was no borrowing.
This may sound stupid, but does this mean I should not derive Copy for a struct that points to another struct which might be deleted?

Indeed, copying pointers to non-'static data can easily lead to use-after-free errors (with a borrowing pointer) or even double-free errors (with an owning pointer).
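One way a derived Copy can hide the borrow in a "use after drop" test is sketched below, with hypothetical stand-in types (RawRule, Statement); this is an illustration, not necessarily what happened in the original code. When the owner is Copy, drop(rule) only receives a copy, so nothing is moved out from under the borrow and the compiler has no reason to complain (recent compilers may warn that dropping a Copy value does nothing):

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for a bindgen type that derives Copy.
#[derive(Clone, Copy)]
struct RawRule {
    id: u32,
}

// Borrowing wrapper tied to the borrow of the rule.
struct Statement<'a> {
    rule: *const RawRule,
    _borrow: PhantomData<&'a RawRule>,
}

fn statement(rule: &RawRule) -> Statement<'_> {
    Statement {
        rule: rule as *const RawRule,
        _borrow: PhantomData,
    }
}

fn rule_id(stmt: &Statement<'_>) -> u32 {
    // SAFETY: the lifetime on `Statement` keeps the rule borrowed and alive.
    unsafe { (*stmt.rule).id }
}

fn use_after_drop_of_copy() -> u32 {
    let rule = RawRule { id: 7 };
    let stmt = statement(&rule);
    // `RawRule` is Copy: this hands a *copy* to `drop`, and `rule` itself
    // stays alive, so the borrow held by `stmt` is not violated.
    drop(rule);
    rule_id(&stmt)
}

fn main() {
    println!("{}", use_after_drop_of_copy());
}
```

Wrapping the raw type in a non-Copy owner makes drop(rule) a real move, and the same test then fails to compile as intended.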