Wrapping FFI struct with reference counting


#1

I have a C library with functions like

Foo* foo_create(…);
void foo_addref(Foo*);
void foo_release(Foo*);

char* foo_get_name(Foo* foo);
void foo_set_name(Foo* foo, char* name);

Naive pointer wrapping will lead to aliasing and violation of „only one mutable reference“ rule.
How should I wrap it in Rust safely? Are there examples / crates / best practices?


#2

Why? There’s never two &mut pointing to the same address. There’s only two &mut pointing to an object that contains the same pointer.

use std::os::raw::c_char;

enum CFoo {}
pub struct Foo(*mut CFoo);

extern "C" {
    fn foo_create() -> *mut CFoo;
    fn foo_release(foo: *mut CFoo);
    fn foo_addref(foo: *mut CFoo);

    fn foo_get_name(foo: *mut CFoo) -> *const c_char;
    fn foo_set_name(foo: *mut CFoo, name: *const c_char);
}

impl Foo {
    fn new() -> Self { Foo(unsafe { foo_create() }) }
    fn name(&self) -> *const c_char {
        unsafe {
            foo_get_name(self.0)
        }
    }
    fn set_name(&mut self, name: *const c_char) {
        unsafe {
            foo_set_name(self.0, name);
        }
    }
}

impl Clone for Foo {
    fn clone(&self) -> Self {
        unsafe {
            foo_addref(self.0);
        }
        Foo(self.0)
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        unsafe {
            foo_release(self.0)
        }
    }
}

#3

fn set_name(&mut self, ...) {

This &mut means "I have a unique reference to an aliased pointer to CFoo". What useful properties does it have?


#4

[quote=“kriomant, post:1, topic:5016”]
Naive pointer wrapping will lead to aliasing and violation of „only one mutable reference“ rule.
[/quote]
If Foo is opaque, or at least you never borrow any of its fields, there should not be any references to speak of. Since the library supports reference counting, I think it’s reasonable to assume it solves the data race issue in some way (otherwise, run away from it). To borrow the above example,

// You don't want pointers to empty types
type CFoo = libc::c_void;

impl Foo {
    fn set_name(&self, name: *const c_char) {
        unsafe {
            foo_set_name(self.0, name);
        }
    }
}

#5

If it was legal to call in C without UB, then it’s still legal to call from Rust.

The &mut is just to make sure this doesn’t behave like a Cell


#6

[quote=“oli_obk, post:5, topic:5016”]
The &mut is just to make sure this doesn’t behave like a Cell
[/quote]
Can you be more specific?


#7

You would be able to modify a Foo object even though you only have an immutable reference… which, I just noticed, is still possible, because you can clone it and modify the clone. But I think it’s OK, because you never have a shared reference and a mutable reference to a value at the same time: the name function returns a value, not a reference.


#8

Exposing c_char doesn’t seem to be “safe wrapping”, so suppose name is defined as fn name(&self) -> &str.

Now look:

let a = Foo::new();
let mut b = a.clone();
// `a` and `b` share the same pointer
let name: &str = a.name();
b.set_name("new_name");
println!("a.name: {}", name); // Oops, `name` is invalid now, because
                              // underlying string was possibly deallocated
                              // and new one was allocated

#9

This proves you can’t borrow from that object and need to always copy data, e.g.

fn name(&self) -> String {
    unsafe {
        let ret = CStr::from_ptr(foo_get_name(self.0));
        String::from(ret.to_str().unwrap())
    }
}

We can’t know exactly what this getter should be doing because foo_get_name's contract is not specified.


#10

Always copying the data is the obvious solution, but it’s quite expensive and isn’t actually needed most of the time.
I thought there might be some other way.

Anyway, thanks.


#11

Your C object looks a lot like an Rc<...>, so the Rustic thing to do is to sort of mimic the interface thereof.

I am assuming that your C library copies the string when set and returns a reference on get. This is the only option that is trivially thread-safe and leak-free; copying on get would either leak memory or violate thread safety (if it copied into a static buffer), while not copying on set would require the embedder to manage name memory, which is often considered antisocial, especially since foo_set_name does not return the old pointer, so the caller cannot free it without tracking. If your library copies in a different combination, you will need to adjust the example.

extern crate libc;
use std::ffi::{CStr, CString};
use libc::{c_void, c_char};

extern "C" {
    fn foo_create() -> *mut c_void;
    fn foo_addref(foo: *mut c_void);
    fn foo_release(foo: *mut c_void);
    fn foo_get_name(foo: *mut c_void) -> *mut c_char;
    fn foo_set_name(foo: *mut c_void, name: *mut c_char);
}

// type-safe wrapper
// (the nightly-only #[unsafe_no_drop_flag] attribute, which saved a word at the
// expense of allowing drop to be called twice, has since been removed from Rust,
// so drop needs no null check)
struct Foo(*mut c_void);

impl Drop for Foo {
    fn drop(&mut self) {
        // gives up the one reference count this handle owns
        unsafe { foo_release(self.0); }
    }
}

impl Clone for Foo {
    fn clone(&self) -> Foo {
        unsafe { foo_addref(self.0) };
        Foo(self.0)
    }
}

impl Foo {
    fn new() -> Foo {
        unsafe { Foo(foo_create()) }
    }
    // &mut self would not be useful because we have rc-semantics
    // &mut really means "unique" and with the clone above, there can be many
    // references even if the pointer seems unique
    fn set_name(&self, name: &CStr) {
        // We are assuming that your library copies the name on set - see above
        unsafe { foo_set_name(self.0, name.as_ptr() as *mut c_char); }
    }

    fn get_name(&self) -> CString {
        // The to_owned copies the string.  We have to do this, because someone might set the name
        // immediately afterward.  It's not good enough to use &mut, because of RC cloning.
        // We don't have to worry about threads because the `Foo` type lacks the `Send` and `Sync`
        // traits, so it cannot be shared between threads.  The interface you gave does not support
        // setting the name from multiple threads anyway (what would the lifetime of the get_name
        // return be?)
        unsafe { CStr::from_ptr(foo_get_name(self.0)).to_owned() }
    }
}

Incidentally, you can avoid most refcount operations by passing &Foo around to functions instead of cloning a new Foo for every call.
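To make the refcount point concrete, here is a minimal sketch that uses std's Rc<String> as a stand-in for Foo (an assumption for illustration only; the helper names by_ref and by_clone are hypothetical). Rc::strong_count plays the role of the C library's internal counter.

```rust
use std::rc::Rc;

// Borrowing the handle: no refcount traffic at all.
// Returns (name length, refcount observed inside the call).
fn by_ref(foo: &Rc<String>) -> (usize, usize) {
    (foo.len(), Rc::strong_count(foo))
}

// Taking the handle by value: the caller has to clone (addref) first,
// and the count is released again when `foo` is dropped at the end.
fn by_clone(foo: Rc<String>) -> (usize, usize) {
    (foo.len(), Rc::strong_count(&foo))
}
```

Calling by_ref(&foo) leaves the count where it was, while by_clone(foo.clone()) bumps it for the duration of the call. With the FFI wrapper, each of those bumps would be a foo_addref / foo_release pair.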

Now, you complained (I think) about not being able to access the name without copying. The reason we need to copy the name is that we can’t guarantee nobody will write the name within a scope; but we could make that guarantee with additional kinds of pointer:

// Invariant: a UniqueFoo wraps a Foo with a reference count of 1
// thus, we do _not_ derive(Clone)
struct UniqueFoo(Foo);

impl UniqueFoo {
    fn new() -> UniqueFoo {
        UniqueFoo(Foo::new())
    }

    // Here it _is_ useful to use &mut, because we guarantee that
    // UniqueFoo cannot alias inside itself, but we still need to avoid aliased
    // refs _to_ UniqueFoo.
    fn set_name(&mut self, name: &CStr) {
        self.0.set_name(name);
    }

    fn get_name(&self) -> &CStr {
        // The borrow checker prevents any other access to this UniqueFoo
        // while the CStr lives (CStr has an inferred lifetime argument), and
        // our invariants prevent any access from other Foo instances, so
        // the string will live as long as it needs to!
        unsafe { CStr::from_ptr(foo_get_name(self.0 .0)) }
    }

    // Once you've done this, you can clone it, so we can never enforce our invariant again.
    // Take self by value - there's no going back.
    fn into_rc(self) -> Foo {
        self.0
    }
}

You could get significantly fancier if you needed to, e.g. clonable immutable references that could be read from but not coexist with writers.
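For comparison, std's Rc enforces the same "unique handle may mutate" rule at run time rather than in the type system: Rc::get_mut hands out a &mut only while the strong count is 1. A small sketch, again using Rc<String> as a stand-in for Foo (the helper try_set_name is hypothetical, not part of any library):

```rust
use std::rc::Rc;

// Renames through the handle only if it is currently the sole owner.
// Returns false (and changes nothing) when other clones exist.
fn try_set_name(foo: &mut Rc<String>, name: &str) -> bool {
    match Rc::get_mut(foo) {
        Some(current) => {
            current.clear();
            current.push_str(name);
            true
        }
        None => false,
    }
}
```

The C API in this thread exposes no way to read the reference count, so the wrapper cannot perform this check itself. That is why UniqueFoo tracks uniqueness statically instead.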


#12

Wow! Thank you, this is very useful.

This lets me use the copy-free methods when I have just created an object and am therefore sure I’m the only owner. That seems sufficient for my purposes.

However, just out of curiosity, is there a way to use copy-free methods for references obtained from C code?

The API has functions like Foo* bar_get_foo(Bar*), which return a reference to a Foo without increasing its reference count. The returned Foo can have any reference count, so anyone can modify it. But the C code follows the convention that the value returned by foo_get_name is valid until I call a mutating method on any other object.

Can this be expressed in Rust somehow? Maybe at run time, if not at compile time?


#13

What you have now is the moral equivalent of a &'a Rc<Foo>; you can dereference it, but you can only use it in a limited scope unless you clone it yourself. So you could add a FooBorrow<'a> smart pointer, which would mostly behave like Foo but would not have a Drop impl and would contain a PhantomData<&'a ()> field to tell the compiler that it has the nature of an immutable borrow. Naturally this increases the complexity of your binding; there’s a tradeoff with ergonomics (and the Deref trait can be used to simplify a bit).
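A rough sketch of what such a FooBorrow<'a> might look like. Everything on the "C side" here is a mock written in Rust so that the example runs on its own: CFoo with a visible counter, the foo_* functions, ref_count, and the Bar type are all assumptions for illustration, not the real library. The marker type is std::marker::PhantomData, and ManuallyDrop is one way to get "no Drop impl" while still reusing Foo's methods through Deref.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::ops::Deref;

// Mock of the C object (hypothetical): just a reference counter.
pub struct CFoo {
    refs: Cell<u32>,
}

fn foo_create() -> *mut CFoo {
    Box::into_raw(Box::new(CFoo { refs: Cell::new(1) }))
}

unsafe fn foo_addref(p: *mut CFoo) {
    let foo: &CFoo = unsafe { &*p };
    foo.refs.set(foo.refs.get() + 1);
}

unsafe fn foo_release(p: *mut CFoo) {
    let remaining = {
        let foo: &CFoo = unsafe { &*p };
        foo.refs.set(foo.refs.get() - 1);
        foo.refs.get()
    };
    if remaining == 0 {
        // Last reference gone: free the mock object.
        drop(unsafe { Box::from_raw(p) });
    }
}

// Owning handle: one Foo value owns exactly one reference count.
pub struct Foo(*mut CFoo);

impl Foo {
    pub fn new() -> Foo {
        Foo(foo_create())
    }
    // Only possible because the mock exposes its counter.
    pub fn ref_count(&self) -> u32 {
        let foo: &CFoo = unsafe { &*self.0 };
        foo.refs.get()
    }
}

impl Clone for Foo {
    fn clone(&self) -> Foo {
        unsafe { foo_addref(self.0) };
        Foo(self.0)
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        unsafe { foo_release(self.0) }
    }
}

// Borrowed handle: same pointer, but it owns no reference count.
pub struct FooBorrow<'a> {
    foo: ManuallyDrop<Foo>,      // never dropped, so foo_release is never called
    _owner: PhantomData<&'a ()>, // makes the handle act like a shared borrow for 'a
}

impl<'a> FooBorrow<'a> {
    /// Safety: someone else must keep `ptr` alive for all of 'a.
    pub unsafe fn from_raw(ptr: *mut CFoo) -> FooBorrow<'a> {
        FooBorrow { foo: ManuallyDrop::new(Foo(ptr)), _owner: PhantomData }
    }
}

impl<'a> Deref for FooBorrow<'a> {
    type Target = Foo;
    fn deref(&self) -> &Foo {
        &self.foo
    }
}

// Mock owner standing in for the C `Bar` behind bar_get_foo.
pub struct Bar {
    foo: Foo,
}

impl Bar {
    pub fn new() -> Bar {
        Bar { foo: Foo::new() }
    }
    // The elided lifetime ties the returned borrow to `&self`.
    pub fn foo(&self) -> FooBorrow<'_> {
        unsafe { FooBorrow::from_raw(self.foo.0) }
    }
}
```

With this shape, bar.foo() costs no refcount operation, and the borrow cannot outlive bar. A caller that needs to keep the Foo longer calls Foo::clone(&borrowed), which does the addref and yields an owning Foo. Note that this only expresses "the pointer stays valid"; it does not by itself encode the C convention about when the string returned by foo_get_name is invalidated.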