Ergonomic FFI design: moving object from dereferenced raw pointer


#1

I’m trying to wrap a C-api into Rust. It’s an astronomy data format library cfitsio.

Encapsulating the file state is a fitsfile struct, which is managed by the open and close functions:

fitsfile *fptr; // struct holding file position etc.
int status = 0; // how errors are reported

// Open a file on disk
fits_open_file(&fptr, "filename.fits", READONLY, &status);

// Close the file
fits_close_file(fptr, &status);

The fits_open_file function allocates the memory, and sets the given pointer to the allocated memory. I have started to wrap this pointer object, but want to use Rust’s borrowing semantics to manage mutability of this state object, hopefully allowing concurrent file access (which is not supported in the C-api).

My WIP wrapper library wraps this fitsfile into a rustier FitsFile (source) object, and stores a *const fitsfile member. It’s const because the pointer itself does not change. My issue is that the memory it points to does change.

Currently using the FitsFile object does not require it to be defined as mut which is mis-representing how the library works.

What I’d like to do is store the fitsfile object itself. It’s not defined as Copy so move semantics should be in play, but Rust does not allow moving from dereferenced raw pointers.

One reasonable alternative is to ensure that any methods added to this struct (or other relevant state structs) accept &mut self. Is this typically done in ffi code?

Yet another alternative is to design an API that does not reflect the C api, but is more typical of Rust APIs, but this needs more thought. I partly feel that the end user should not need to worry about the in-memory representation of the file state changing, but only when the user wants to change the file. I may consider a further high level API wrapping the current one which is more user-friendly.

This is my first serious Rust project, and I’d be interested to hear any comments.


#2

Your wrapper seems OK, because conceptually FitsFile owns the underlying C data — but make that pointer private! If the pointer can’t be taken out of the FitsFile, you’ll have the right ownership semantics ensured by lifetime of the wrapper object.

You can change the inner pointer to *mut if you want. Rust won’t get in the way, because it’s all unsafe anyway.

In case the C library saves any pointers to Rust data, or you borrow pointers from the C library, you will probably want PhantomData, which is a 0-sized Rust object that pretends to store whatever you tell it to store. This enables all usual lifetime checking.

It’s also possible to use & and &mut in extern C function defintions. They’re binary-compatible with C pointers, so if the C function behaves as if it was borrowing a reference, you can express that on Rust’s side too. But it’s not commonly used, because most of the time you’ll want bindgen to generate it all for you.


#3

I agree that storing a raw pointer is the right solution here, since the C library expects you to pass the same pointer back to ffclos.

One reasonable alternative is to ensure that any methods added to this struct (or other relevant state structs) accept &mut self. Is this typically done in ffi code?

Yes, and this is what I recommend. You could make exceptions for methods that are definitely safe to call on a possibly-aliased pointer (though for some C libraries it is hard to be 100% certain which methods these are).

But to answer the general question about moving out of raw pointers, ptr::read is one way to do this. Of course, the Rust compiler can’t enforce the move semantics here like it can for other types, so it’s up to the programmer to maintain any ownership invariants.


#4

tl;dr: I think you’re going to be better off letting Rust have a pointer to the fitsfile, and letting your cfitsio library manage allocation, initialization, and destruction.

I imagine what you have in mind here is a Rust FitsFile type that is the same size as the C fitsfile struct, and you’d do something like:

extern {
    fn fits_open_file(ff: *mut *mut FitsFile,
                      filename: *const libc::c_char,
                      mode: libc::c_int,
                      status: *mut libc::c_int);
}

let filename = CString::new("filename.fits").unwrap();
let mut fptr: *mut FitsFile = std::ptr::null_mut();
let mut status = 0;
let f: FitsFile;
unsafe {
    fits_open_file(&mut fptr, filename.as_ptr(), READONLY, &mut status);
    //... check status ...
    f = std::ptr::read(fptr);
    //... what to do with memory fptr points at?
}
let _g = f;

(link to the full mock code: https://is.gd/LB5Hpo)

As @mbrubeck suggests, this uses std::ptr::read to get a FitsFile from a *const FitsFile. This FitsFile then becomes something that Rust will move from place to place until it’s dropped.

There are three concerns that come to mind, though:

  • In general, moving a value copies it to a new address. (Optimization can avoid many of these copies, but not all, so you must assume they will occur.) This includes the move of std::ptr::read's return value to its caller. If there are other structures with internal pointers back to the fitsfile structure, those are going to be pointing to the fitsfile that fits_open_file allocated, not the FitsFile that std::ptr::read returns.

  • Even if that’s not a problem, you say that fits_open_file allocates memory for the fitsfile and sets its first argument to point to that memory. std::ptr::read(fptr) moves fptr's referent into its return value, but the memory fptr points at remains allocated, even though the resources it used to own now belong to the memory read returned. So you need to free the former, without freeing any of the resources the latter now owns. Given that cfitsio doesn’t really promise anything about how it allocated it, freeing it yourself is dicey.

  • You certainly can’t pass fits_close_file a pointer to a FitsFile whose storage Rust is managing: cfitsio will free it, and then Rust will try to free it too. You’ll need to move the FitsFile back into storage that is safe for fits_close_file to operate on, and then use something like std::mem::forget to make Rust chill out. But this is dicey in the same way as the std::ptr::read call: it requires assumptions about how cfitsio allocates memory that probably aren’t part of the library’s interface.

If you take the other approach instead, and make FitsFile hold a *mut cfitsio::fitsfile, then there’s no need to free the block that fits_open_file allocates; moves of the FitsFile don’t affect the underlying fitsfile; and you can implement Drop to clean it up with a single call to fits_close_file. It seems like everything lines up much better.

It is true that if you have:

struct FitsFile(*mut cfitsio::fitsfile);

then given a shared reference &FitsFile you can still mutate the cfitsio::fitsfile it points to, which feels weird. But as long as FitsFile is a pub type with a non-pub element, only your module will ever be able to get to the *mut cfitsio::fitsfile, so you could certainly offer only non-mutating methods on &FitsFile values.


#5

*const means that the data pointed to by the raw pointer does not change, not that the pointer itself does not change. If the data behind the pointer does change then you want *mut.


#6

Thanks for your replies, I’ve learnt quite a lot just from this discussion.

I think I will have the fitsifle struct private, and any manipulations that change the underlying file will take self as &mut self indicating that mutation is occurring. I still have concerns about implementation details leaking into the high level user code, as changing the current hdu changes the underlying representation and such may cause concurrency issues if I ever get it to the stage of making the code threadsafe. I feel that is a decision I need to make then.