FFI - C FILE* and good Rust wrapper equivalent type

Hello,

I'm experimenting with creating a _sys crate and an additional wrapper crate on top of it for a C library... (lots of fun)


Problem

I'm struggling with following pattern, and how to map it inside wrapper crate code:

C code:
    void mylib_init(FILE *estream);

The C library is initialized with a FILE* for debug message printouts etc. It can use e.g. stdout / stderr / a regular file-system file, and I'd like to keep the same ability in the wrapper...


Research

Many great tutorials and tips for FFI all around, but not many cleanly address patterns in full sequence - comparison of typical Rust "high level" API structs vs low level FFI structs, and how they can represent each other.

E.g. in a topically "related" thread, I've learned that impl Write can do something similar (mixing usage of stdout/file) in higher-level Rust code. That thread does not explicitly address the possibility of "casting" such items into a raw file descriptor, though...


Attempt

For my case std::fs::File feels like a good match for the input parameter of the Rust FFI wrapper, and its voyage towards FILE*... Moving ownership of the File should be "fine" given the definition of the C API.

From some tips and tutorials, I've come to:

use std::os::unix::prelude::*;
// or use explicitly `std::os::unix::io::AsRawFd`

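pub fn mylib_wrapper_init<F: AsRawFd>(output_stream: F) {
    let raw_fd = output_stream.as_raw_fd();
    let mode = ???; // unknown, see the question below
    // `libc::fdopen` is an `unsafe fn`, so it has to be called inside the `unsafe` block too.
    unsafe {
        let estream = libc::fdopen(raw_fd, mode);
        syscrate::mylib_init(estream);
    }
}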

Is there a way to obtain the std::fs::OpenOptions from the file object, or do I need to pass them in separately?
Or is there a better API type to be used instead of File?
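
As far as I know, std offers no way to read the OpenOptions back from a File, but on Unix the access mode can be queried from the descriptor itself with fcntl(F_GETFL). The following is only a sketch under assumptions: the function name fdopen_mode_for is made up for illustration, the constants are hard-coded with their Linux values (real code should take them from the libc crate), and append/truncate flags are ignored.

```rust
use std::ffi::CStr;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

extern "C" {
    // POSIX `fcntl`, declared by hand so the sketch needs no external crate.
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

// Assumption: Linux values; prefer the constants from the `libc` crate in real code.
const F_GETFL: c_int = 3;
const O_ACCMODE: c_int = 3;
const O_RDONLY: c_int = 0;
const O_WRONLY: c_int = 1;
const O_RDWR: c_int = 2;

/// Hypothetical helper: derive an `fdopen` mode string from how the
/// descriptor was actually opened. Returns `None` if the query fails.
pub fn fdopen_mode_for<F: AsRawFd>(stream: &F) -> Option<&'static CStr> {
    let flags = unsafe { fcntl(stream.as_raw_fd(), F_GETFL) };
    if flags < 0 {
        return None;
    }
    let mode: &'static [u8] = match flags & O_ACCMODE {
        O_RDONLY => b"r\0",
        O_WRONLY => b"w\0",
        O_RDWR => b"r+\0",
        _ => return None,
    };
    CStr::from_bytes_with_nul(mode).ok()
}
```

The returned CStr can be handed to fdopen via as_ptr(). This only answers the "which mode" part of the question; it does nothing about who closes the descriptor.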


Big question

  • Is the std::fs::File a good type for low-level FFI function?

    • obviously is good for regular file representation
    • can it be used/obtained from e.g. stdout?
  • does impl Write imply ability to cast it to (retrieve from it) file descriptor needed for C FFI end?

  • or is there some type that can address both these requirements?

or am i complicating things unnecessarily, and i should just use something like:

enum LibOutput {
  File(std::fs::File),
  StdOutput,
  StdError,
}

and

  • use raw fd magic for File variant only,
  • do stdout/err variants in C's FFI using C macros only, instead of getting correct FD from Rust code all the time
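
For illustration, here is one way such an enum could be lowered to a FILE*. It is a sketch, not a recommendation: instead of the C-macro idea from the list above it duplicates descriptors 1 and 2 with dup(), so that a later fclose() on the C side does not close the process-wide stdout/stderr. CFile, into_c_stream and close_c_stream are made-up names, the libc functions are declared by hand to keep the block self-contained, and the File is assumed to be writable.

```rust
use std::fs::File;
use std::os::raw::{c_char, c_int};
use std::os::unix::io::IntoRawFd;
use std::ptr;

/// Opaque stand-in for C's `FILE`; with the `libc` crate this would be `libc::FILE`.
#[repr(C)]
pub struct CFile {
    _private: [u8; 0],
}

extern "C" {
    fn fdopen(fd: c_int, mode: *const c_char) -> *mut CFile;
    fn fclose(stream: *mut CFile) -> c_int;
    fn dup(fd: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
}

pub enum LibOutput {
    File(File),
    StdOutput,
    StdError,
}

impl LibOutput {
    /// Turns the value into a `FILE*` that the C side owns from now on.
    /// Returns a null pointer on failure.
    pub fn into_c_stream(self) -> *mut CFile {
        let fd = match self {
            // `into_raw_fd` gives up ownership, so Rust will not close the fd.
            LibOutput::File(file) => file.into_raw_fd(),
            // Duplicate the standard descriptors (1 = stdout, 2 = stderr).
            LibOutput::StdOutput => unsafe { dup(1) },
            LibOutput::StdError => unsafe { dup(2) },
        };
        if fd < 0 {
            return ptr::null_mut();
        }
        // Assumption: the descriptor is writable, so mode "w" is compatible.
        let stream = unsafe { fdopen(fd, b"w\0".as_ptr() as *const c_char) };
        if stream.is_null() {
            // `fdopen` failed, so the descriptor is still ours to close.
            unsafe { close(fd) };
        }
        stream
    }
}

/// Closes a stream obtained from `into_c_stream`; returns `true` on success.
///
/// # Safety
/// `stream` must come from `into_c_stream` and must not be used afterwards.
pub unsafe fn close_c_stream(stream: *mut CFile) -> bool {
    fclose(stream) == 0
}
```

With this shape the FILE* owns its descriptor, so whoever deinitializes the C library is responsible for calling fclose() exactly once.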

You mentioned you are creating a -sys crate.

As far as I understand such crates are low-level and should map 1:1 to the C API. FILE*’s equivalent in Rust is *mut libc::FILE, I believe.

yes of course, _sys crate is almost 100% automated and already "done" (open for edits as needed of course).

All of the above concerns the "wrapper crate" I am now building on top of it. That crate should not expose any FFI-level data structures in its northbound API, hence my search for something Rusty to map downwards to FILE*...

edit: edited my original post intro to make it more clear, sorry for original hidden info

You can't create a C FILE from a Rust io::Read / io::Write, nor the other way around. You especially can't cast FILE * to *mut Write or vice versa. You can't therefore wrap a C API taking FILE * as a Rust equivalent with File or generic writers.

(There is one exception, if your operating system allows for creating a FILE * from a memory buffer or from dedicated reader/writer callbacks, then you can hook those up through FFI, but last time I checked, this was not portable at all.)
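
To make the exception concrete: glibc offers fopencookie(), which builds a FILE* on top of user-supplied callbacks (the BSDs have funopen() with a different signature). The sketch below assumes Linux with glibc; CFile and file_from_writer are hypothetical names, the C declarations are written by hand, and a panic inside the writer is not handled.

```rust
use std::io::Write;
use std::os::raw::{c_char, c_int, c_void};

/// Opaque stand-in for C's `FILE`.
#[repr(C)]
pub struct CFile {
    _private: [u8; 0],
}

type ReadFn = unsafe extern "C" fn(*mut c_void, *mut c_char, usize) -> isize;
type WriteFn = unsafe extern "C" fn(*mut c_void, *const c_char, usize) -> isize;
type SeekFn = unsafe extern "C" fn(*mut c_void, *mut i64, c_int) -> c_int;
type CloseFn = unsafe extern "C" fn(*mut c_void) -> c_int;

/// Mirrors glibc's `cookie_io_functions_t`; `None` means "not supported".
#[repr(C)]
struct CookieIoFunctions {
    read: Option<ReadFn>,
    write: Option<WriteFn>,
    seek: Option<SeekFn>,
    close: Option<CloseFn>,
}

extern "C" {
    fn fopencookie(cookie: *mut c_void, mode: *const c_char, io: CookieIoFunctions) -> *mut CFile;
    pub fn fputs(s: *const c_char, stream: *mut CFile) -> c_int;
    pub fn fclose(stream: *mut CFile) -> c_int;
}

/// Called by C stdio whenever it flushes buffered bytes.
unsafe extern "C" fn write_cb(cookie: *mut c_void, buf: *const c_char, len: usize) -> isize {
    let writer: &mut Box<dyn Write> = &mut *(cookie as *mut Box<dyn Write>);
    let bytes = std::slice::from_raw_parts(buf as *const u8, len);
    match writer.write_all(bytes) {
        Ok(()) => len as isize,
        Err(_) => 0, // a write callback reports failure by returning 0
    }
}

/// Called by `fclose`; takes the writer back and drops it.
unsafe extern "C" fn close_cb(cookie: *mut c_void) -> c_int {
    let mut writer: Box<Box<dyn Write>> = Box::from_raw(cookie as *mut Box<dyn Write>);
    if writer.flush().is_ok() { 0 } else { -1 }
}

/// Hypothetical helper: wrap any Rust writer in a write-only `FILE*`.
/// The writer is dropped when the C side calls `fclose`.
pub fn file_from_writer(writer: Box<dyn Write>) -> *mut CFile {
    let cookie = Box::into_raw(Box::new(writer)) as *mut c_void;
    let io = CookieIoFunctions {
        read: None,
        write: Some(write_cb),
        seek: None,
        close: Some(close_cb),
    };
    let stream = unsafe { fopencookie(cookie, b"w\0".as_ptr() as *const c_char, io) };
    if stream.is_null() {
        // Creation failed: reclaim the writer so it is not leaked.
        unsafe { drop(Box::from_raw(cookie as *mut Box<dyn Write>)) };
    }
    stream
}
```

The resulting pointer could be passed to something like mylib_init(), but the portability caveat stands: this will not link on platforms without fopencookie.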


Your LibOutput enum seems like the most sensible solution to me, especially given that on Unix you have /dev/stdout and /dev/stderr available, which makes it easy to special-case StdOutput and StdError as Files too.
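
A minimal sketch of that special-casing, assuming a Unix system where /dev/stdout and /dev/stderr exist and can be opened (this can fail in some sandboxes, or when the stream is a socket). StdStream and its methods are made-up names.

```rust
use std::fs::{File, OpenOptions};
use std::io;

/// Hypothetical helper type for the two standard output streams.
pub enum StdStream {
    Output,
    Error,
}

impl StdStream {
    /// Device node that aliases the stream on Unix.
    pub fn device_path(&self) -> &'static str {
        match self {
            StdStream::Output => "/dev/stdout",
            StdStream::Error => "/dev/stderr",
        }
    }

    /// Opens the device node as an ordinary write-only `File`, so it can
    /// travel the same code path as a regular file.
    pub fn open(&self) -> io::Result<File> {
        OpenOptions::new().write(true).open(self.device_path())
    }
}
```

The File obtained this way owns a fresh descriptor, so closing it does not close the process's real stdout or stderr.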


But if you want to keep the fd idea since you also plan on working with sockets and things like that, then use F : AsRawFd. That being said, be very careful with how the instance f: F is managed, w.r.t. ownership.

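For instance, in the snippet from the question, output_stream is dropped when mylib_wrapper_init returns; if it is a File, that closes the descriptor while the C library may still be writing to the FILE*.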

So you either take a &'long_enough F and hope / pray that 'long_enough represents a lifetime that outlives the usage of syscrate (in which case the API will surely need to be unsafe), or you use a callback pattern or ownership + Drop semantics (RAII) to guarantee that:

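pub struct SysCrate {
    // Owning the stream keeps its file descriptor open for as long as the
    // C library may write to it.
    output_stream: Box<dyn AsRawFd + 'static>,
    // …
}

fn my_lib_wrapper_init (output_stream: impl 'static + AsRawFd)
  -> SysCrate
{
    let output_stream: Box<dyn AsRawFd + 'static> = Box::new(output_stream);
    let fd = output_stream.as_raw_fd();
    unsafe {
        // `mode` is still the C mode string from the question (e.g. "w").
        ::syscrate::mylib_init(libc::fdopen(fd, mode));
    }
    SysCrate {
        output_stream,
        // …
    }
}

impl Drop for SysCrate {
    fn drop (self: &'_ mut SysCrate)
    {
        unsafe {
            ::syscrate::mylib_deinit();
            // ::libc::fdclose(fd); /* I don't think this is a good idea */
        }
    } // the `output_stream` field is dropped right after this, closing the file.
}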
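
The callback pattern mentioned above could look roughly like this. It is only a sketch: fake_sys is a hypothetical stand-in for the real -sys crate (it just records the descriptor so the block compiles on its own), and MyLib and with_mylib are made-up names. The point is that the borrow of output_stream lasts for the whole call, so the descriptor cannot be closed while the library is initialized.

```rust
use std::os::unix::io::AsRawFd;

/// Hypothetical stand-in for the real `-sys` crate.
pub mod fake_sys {
    use std::sync::atomic::{AtomicI32, Ordering};

    /// Descriptor the "library" currently logs to, or -1 when uninitialized.
    pub static ACTIVE_FD: AtomicI32 = AtomicI32::new(-1);

    pub unsafe fn mylib_init_fd(fd: i32) {
        ACTIVE_FD.store(fd, Ordering::SeqCst);
    }

    pub unsafe fn mylib_deinit() {
        ACTIVE_FD.store(-1, Ordering::SeqCst);
    }
}

/// Token proving that the library is initialized; only handed to the callback.
pub struct MyLib {
    _private: (),
}

/// Initializes the library, runs `body`, then deinitializes it again.
pub fn with_mylib<F, B, R>(output_stream: &F, body: B) -> R
where
    F: AsRawFd,
    B: FnOnce(&MyLib) -> R,
{
    // Guard so that deinitialization also runs if `body` panics.
    struct DeinitOnDrop;
    impl Drop for DeinitOnDrop {
        fn drop(&mut self) {
            unsafe { fake_sys::mylib_deinit() }
        }
    }

    unsafe { fake_sys::mylib_init_fd(output_stream.as_raw_fd()) };
    let _guard = DeinitOnDrop;
    body(&MyLib { _private: () })
}
```

Compared with the RAII version, the caller keeps ownership of the stream and no 'static bound is needed, at the cost of a closure-shaped API.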

Thank you all for responses and clarifications.

Recommendations of fdopen() in various threads/topics across the web sounded a bit strange to me originally, but I hadn't pieced together all the implications and the issue with closing the fd...
Thanks a lot @Yandros for the example!

For now I'll use the stdout/stderr enum variants, which are sufficient for a PoC implementation, and will maybe add an extra FFI API to handle "file" opening/closing on the C side to sidestep some cross-language issues... (Rust input -> file-system file name only, C -> all the fd magic -> Rust: some boxed pointer/fd for passing to other wrapper methods)

It sounds like you are wanting to pass around some sort of trait object to get the nice bonuses from polymorphism, but it needs to be FFI-safe. In normal Rust you'd use a Box<dyn Write> and be done with it, but C doesn't know how to use Rust's fat pointers.

In your scenario you've already got an equivalent mechanism in the form of Linux file descriptors, but if that wasn't available the general solution I'd use is that of a "thin" trait object. You could also use a *mut Box<dyn Write> (a pointer to a pointer to something implementing std::io::Write), but the double indirection plus dynamic dispatch could be prohibitive.
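
As a small illustration of the double-indirection option: a Box<dyn Write> is two words wide, while a pointer to it is a single word that C can carry around as a void*. The function names below are made up for this sketch.

```rust
use std::io::Write;
use std::os::raw::c_void;

/// Box the fat pointer once more so that a thin pointer can cross the FFI boundary.
pub fn writer_into_ffi(writer: Box<dyn Write>) -> *mut c_void {
    Box::into_raw(Box::new(writer)) as *mut c_void
}

/// Take the writer back from the thin pointer.
///
/// # Safety
/// `ptr` must come from `writer_into_ffi` and must not be used again afterwards.
pub unsafe fn writer_from_ffi(ptr: *mut c_void) -> Box<dyn Write> {
    *Box::from_raw(ptr as *mut Box<dyn Write>)
}
```

Every write through such a handle first follows the outer pointer and then dispatches through the vtable, which is the overhead the thin trait object avoids.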

The idea behind thin trait objects is you'll store the object's vtable right next to its data, then pass around a pointer to the vtable + data. This is quite similar to how COM and GObject work, and a less complicated form of how C++ achieves dynamic dispatch while still passing around normal pointers.

use std::any::TypeId;
use std::io::{Error, Write};

#[repr(C)]
struct Repr<W> {
    vtable: VTable,
    // Safety: Should only ever be accessed using vtable methods.
    data: W,
}

#[derive(Copy, Clone)]
#[repr(C)]
struct VTable {
    write: unsafe fn(*mut Repr<()>, &[u8]) -> Result<usize, Error>,
    flush: unsafe fn(*mut Repr<()>) -> Result<(), Error>,
    destroy: unsafe fn(*mut Repr<()>),
    type_id: TypeId, // enables downcasting.
}

From there I'd create a FileHandle type which wraps Repr, pretends that the W type is (), and implements our desired functionality (in this case, std::io::Write) via the vtable functions.

pub struct FileHandle {
    // Safety: you can only mutate this via `vtable` methods. No touching the
    // wrapped `data`!
    repr: Repr<()>,
}

impl Write for FileHandle {
    fn write(&mut self, buffer: &[u8]) -> Result<usize, Error> {
        unsafe { (self.repr.vtable.write)(&mut self.repr, buffer) }
    }

    fn flush(&mut self) -> Result<(), Error> {
        unsafe { (self.repr.vtable.flush)(&mut self.repr) }
    }
}

We also implement Drop to make sure the wrapped data is destroyed properly.

impl Drop for FileHandle {
    fn drop(&mut self) {
        unsafe {
            (self.repr.vtable.destroy)(&mut self.repr);
        }
    }
}

The way FileHandle gets constructed is key to the safety of the entire system. One of our invariants is that FileHandle must be behind a pointer at all times, because its true size depends on the erased W and is not known from the FileHandle type itself. That means you'll only see variables of type Box<FileHandle> or &mut FileHandle.

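impl FileHandle {
    pub fn for_writer<W: Write + 'static>(writer: W) -> Box<Self> {
        let repr = Repr {
            vtable: VTable::new::<W>(),
            data: writer,
        };

        // Assumed final step: erase `W` by reinterpreting the `Box<Repr<W>>`
        // as a `Box<FileHandle>`. This cast is the type-punning step whose
        // soundness is questioned later in the thread.
        let boxed = Box::into_raw(Box::new(repr));
        unsafe { Box::from_raw(boxed as *mut FileHandle) }
    }
}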

You can populate the VTable using "trampoline" functions which get instantiated for each W type. See Rust Closures in FFI for more on this trampoline function technique.

impl VTable {
    fn new<T: Write + 'static>() -> Self {
        // Create some wrapper functions which will cast to a `T` and call into
        // the corresponding method on std::io::Write.

        unsafe fn write_trampoline<T: Write>(
            repr: *mut Repr<()>,
            buffer: &[u8],
        ) -> Result<usize, Error> {
            as_mut_unchecked::<T>(repr).write(buffer)
        }
        unsafe fn flush_trampoline<T: Write>(repr: *mut Repr<()>) -> Result<(), Error> {
            as_mut_unchecked::<T>(repr).flush()
        }
        unsafe fn destroy_trampoline<T>(repr: *mut Repr<()>) {
            std::ptr::drop_in_place(as_mut_unchecked::<T>(repr));
        }

        VTable {
            write: write_trampoline::<T>,
            flush: flush_trampoline::<T>,
            destroy: destroy_trampoline::<T>,
            type_id: TypeId::of::<T>(),
        }
    }
}

/// For convenience, blindly get a reference to the `data` inside a `Repr<()>` 
/// and cast it to a `&mut T`.
///
/// # Safety
///
/// The `repr` argument must actually point to a `Repr<T>`.
///
/// The `'a` lifetime must not outlive the `repr`. We're conjuring up an
/// arbitrary lifetime because it makes things more ergonomic.
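unsafe fn as_mut_unchecked<'a, T>(repr: *mut Repr<()>) -> &'a mut T {
    // Cast the outer pointer first so that the offset of `data` is computed
    // for `Repr<T>`; with a highly aligned `T` it can differ from `Repr<()>`.
    &mut (*(repr as *mut Repr<T>)).data
}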

And now you can create your mylib_init() and a bunch of functions which will construct appropriate *mut FileHandles.

use std::{ffi::CStr, fs::File, os::raw::c_char};

#[no_mangle]
pub unsafe extern "C" fn mylib_init(handle: *mut FileHandle) {
    // TODO: Initialise the logger with our file handle.
}

#[no_mangle]
pub unsafe extern "C" fn file_handle_from_path(filename: *const c_char) -> *mut FileHandle {
    let filename = CStr::from_ptr(filename);
    let filename = match filename.to_str() {
        Ok(f) => f,
        Err(_) => return std::ptr::null_mut(),
    };

    let writer = match File::create(filename) {
        Ok(w) => w,
        Err(_) => return std::ptr::null_mut(),
    };

    let handle = FileHandle::for_writer(writer);
    Box::into_raw(handle)
}

I've written up the whole thing on the playground.

Please let me know if you think there are logic errors or the code is unsound. I'm thinking of writing up an article explaining this technique in more detail if there is some interest.

I can't claim much credit for this idea. I first stumbled upon it when doing an informal audit of the anyhow crate (my Repr is anyhow's ErrorImpl).


In general, I strongly advise against type punning on the pointee, while keeping Rust high level pointers around:

  • it is unsound to transmute a &mut Repr<()> to a &mut Repr<SomethingNot0SizedOrNot1Aligned>,

    • Ditto for the other high level pointers (e.g., & _, Box<_>)

    • More precisely: it is unsound to dereference any pointer to a Repr<T> that was obtained by casting a pointer originating from a &[mut] Repr<()>. See pointer provenance and the following issue.

      So it's at least a safety invariant. It may even be a validity invariant, but that part is not settled yet (see also the issue for Box).

In practice, thus, it is best to hard-code the pointer type (in this example, Box) inside the wrapper type (FileHandle), and only deal with raw pointers from that moment onwards:

pub
struct BoxedFileHandle {
    ptr: ptr::NonNull<Repr<()>>, // represents a "Box<Repr<()>>",
}

impl BoxedFileHandle {
    pub
    fn for_writer<W : Write + 'static> (writer: W)
      -> BoxedFileHandle
    {
        let repr: Repr<W> = Repr { vtable: ..., data: writer };
        Self {
            ptr: ptr::NonNull::from(Box::leak(Box::new(repr))).cast(),
        }
    }
}

impl Drop for BoxedFileHandle {
    fn drop (self: &'_ mut BoxedFileHandle)
    {
        let VTable { destroy, .. } = self.vtable();
        unsafe { destroy(self.ptr.as_ptr()); } // assuming `destroy` also deallocates the Box
        /* destroy would be `drop::<Box<Repr<OriginalType>>>(Box::from_raw( _ ))` */
    }
}

impl BoxedFileHandle {
    #[inline]
    fn vtable (self: &'_ BoxedFileHandle)
      -> VTable
    {
        // This is the only time we create a Rust high-level pointer
        // to a `Repr<()>`. Although error-prone, here this is fine, because:
        //   - such high level pointer no longer exists by the time the function returns
        //   - Reading a `Repr<()>` out of a `Repr<W>` is well-defined (thanks to `#[repr(C)]`). 
        unsafe { self.ptr.as_ref().vtable }
    }

    pub
    fn into_ffi (self: BoxedFileHandle)
      -> *mut ::core::ffi::c_void
    {
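        // Suppress `Drop`, otherwise the allocation would be destroyed here
        // and the returned pointer would dangle.
        let this = ::core::mem::ManuallyDrop::new(self);
        this.ptr.cast().as_ptr()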
    }

    pub
    unsafe
    fn from_ffi (ptr: *mut ::core::ffi::c_void)
      -> BoxedFileHandle
    {
        Self { ptr: ptr::NonNull::new(ptr.cast()).expect("Got NULL") }
    }
}

Thanks for running my code through Miri @Yandros and pointing out the parts that are unsound! This is exactly the kind of feedback I was looking for.

As you've surmised, the overall pattern we're trying to emulate is C-style inheritance using function pointer fields in the base "class" as a way to implement virtual methods. Then you can treat a *mut Child as a *mut Parent... I know this works in practice, and have seen similar things being done in Rust (e.g. anyhow::Error), but don't know how it interacts with the language's memory model.

I was playing around with it locally and it seems like using std::alloc::alloc() directly then initializing with ptr.write(Repr { ... }) makes Miri happy. So completely ripping out Box and normal Rust references ("Rust high level pointers") seems to do the job.
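
A compact sketch of what that raw-allocation variant might look like. It is not the playground code: ThinWriter and the *_impl names are made up here, the TypeId field is left out for brevity, and while the design is meant to avoid the problematic references, I make no claim that it has been checked with Miri.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::io::{self, Write};
use std::ptr::{self, NonNull};

#[repr(C)]
struct Repr<W> {
    vtable: VTable, // always at offset 0 thanks to `#[repr(C)]`
    data: W,
}

#[derive(Clone, Copy)]
struct VTable {
    write: unsafe fn(*mut Repr<()>, &[u8]) -> io::Result<usize>,
    flush: unsafe fn(*mut Repr<()>) -> io::Result<()>,
    destroy: unsafe fn(*mut Repr<()>),
}

unsafe fn write_impl<W: Write>(repr: *mut Repr<()>, buf: &[u8]) -> io::Result<usize> {
    // Restore the real type before touching `data`.
    let data: &mut W = &mut (*repr.cast::<Repr<W>>()).data;
    data.write(buf)
}

unsafe fn flush_impl<W: Write>(repr: *mut Repr<()>) -> io::Result<()> {
    let data: &mut W = &mut (*repr.cast::<Repr<W>>()).data;
    data.flush()
}

unsafe fn destroy_impl<W>(repr: *mut Repr<()>) {
    let repr = repr.cast::<Repr<W>>();
    ptr::drop_in_place(repr);
    // Deallocate with the same layout that was used for the allocation.
    dealloc(repr.cast::<u8>(), Layout::new::<Repr<W>>());
}

/// One-word handle; the only pointer ever stored is a raw one.
pub struct ThinWriter {
    ptr: NonNull<Repr<()>>,
}

impl ThinWriter {
    pub fn new<W: Write + 'static>(writer: W) -> Self {
        let layout = Layout::new::<Repr<W>>();
        let vtable = VTable {
            write: write_impl::<W>,
            flush: flush_impl::<W>,
            destroy: destroy_impl::<W>,
        };
        unsafe {
            let raw = alloc(layout).cast::<Repr<W>>();
            if raw.is_null() {
                handle_alloc_error(layout);
            }
            raw.write(Repr { vtable, data: writer });
            ThinWriter { ptr: NonNull::new_unchecked(raw.cast()) }
        }
    }

    fn vtable(&self) -> VTable {
        // Copy the vtable out through the raw pointer; no `&Repr<()>` is created.
        unsafe { self.ptr.as_ptr().cast::<VTable>().read() }
    }
}

impl Write for ThinWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        unsafe { (self.vtable().write)(self.ptr.as_ptr(), buf) }
    }

    fn flush(&mut self) -> io::Result<()> {
        unsafe { (self.vtable().flush)(self.ptr.as_ptr()) }
    }
}

impl Drop for ThinWriter {
    fn drop(&mut self) {
        unsafe { (self.vtable().destroy)(self.ptr.as_ptr()) }
    }
}
```

The handle is one pointer wide, so its inner pointer can be given to C as an opaque void* in the same way as with BoxedFileHandle.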

One thing that you'll really want to do is be able to pass around a &mut FileHandle (which would really be a &mut Repr<SomeWriter>) and call methods on it, instead of using a &mut BoxedFileHandle and incurring double indirection. So in this case type punning is desired for ergonomics.

Would punning a &mut Repr<SomeWriter> as a &mut FileHandle still be UB even though we're only doing unsafe pointer operations internally and never actually dereferencing the FileHandle reference? This feels quite similar to RalfJung's article on Why even unused data needs to be valid.

this discussion is a bit beyond my original requirements / case, but is imho very interesting and deserves continuation...

I have no problem continuing here, but maybe creating a new topic with a title capturing this pattern/problem would bring more insights/people into the discussion (and make it easier to search for in future).

