FFI C string input best practices with cbindgen: *const T, Option<NonNull<T>>, Option<&T>

I've been thinking about best practice for exposing Rust code via a C interface ("C calls Rust"). I'm using cbindgen to create a C header from a rust project. There are several ways to write the Rust side of the interface such that, from C, the user can pass a const char *.

In the examples below, I use a c_char, but the same question would apply for creating any "slice-like" type from a C pointer.

  1. *const c_char
    • pro: most straightforward way
    • con: must remember to "check for NULL"
  2. Option<NonNull<c_char>>
    • pro: can't forget NULL check (because of Option<T>)
    • con: forces use of mutable pointer in bindings (since NonNull<T> uses *mut T)
  3. Option<&c_char>
    • pro: can't forget NULL check (because of Option<T>)
    • con: it feels weird to construct a slice-like thing from just a pointer-like thing

As of writing, all methods are fine according to miri.

use std::{ffi::CStr, os::raw::c_char, ptr::NonNull};

/// Takes a `*const c_char`
#[no_mangle]
pub unsafe extern "C" fn str_length_ptr(s: *const c_char) -> usize {
    // Manually check for NULL case (easy to forget)
    if s.is_null() {
        return 0;
    }
    let s = CStr::from_ptr(s);
    s.to_bytes().len()
}

/// Takes a `Option<NonNull<c_char>>`
///
/// Since `NonNull<T>` acts as a `*mut T`, we don't get "const" in C bindings
#[no_mangle]
pub unsafe extern "C" fn str_length_nonnull(s: Option<NonNull<c_char>>) -> usize {
    // Since we have an Option, we can't forget to handle NULL
    let s = if let Some(s) = s {
        s.as_ptr()
    } else {
        return 0;
    };
    let s = CStr::from_ptr(s);
    s.to_bytes().len()
}

/// Takes a `Option<&c_char>`
#[no_mangle]
pub unsafe extern "C" fn str_length_option_ref(s: Option<&c_char>) -> usize {
    // Since we have an Option, we can't forget to handle NULL
    let s = if let Some(s) = s {
        s as *const c_char
    } else {
        return 0;
    };
    let s = CStr::from_ptr(s);
    s.to_bytes().len()
}

/// Pretend to be C
#[cfg(test)]
#[test]
fn pretend_to_be_c_code() {
    use std::mem::transmute;

    // SAFETY: C strings are NULL or NUL-terminated
    let null: *const c_char = std::ptr::null();
    let empty = b"\0".as_ptr() as *const c_char;
    let hi = b"hello\0".as_ptr() as *const c_char;

    let expectations = [(null, 0), (empty, 0), (hi, 5)];

    for (input, expected) in expectations {
        let acutal = unsafe { str_length_ptr(input) };
        assert_eq!(acutal, expected);

        let acutal = unsafe {
            // CAUTION: do not use transmute() in real code; used here to imitate C FFI boundary
            let input = transmute::<*const c_char, Option<NonNull<c_char>>>(input);
            str_length_nonnull(input)
        };
        assert_eq!(acutal, expected);

        let acutal = unsafe {
            // CAUTION: do not use transmute() in real code; used here to imitate C FFI boundary
            let input = transmute::<*const c_char, Option<&c_char>>(input);
            str_length_option_ref(input)
        };
        assert_eq!(acutal, expected);
    }
}

Creating bindings with cbindgen --lang c gives:

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/**
 * Takes a `*const c_char`
 */
uintptr_t str_length_ptr(const char *s);

/**
 * Takes a `Option<NonNull<c_char>>`
 *
 * Since `NonNull<T>` acts as a `*mut T`, we don't get "const" in C bindings
 */
uintptr_t str_length_nonnull(char *s);

/**
 * Takes a `Option<&c_char>`
 */
uintptr_t str_length_option_ref(const char *s);

What are folks' thoughts on the best practice?

I would say Option<NonNull<c_char>> is best, but it feels weird to take a reference to a single c_char and then infer that it should be treated as an array-like thing.

Updates

  1. Option 3 above is not valid, as shown by @Hyeonu, @H2CO3.
  2. I understand that transmute() should never be used in real code. I used transmut() because it most closely resembles what calling over an FFI interface actually does. I avoided calling the interface with actual C code in this example since (as of writing) Miri cannot handle code that traverses an FFI boundary. I added comments in the code above to clarify.

Nit: NonNull<T> doesn't use *mut T, it uses *const T so that it can be covariant.

Anyway, is there a reason you can't just make the binding const char *? I.e. it's just a contract you have to uphold in the body of your function.

See also the last paragraph before Implementations in the docs. You'd just be in the same situation as someone who used From<&c_char>. It's even ok to have a *mut as long as you don't create a &mut or otherwise mutate through it.

(Which is why in a more general sense, it's safe code to do &r as *const _ as *mut _: )

-        let acutal = unsafe {
-            let input = transmute::<*const c_char, Option<NonNull<c_char>>>(input);
+        let input = NonNull::new(input as *mut _);
+        let actual = unsafe {
            str_length_nonnull(input)
        };
1 Like

Currently tracking raw pointers in MIRI requires a nightly flag which is not supported on the playground. Try test it in your local machine with following command.

MIRIFLAGS="-Zmiri-track-raw-pointers" cargo +nightly miri test

And it complains on the option 3.

4 Likes

Also, please do not use transmute to convert between pointers; it's almost never correct.

As for option #3, it's incorrect because of provenance. Rust differentiates between refefences to single objects and references to slices. You are only allowed to access an object through a reference it points to. And a &T only points to a single object. Even if it happens to be pointing to the first element of a slice, you are not allowed to use it to access the rest of the elements, because it's assumed, by virtue of its type, to only point to a single element. This is important when optimizations rely on alias analysis.

2 Likes

Good point--NonNull<T> acts like a "*mut T but non-zero" but actually contains a *const T.

One reason I would like to avoid using a pointer directly is that something like an Option<T> provides little extra type safety--I can't forget to check for the "null" case.

Good to know regarding *mut T--I thought that casting *const T to a *mut T might be undefined behavior. Seems like it's okay as long as you do not mutate through a *mut T whose provenance is a "const" thing.

Good to know! This is the error I get for option 3:

$ MIRIFLAGS="-Zmiri-track-raw-pointers" cargo +nightly miri test
   Compiling null-c-str v0.1.0 (/home/user/workspace/rust-help/null-c-str)
    Finished test [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests (target/miri/x86_64-unknown-linux-gnu/debug/deps/null_c_str-87d0d28ba6055ebe)

running 1 test
test pretend_to_be_c_code ... error: Undefined Behavior: no item granting read access to tag <177398> at alloc2+0x1 found in borrow stack.
    --> /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/ffi/c_str.rs:1174:23
     |
1174 |             let len = sys::strlen(ptr);
     |                       ^^^^^^^^^^^^^^^^ no item granting read access to tag <177398> at alloc2+0x1 found in borrow stack.
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
             
     = note: inside `std::ffi::CStr::from_ptr` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/ffi/c_str.rs:1174:23
note: inside `str_length_option_ref` at src/lib.rs:38:13
    --> src/lib.rs:38:13
     |
38   |     let s = CStr::from_ptr(s);
     |             ^^^^^^^^^^^^^^^^^
note: inside `pretend_to_be_c_code` at src/lib.rs:68:13
    --> src/lib.rs:68:13
     |
68   |             str_length_option_ref(input)
     |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at src/lib.rs:45:1
    --> src/lib.rs:45:1
     |
44   |   #[test]
     |   ------- in this procedural macro expansion
45   | / fn pretend_to_be_c_code() {
46   | |     use std::mem::transmute;
47   | |
48   | |     // SAFETY: C strings are NULL or NUL-terminated
...    |
71   | |     }
72   | | }
     | |_^

This is exactly the kind of undefined behavior I was afraid option 3 might exercise. A reference to item does not imply I am allowed treat it as the beginning of an array (like in C).

Thanks for the advice regarding transmute(). I was trying to simulate what actually happens when the calling the Rust code over an FFI boundary. I believe the instances of transmute() are safe due to the nullable pointer optimization. My understanding is that NonNull<T> and &T are guaranteed to be non-null, an Option of either of them has the same memory layout as the original type.

Thanks for the pointer to the "provenance" chapter--it's nice to know that this is formalized somewhere (even if the document is not normative like a language standard). If if the transmute() for Option<&c_char> is safe, violating the provenance by constructing a CStr from the reference is not safe.

It seems like my gut feeling "it feels weird to construct a slice-like thing from just a pointer-like thing" was correct. :wink:

Of the original options, using a *const c_char seems like the best option, but does not provide the extra type safety ("don't forget to check for null"). What are thoughts on make a new type like NullableCharPtr that has the same memory layout as *const c_char but makes sure that you can only access as an Option<&CStr>?.

Miri does not complain.

#[derive(Clone, Copy, Debug)]
#[repr(transparent)]
pub struct NullableCharPtr {
    ptr: *const c_char,
}

impl NullableCharPtr {
    pub fn new(self, ptr: *const c_char) -> Self {
        Self { ptr }
    }

    pub unsafe fn as_option_ptr<'a>(self) -> Option<&'a CStr> {
        if self.ptr.is_null() {
            None
        } else {
            Some(CStr::from_ptr(self.ptr))
        }
    }
}

/// Takes a `NullableCharPtr`
#[no_mangle]
pub unsafe extern "C" fn str_length_nullable_char_ptr(s: NullableCharPtr) -> usize {
    // Since we have an Option, we can't forget to handle NULL
    if let Some(s) = s.as_option_ptr() {
        s.to_bytes().len()
    } else {
        0
    }
}
typedef const char *NullableCharPtr;

/**
 * Takes a `NullableCharPtr`
 */
uintptr_t str_length_nullable_char_ptr(NullableCharPtr s);

If it were me, I'd just accept a *const c_char and manually do the null checks + CStr::from_ptr() call at the top of my Rust function.

FFI functions should normally be a thin wrapper which just translates arguments and delegates to a safe Rust function anyway, so I'd argue the extra boilerplate is worth being able to understand everything at a glance instead of needing to mentally peel away layers of abstraction.

That approach pushes more work onto the developer, but I'm fine with that. I'd prefer either a fully manual unsafe function or a fully generated safe function (a la safer_ffi), not some mixture of unsafe code with tricky safe code (null pointer optimisation allowing an implicit transmute from *const c_char to Option<NonNull<c_char>>, NonNull<T> being essentially a *mut T, still needing to use CStr, etc.).

They might be safe. They are just terrible style, because they are dangerous.

Don't simulate it, then. If you are interested in using FFI, use FFI. There are a bunch of FFI-safe types of which the layout and the compatibility with C is documented and guaranteed by Rust. For example, you can call a C function taking/returning size_t with usize on the Rust side.

You are only allowed to pass values of these guaranteed types through FFI. Doing a transmute won't extend such rights, and won't make your programs more correct.

Creating new Rust types for FFI is basically what Mozilla's ffi_support library does.

For example, the FfiStr:

#[no_mangle]
extern "C" fn valid_use_1(s: FfiStr<'_>) {
    // Use of `s` after this function returns is impossible
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.