What's the best practice to get string by FFI?

guoxbin · March 15, 2020, 1:31am

I referred to http://llever.com/rust-ffi-omnibus/string_return/

I have 2 questions.

Q1: About memory deallocation

As rust-ffi-omnibus shows, when we provide an FFI function returning *mut c_char, we always provide an FFI function taking the *mut c_char to deallocate the memory.

This is useful for FFI callers that cannot deallocate directly, like Ruby and Python.

However, when we call from Rust, should we just use CStr to borrow the *mut c_char, copy it, and then call the FFI function to deallocate,

or use CString to take ownership of the *mut c_char and rely on Rust to manage the lifetime of the CString variable?

Code1:

let name: String = unsafe {
    let raw : *mut c_char = get_name_ffi_call();
    let name = CString::from_raw(raw).into_string().expect("name should be utf-8");
    // the ownership of raw is moved to name
    // never use raw again
    // the deallocation of name (and raw) will be guaranteed by rust
    name
};

or

Code2:

let name: String = unsafe {
	let raw : *mut c_char = get_name_ffi_call();
	let name = CStr::from_ptr(raw).to_str().expect("qed").to_owned();
	free_name_ffi_call(raw);
	name
};

Code2 involves some overhead for to_owned, but I'm not sure whether Code1 works fine.

Is this comment correct?

// the ownership of raw is moved to name
// never use raw again
// the deallocation of name (and raw) will be guaranteed by rust

Q2: About overhead of CString

I wrote benchmark tests to determine the overhead of CString

Code1: show native call

#[bench]
fn bench_string(b: &mut Bencher) {
	let get_name = || {
		"".to_string()
	};

	b.iter(|| black_box(get_name()));
}

Code2: show ffi call (into_raw then from_raw)

#[bench]
fn bench_string_ffi(b: &mut Bencher) {
	let get_name =
		|| {
			let s = CString::new("test").unwrap().into_raw();
			unsafe { CString::from_raw(s).into_string().unwrap() }
		};

	b.iter(|| black_box(get_name()));
}

The benchmark results:

test bench_string            ... bench:          76 ns/iter (+/- 10)
test bench_string_ffi        ... bench:         105 ns/iter (+/- 15)

Is this overhead unavoidable in the case of getting string by ffi?

Yandros · March 15, 2020, 3:40am

So you seem to be both exporting FFI functions and calling them from within Rust itself, is that right?

If so, the better pattern is to have Rust functions as usual,

Example

fn get_fancy_string ()
  -> String
{
    // no trailing semicolon: this expression is the return value
    String::from("Hello, World!")
}

and then have a wrapping function / a shim over the Rust function for FFI:

mod exported {
    use ::std::{ffi::CString, os::raw::c_char, ptr};

    #[no_mangle] pub extern "C"
    fn get_fancy_string ()
      -> *mut c_char
    {
        CString::new(super::get_fancy_string())
            .map(CString::into_raw)
            .unwrap_or_else(|err| {
                eprintln!("Error, `get_fancy_string()` returned a string with inner null bytes: {}", err);
                ptr::null_mut()
            })
    }

    #[no_mangle] pub unsafe extern "C"
    fn free_rust_string (p: *mut c_char)
    {
        if p.is_null() { return; }
        drop::<CString>(CString::from_raw(p));
    }
}

so that internally you can still be calling get_fancy_string().

If this is cross-crate, it still works by having the crate exporting functions be not only a cdylib but also an rlib, so as to depend on it as you depend on any Rust crate.

Now, assuming the above suggestion is not applicable, then indeed you'd have to go through the C ABI you have defined.

If you control the implementation in a way where you know that the obtained *mut c_char originates from a call to CString::into_raw() and you know to be using the very same allocator as the initial module was, then calling CString::from_raw() could be fine.
- but this second condition is hard to ensure: even if you appear to be relying on the same implementation of the allocator, you could be working on different instances of such allocator, in which case it is no longer sound.

So, my rule of thumb would be:

Use CStr::from_ptr to borrow the resulting string, and then get your own owned version.
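To make the rule of thumb concrete, here is a minimal sketch of the borrow, copy, then free sequence. The two `extern "C"` functions are stand-ins written in Rust so that the example is self-contained; their bodies and the "Ferris" value are assumptions, only the names come from the question. Note that the pointer is freed on the error path too.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

// Stand-ins for the library side of the FFI boundary (hypothetical bodies).
extern "C" fn get_name_ffi_call() -> *mut c_char {
    CString::new("Ferris")
        .map(CString::into_raw)
        .unwrap_or(ptr::null_mut())
}

unsafe extern "C" fn free_name_ffi_call(p: *mut c_char) {
    if !p.is_null() {
        drop(CString::from_raw(p));
    }
}

/// Caller side: borrow, copy, then hand the pointer back to its owner.
fn fetch_name() -> Option<String> {
    let raw = get_name_ffi_call();
    if raw.is_null() {
        return None;
    }
    unsafe {
        // Copy while the foreign buffer is still alive...
        let name = CStr::from_ptr(raw).to_str().ok().map(str::to_owned);
        // ...and free it exactly once, whether or not it was valid UTF-8.
        free_name_ffi_call(raw);
        name
    }
}
```

Calling `fetch_name()` yields an ordinary `String` that Rust drops as usual; the caller never needs to know which allocator the library used.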

So that leans towards your "Code 2" suggestion, although you can optimize it a bit with your own newtype expressing the special free it has:

use ::std::{
    ffi::CStr,
    ops::Deref,
    os::raw::c_char,
    ptr,
};

// `get_name_ffi_call` / `free_name_ffi_call` are the FFI functions from the question.
fn get_fancy_string ()
  -> Option<impl 'static + Deref<Target = str>>
{
    unsafe {
        let ptr = ptr::NonNull::new(get_name_ffi_call())?;
        let c_str = CStr::from_ptr(ptr.as_ptr());
        let str = if let Ok(it) = c_str.to_str() { it } else {
            free_name_ffi_call(ptr.as_ptr());
            return None;
        };
        return Some(BoxedStr( // all this is just type-level stuff, in practice it's a no-op
            ptr::NonNull::from(str)
        ));
    }
    // where
    struct BoxedStr /* = */ (
        ptr::NonNull<str>, // `&'a str` but without the annoying lifetime parameter
    );
    impl Deref for BoxedStr {
        type Target = str;

        #[inline]
        fn deref (self: &'_ Self) -> &'_ str
        {
            // the pointee stays alive until `drop` hands it back to the FFI
            unsafe { self.0.as_ref() }
        }
    }
    impl Drop for BoxedStr {
        fn drop (self: &'_ mut Self)
        { unsafe {
            // the `str` starts at the address the FFI gave us, so casting back is enough
            free_name_ffi_call(self.0.as_ptr() as *mut c_char)
        }}
    }
    // The following assumes the FFI crate is not doing crazy things
    unsafe impl Send for BoxedStr where Box<str> : Send {}
    unsafe impl Sync for BoxedStr where Box<str> : Sync {}
}

The idea is that you construct your own Box<str>-like abstraction thanks to having access to the free function from FFI: instead of cloning + dropping the obtained thing, you just keep a handle on the obtained pointer, but with this abstraction layer that ensures it gets freed with the special FFI function.

Aside: an alternative to an ownership-based API in FFI

The solution suggested by the omnibus is mainly the simplest and most intuitive one, but resorting to heap allocations just to be able to provide to the FFI a pointer is suboptimal. For some structures, a stack pointer would suffice. But given that the stack of the FFI function is cleaned up when returning to you, the only way to get that working is through callbacks:

/// Imagine it is an opaque object,
/// so the size / alignment is not guaranteed for FFI,
/// hence the need to always work with `*mut Foo` at the ABI-level.
pub
struct Foo
where
    Self : Sized, // We do know that it is not a DST, though
{
    x: i32,
}

#[no_mangle] pub unsafe extern "C"
fn foo_get_x (foo: *const Foo)
  -> i32
{
    let foo = &*foo;
    foo.x
}

/* == Classic pattern == */
#[no_mangle] pub extern "C"
fn new_foo ()
  -> Option<Box<Foo>> // (could also be using Arc instead of Box)
  // or
  // -> *mut Foo
{
    Some(Box::new(Foo::new()))
    // or
    // Box::into_raw(Box::new(Foo::new()))
}

#[no_mangle] pub extern "C"
fn free_foo (_: Option<Box<Foo>>)
{}
// or
#[no_mangle] pub unsafe extern "C"
fn free_foo (p: *mut Foo)
{
    if p.is_null() { return; }
    drop::<Box<Foo>>(Box::from_raw(p))
}
/* == End of classic pattern == */

// The above pattern requires heap-allocating, just to provide a pointer to a Foo through FFI
// But if we only want to provide a borrowed access to a Foo, we can manage to stack-allocate it:

#[no_mangle] pub unsafe extern "C"
fn with_foo (
    data: *mut c_void,
    cb: Option<unsafe extern "C" fn (data: *mut c_void, foo: *mut Foo)>,
)
{
    let cb = if let Some(cb) = cb { cb } else {
        return;
    };
    let ref mut foo = Foo::new();
    cb(data, foo);
}

which C code can then call as:

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#include "rust_ffi.h"

void cb (void * data, foo_t * foo)
{
    int32_t * at_x = (int32_t *) data;
    *at_x = foo_get_x(foo);
}

int main (int argc, char const * const argv[])
{
    int32_t x = 0;
    with_foo((void *) &x, cb);
    printf("foo.x = %" PRId32 "\n", x);
    return 0;
}

The pattern gets very cumbersome because ANSI C has zero sugar for closures (some compiler extensions can greatly help in that regard), but at least:

- there is no more worrying about how to free foo; it "automagically" happens at the end of cb
- Rust gets to stack allocate it (e.g., imagine stack-allocating a small C string).
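For the "small C string" case, the same callback idea could look roughly like the sketch below. Everything here is hypothetical (`with_name`, `copy_into_string`, the "Ferris" value), and the callback is written in Rust only so that the example runs on its own; a real export would also carry `#[no_mangle]`.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_void};

/// Hypothetical export: lends a NUL-terminated string that lives on this
/// function's stack to `cb`, for the duration of the call only.
pub unsafe extern "C" fn with_name(
    data: *mut c_void,
    cb: Option<unsafe extern "C" fn(data: *mut c_void, name: *const c_char)>,
) {
    let cb = match cb {
        Some(cb) => cb,
        None => return,
    };
    // Byte-string literal with an explicit NUL terminator, copied into a local array.
    let name: [u8; 7] = *b"Ferris\0";
    cb(data, name.as_ptr() as *const c_char);
    // `name` is gone after this point; the callback must not keep the pointer.
}

/// Example callback: copies the borrowed string into the `String` behind `data`.
unsafe extern "C" fn copy_into_string(data: *mut c_void, name: *const c_char) {
    let out = &mut *(data as *mut String);
    out.push_str(&CStr::from_ptr(name).to_string_lossy());
}

fn name_via_callback() -> String {
    let mut out = String::new();
    let cb: unsafe extern "C" fn(*mut c_void, *const c_char) = copy_into_string;
    unsafe { with_name(&mut out as *mut String as *mut c_void, Some(cb)) };
    out
}
```

No free function is needed at all here: the only copy that outlives the call is the one the callback chose to make.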

Douglas · March 15, 2020, 7:57am

One thing to think about is, which code is responsible for ensuring the same allocator is used to both allocate and free the string?

It is obvious in the Ruby/Python case that the simplest way to do that is to have the FFI interface expose both the allocate and free methods, so that there’s no need to think about how the Ruby/Python’s own allocator works.

In many C-to-C style FFI interfaces, the allocation strategy is the other way around. Instead, the FFI has get_length and populate functions. The caller first calls get_length, allocates the memory however it likes, then passes a pointer to the FFI interface to populate it. The caller, in this case Rust, can then allocate memory that can be dropped as usual, without having to worry about calling an FFI interface again later to free it.
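A rough sketch of that get_length / populate split, with Rust playing both roles so it can run on its own. The function names, the `LIB_NAME` constant, and the exact signatures are assumptions made for illustration, not an existing API.

```rust
use std::os::raw::c_char;
use std::ptr;

const LIB_NAME: &str = "Hello, World!";

/// Hypothetical export: number of bytes the caller must provide (no NUL here).
pub extern "C" fn name_get_length() -> usize {
    LIB_NAME.len()
}

/// Hypothetical export: copies at most `capacity` bytes into `dst` and
/// returns how many bytes were written.
pub unsafe extern "C" fn name_populate(dst: *mut c_char, capacity: usize) -> usize {
    if dst.is_null() {
        return 0;
    }
    let n = LIB_NAME.len().min(capacity);
    ptr::copy_nonoverlapping(LIB_NAME.as_ptr(), dst as *mut u8, n);
    n
}

/// Caller side: the buffer is an ordinary `Vec`, dropped like any other value.
fn read_name() -> String {
    let mut buf = vec![0u8; name_get_length()];
    let written = unsafe { name_populate(buf.as_mut_ptr() as *mut c_char, buf.len()) };
    buf.truncate(written);
    // A too-small buffer could cut a multi-byte character; here the size matches.
    String::from_utf8(buf).expect("library promised UTF-8")
}
```

The price is two FFI calls instead of one, but there is no free function to export and no allocator to keep in sync.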

The setup you’re suggesting, where the code on one side allocates and the code on the other side frees, has the benefit that only one FFI call needs to be made, but has the drawback that both sides must ensure they are using the same allocator. That could be quite tricky to ensure, and I’d be worried about it silently breaking by accident. It would mean calling the FFI from Ruby/Python would be quite hard, as they’d need access to the allocator too, so the FFI would be less useful. I’d be more tempted to implement all the code together without the FFI than do that, since the two sides would be quite intertwined.

Michael-F-Bryan · March 15, 2020, 1:17pm

This isn't actually as hard as it sounds, it's fairly probable that this is already guaranteed.

When compiled as a cdylib Rust will use the system allocator (libc's malloc and free), and because the two main implementations of Python (CPython) and Ruby (Matz's Ruby Interpreter) are both C programs which link to libc, they should also be using the system allocator. It's also possible to explicitly define the global allocator used by Rust.
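For reference, pinning the global allocator explicitly is a two-line declaration with the standard library's `System` type. This is only a sketch (`allocate_greeting` is a made-up function), and it does not by itself make libc's `free()` a documented way to release a pointer from `CString::into_raw()`; the standard library docs still ask for `CString::from_raw()`.

```rust
use std::alloc::System;

// Route every Rust allocation in this crate through the platform allocator
// (malloc/free on most Unix-like systems).
#[global_allocator]
static GLOBAL: System = System;

fn allocate_greeting() -> String {
    // This `String`'s buffer is now obtained from `System`.
    String::from("Hello, World!")
}
```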

I wouldn't worry about performance overheads. The worst that'll happen is you make one or two unnecessary copies, and unless you're working with strings that are megabytes in length the actual performance difference will blend into the noise. For perspective, cache misses are measured in the hundreds of ns, and IO is typically measured in ms... Instead, what you will notice is crashes due to use-after-free or double-free bugs because you tried to skip a copy.

Another thing to note is that benchmark is actually deceiving and not measuring what you expect. You need to use black_box() on the input string so the compiler can't "see" that the first example gets an empty string as input, because it'll probably elide the allocation and empty copy altogether. The second benchmark uses a different input string which, besides needing to be wrapped in black_box(), means you aren't comparing apples to apples. Also, because the compiler can see where s comes from, when it inlines CString::from_raw() and CString::into_string() it'll probably be smart enough to copy "test" directly to the final location.
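A sketch of a fairer comparison, assuming a toolchain where `std::hint::black_box` is stable: both functions receive the same input, and both the input and the result are hidden from the optimizer. The helper names are made up, and the `Instant`-based loop is only a crude stand-in for a real benchmark harness, so treat any numbers it produces with suspicion.

```rust
use std::ffi::CString;
use std::hint::black_box;
use std::time::{Duration, Instant};

fn plain_copy(input: &str) -> String {
    input.to_string()
}

fn cstring_round_trip(input: &str) -> String {
    let raw = CString::new(input).expect("no interior NUL").into_raw();
    // Sound here only because `raw` came from `into_raw` just above.
    unsafe { CString::from_raw(raw) }
        .into_string()
        .expect("valid UTF-8")
}

/// Times `iters` calls of `f`, hiding the input and the result from the optimizer.
fn time_it(iters: u32, f: fn(&str) -> String) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box("test")));
    }
    start.elapsed()
}
```

Comparing `time_it(1_000_000, plain_copy)` with `time_it(1_000_000, cstring_round_trip)` at least measures the same work on the same input.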

I don't think it's sound to call CString::from_raw() on any old string pointer that gets passed to you... Instead, if you need to accept a string, create a copy and make it the caller's responsibility to free the original string afterwards.

When you want to transfer a string (or Vec<u8>, or any other type that changes size at runtime) I've found it's easier for the callee to not allocate anything. You can often sidestep the whole ownership problem if the caller passes in a suitably sized buffer. One trick I've seen for telling the caller how much to allocate is to let them pass in a null pointer as the buffer and return the number of bytes that would have been copied.

/// Writes a string into a buffer provided by the caller, returning
/// the number of bytes written (no null terminator is appended).
// ERR_INSUFFICIENT_BUFFER_SIZE is assumed to be a negative c_int constant defined elsewhere.
unsafe extern "C" fn do_stuff(buffer: *mut c_char, buffer_len: c_int) -> c_int {
  let some_string = "Hello, World!";

  if buffer.is_null() {
    // the caller wants to know how much memory to allocate
    return some_string.len() as c_int;
  } else if buffer_len < some_string.len() as c_int {
    return ERR_INSUFFICIENT_BUFFER_SIZE;
  }

  // otherwise, do the actual copy; the slice covers exactly the bytes being
  // written, because copy_from_slice() panics when the lengths differ
  let buffer = std::slice::from_raw_parts_mut(buffer as *mut u8, some_string.len());
  buffer.copy_from_slice(some_string.as_bytes());

  some_string.len() as c_int
}
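From the caller's side, the null-pointer trick turns into a two-call protocol. Below is a self-contained sketch: `write_greeting` is a hypothetical stand-in for such a callee, and using `-1` as the "buffer too small" code is an assumption.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

/// Stand-in callee: a null `buffer` means "tell me the size";
/// a negative return value means the buffer was too small.
unsafe extern "C" fn write_greeting(buffer: *mut c_char, buffer_len: c_int) -> c_int {
    let s = "Hello, World!";
    if buffer.is_null() {
        return s.len() as c_int;
    }
    if buffer_len < s.len() as c_int {
        return -1;
    }
    ptr::copy_nonoverlapping(s.as_ptr(), buffer as *mut u8, s.len());
    s.len() as c_int
}

fn greeting() -> Option<String> {
    unsafe {
        // First call: ask for the required size.
        let needed = write_greeting(ptr::null_mut(), 0);
        if needed < 0 {
            return None;
        }
        let mut buf = vec![0u8; needed as usize];
        // Second call: fill the buffer we own.
        let written = write_greeting(buf.as_mut_ptr() as *mut c_char, needed);
        if written < 0 {
            return None;
        }
        buf.truncate(written as usize);
        String::from_utf8(buf).ok()
    }
}
```

Since the `Vec` belongs to the caller from start to finish, there is nothing to hand back to the library afterwards.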

Of course, that's not always possible and you might need to return a *const c_char. In those cases I'd probably use CString::into_raw() and add a comment above the function saying you need to call some free_rust_string() function instead of libc's free.

Otherwise you could use strncpy() or snprintf() from libc to create a copy of your Rust String which is null-terminated and guaranteed to be free-able.
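One way to get such a free-able copy is sketched below. It swaps strncpy()/snprintf() for a plain malloc() plus a byte copy, declares the two libc functions by hand to avoid a dependency (the `libc` crate exposes the same ones), and uses made-up helper names. The `unsafe extern` block syntax assumes Rust 1.82 or newer; older toolchains write a plain `extern "C"` block.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_void};
use std::ptr;

// Hand-written declarations of the C allocator functions.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(p: *mut c_void);
}

/// Copies `s` into a `malloc`ed, NUL-terminated buffer that C code may `free()`.
/// Returns null if allocation fails or `s` contains an interior NUL byte.
fn to_malloced_c_string(s: &str) -> *mut c_char {
    if s.as_bytes().contains(&0) {
        return ptr::null_mut();
    }
    unsafe {
        let buf = malloc(s.len() + 1) as *mut u8;
        if buf.is_null() {
            return ptr::null_mut();
        }
        ptr::copy_nonoverlapping(s.as_ptr(), buf, s.len());
        *buf.add(s.len()) = 0; // terminator
        buf as *mut c_char
    }
}

/// Plays the role of the C caller: read the string, then release it with `free`.
fn malloc_round_trip(s: &str) -> Option<String> {
    let p = to_malloced_c_string(s);
    if p.is_null() {
        return None;
    }
    unsafe {
        let copy = CStr::from_ptr(p).to_str().ok().map(str::to_owned);
        free(p as *mut c_void); // plain libc `free` is correct for this pointer
        copy
    }
}
```

Because the buffer came from malloc(), a C, Ruby, or Python caller can release it with the ordinary free() and no Rust-side free function is required.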

Sorry for the wall of words, like most problems the best answer is "it depends"

