Struggles creating a C wrapper for rust lib

I'm rather inexperienced with rust and I am contemplating writing a C wrapper for a rust library.

Thus far I have found experiments for this very difficult, essentially because in order to write the wrapper I have to violate all of the safety mechanisms that rust is built around.

The main problem I have is around returning object pointers that can be wrapped on the other side. I think the main way to do this is supposed to be Box::leak but in some cases this involves some pretty awkward extra copying even when my object comes from a Box in the first place.

To make things a bit more concrete, I am trying to use arrow2 to expose a struct via a pointer. This package provides the struct ArrowArray which is supposed to be binary compatible with the C struct defined here.

First, let me show the case that I understand:

#[no_mangle]
pub unsafe extern "C" fn array_pointer_demo() -> *mut c_void {
    let a: Vec<i64> = vec![1, 2, 3, 4];
    let abox = a.into_boxed_slice();
    Box::leak(abox) as *mut _ as *mut _
}

Here I get back a pointer to the Vec. I frankly am unsure about whether there is any copying going on here (I don't think so since Vec should already be heap allocated), but at least I'm getting back a pointer to something that the stdlib docs assure me is not going to get freed. Indeed, this example and obvious variations of it seem to work fine.

I'm trying to use this in a very simple example in which I (think I) copy an array onto the heap (a.to_boxed()) and then try to return its pointer.

#[no_mangle]
pub unsafe extern "C" fn arrow_direct() -> *const c_void {
    let a = PrimitiveArray::<i64>::from_vec(vec![1, 2, 3]);
    let a = &a as &dyn arrow2::array::Array;
    //rust doesn't seem to be preserving this!
    let abox = a.to_boxed();
    &export_array_to_c(abox) as *const _ as *const _
}

This compiles, however it looks like the memory is getting freed. I think this makes sense to me, but I'm not sure what the alternative is.

What I'd like to do is somehow coerce the lifetime of the reference to 'static but rust really does not like when I try to do this. For example

    let _o = export_array_to_c(abox);
    let o: &'static _ = &_o;

doesn't compile and the rustc --explain strongly hints that coercion of lifetimes is not a thing. I don't understand why I can declare the output of Box::leak static but I can't seem to create a 'static reference on my own.

The only thing I can think of which might work (though I haven't verified it to my satisfaction) is to instead create another box with Box::new(export_array_to_c(abox)) and then do Box::leak on this, but this requires an extra copy.

Btw, I also do not understand why I need 2 as clauses to convert to something compatible with a c_void pointer. Box::leak is already returning a reference to the underlying object and I don't understand why this can't be cast as a pointer or what the double as means more generally.

Any advice appreciated, thanks!

I think you can avoid an intermediary copy here with PrimitiveArray::boxed


You can't stretch a lifetime to longer than the reference lives, that would defeat the whole point of lifetimes.

Box::leak can create a static reference because once you've leaked the box there's no way to (safely) deallocate that memory.


You need the double casts because you're taking a reference, casting it to a raw pointer, and then casting the raw pointer to a new type. You can get rid of one by using Box::into_raw instead. That associated function returns a raw pointer, and explicitly supports being deallocated by using Box::from_raw to recover the box, and then dropping it (either implicitly or explicitly).

#[no_mangle]
pub unsafe extern "C" fn array_pointer_demo() -> *mut c_void {
    let a: Vec<i64> = vec![1, 2, 3, 4];
    let abox = a.into_boxed_slice();
    Box::into_raw(abox) as *mut c_void
}
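
For completeness, here is a rough sketch of what the matching free function might look like for the boxed slice case. The names slice_export, slice_free and slice_round_trip are made up for illustration, and I've assumed the length is passed back through an out parameter since the thin c_void pointer no longer carries it. Exported versions would also carry #[no_mangle] as in the examples above.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hands a boxed slice to the caller as a thin pointer plus a length.
/// (Hypothetical name; not part of any library.)
/// SAFETY: `len_out` must point to a writable usize.
pub unsafe extern "C" fn slice_export(len_out: *mut usize) -> *mut c_void {
    let boxed: Box<[i64]> = vec![1, 2, 3, 4].into_boxed_slice();
    unsafe { *len_out = boxed.len() };
    // into_raw yields a wide *mut [i64]; the cast drops the length part.
    Box::into_raw(boxed) as *mut c_void
}

/// Rebuilds the box from the pointer and length, then drops it.
/// SAFETY: `data` and `len` must come from `slice_export`, freed only once.
pub unsafe extern "C" fn slice_free(data: *mut c_void, len: usize) {
    let wide: *mut [i64] = ptr::slice_from_raw_parts_mut(data as *mut i64, len);
    drop(unsafe { Box::from_raw(wide) });
}

/// Simulates the C caller: export, read through the pointer, then free.
pub fn slice_round_trip() -> Vec<i64> {
    let mut len = 0usize;
    let data = unsafe { slice_export(&mut len) };
    let seen = unsafe { std::slice::from_raw_parts(data as *const i64, len) }.to_vec();
    unsafe { slice_free(data, len) };
    seen
}
```

The point is only that whatever into_raw hands out should eventually come back through from_raw with the same type and length.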

Your arrow_direct example can't work. You're taking a reference to a temporary (the value returned by export_array_to_c) and casting it to a raw pointer, but as soon as the function returns that memory is no longer valid. If you absolutely need to return a pointer, you will need to do something like boxing the ArrowArray to get a stable address for it.

Why are you trying to return *const c_void instead of arrow2::ffi::ArrowArray? That would get rid of your problem with raw pointers entirely.
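
To make the two options concrete without pulling in arrow2, here is a sketch using a made-up #[repr(C)] struct called Exported as a stand-in for something like ArrowArray. The struct, its fields, and all the function names are assumptions for illustration only.

```rust
/// Hypothetical stand-in for an FFI-compatible struct such as ArrowArray.
#[repr(C)]
pub struct Exported {
    pub length: i64,
    pub null_count: i64,
}

fn make_exported() -> Exported {
    Exported { length: 3, null_count: 0 }
}

/// Option 1: return the struct by value; the caller owns its own copy,
/// so there is no pointer that could dangle.
pub extern "C" fn export_by_value() -> Exported {
    make_exported()
}

/// Option 2: box the struct so it has a stable heap address.
pub extern "C" fn export_boxed() -> *mut Exported {
    Box::into_raw(Box::new(make_exported()))
}

/// Frees a pointer produced by `export_boxed`.
/// SAFETY: `p` must come from `export_boxed` and be freed only once.
pub unsafe extern "C" fn exported_free(p: *mut Exported) {
    if !p.is_null() {
        drop(unsafe { Box::from_raw(p) });
    }
}

/// Simulates a caller using the boxed variant.
pub fn boxed_length() -> i64 {
    let p = export_boxed();
    let length = unsafe { (*p).length };
    unsafe { exported_free(p) };
    length
}
```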


Sheer stupidity, apparently.

I got mentally stuck because I spent most of the evening trying to pass heap-allocated arrays, and then I got very stuck on the concept that the underlying data I wanted was again heap-allocated and boxed, it wasn't occurring to me that ArrowArray would be the exact structure I needed. Pretty stupid I know, I should have gotten up and walked away from it for a couple of hours and I probably would have realized what I was doing.

Ok that makes sense, thanks, I was assuming that casts obeyed some kind of transitivity.

This is the one thing I still don't completely understand. Is this just a special property of Box? I don't know any other way to create a static lifetime object without declaring a global scope. Is Box::new followed by Box::leak the "canonical" way to create heap allocated data that can be handed off in C? That static lifetime also makes me a bit nervous about whether it's really allowed to free it in the wrapper.

Nope! Box is not magic[1]. You can construct a static reference any time you want with some unsafe


fn main() {
    let liar: &'static u8 = unsafe {
        let value = [1u8, 2, 3];
        &*(&value[0] as *const u8)
    };

    println!("{liar}")
}

Obviously that code is wrong, and liar is not valid for static (currently it appears to print the right value in debug mode but in release mode it spits out some random value for me). miri also correctly complains that I've done something horrible there.

Box::leak doesn't even actually construct an &'static mut T, it has a lifetime parameter that decides how long the reference should last. If you're assigning the return value to a &'static then it will be 'static but it doesn't have to be.
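
A small sketch of both points, with made-up function names: the leaked reference can be turned back into a Box with Box::from_raw (the Box::leak docs mention this), and the lifetime of the returned reference is picked by the caller rather than being hard-wired to 'static.

```rust
/// Leaks a box, mutates through the 'static reference, then reclaims it.
pub fn leak_and_reclaim() -> u32 {
    let leaked: &'static mut u32 = Box::leak(Box::new(7));
    *leaked += 1;
    // SAFETY: the pointer came from a leaked Box<u32> and is reclaimed once.
    let reclaimed: Box<u32> = unsafe { Box::from_raw(leaked as *mut u32) };
    *reclaimed
}

/// Here the boxed value borrows from `s`, so the reference returned by
/// Box::leak can only live for 'a, not for 'static.
pub fn leak_borrowed<'a>(s: &'a str) -> &'a mut &'a str {
    Box::leak(Box::new(s))
}
```

Note that leak_borrowed really does leak its small allocation; it is only there to show the lifetime being inferred.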


  1. mostly


Ah, ok, I think I got it, so unsafe is the thing which is doing the work of allowing that to be cast as static not Box. Though I suppose there still has to be a malloc somewhere within Box::new and I still don't understand what that would look like in rust (other than using Box or another type I already know allocates like Vec or String).

Alright, so I think I got it, so, in summary, if you have something with static lifetime on the heap you can get the pointer with Box::leak and presumably if it makes sense to do this you probably already have the box, otherwise you should just return the object directly. I guess that covers everything, if I have a Vec or String or something like that I can just convert it to a box like in my example above, everything else is pretty straightforward.

I seem to have been making things way more complicated than they really were.

If you're curious about allocation, the std::alloc module would be a good place to start. In practice though it's pretty rare to need to reach for it, using other std types that handle it for you is often easier.
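
As a taste of what that looks like, here is a minimal sketch of allocating, writing, reading and freeing a single i64 by hand with std::alloc. The function name is made up, and real code would almost always just use Box.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates space for one i64 by hand, writes to it, reads it back, frees it.
pub fn manual_alloc_round_trip() -> i64 {
    let layout = Layout::new::<i64>();
    unsafe {
        // `alloc` returns a *mut u8 that is null on failure.
        let p = alloc(layout) as *mut i64;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        p.write(42);
        let value = p.read();
        // The same layout must be passed back when deallocating.
        dealloc(p as *mut u8, layout);
        value
    }
}
```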

It's easy to do, Box in particular can seem much more magic than it really is.


This thread has been extremely helpful so far, thanks!

I think I'm starting to understand things much more clearly and I'm happy because this whole process of creating a C wrapper for rust code doesn't seem nearly as difficult as I was beginning to fear, but there are still a few points that are unclear to me.

This mostly has to do with how to tell rust to not free stuff on the heap. Like I said, it seems the easiest way to do this is to convert to a box and use Box::leak. The source code of Box::leak makes me think that maybe it's disabling whatever mechanism usually frees the memory when the object goes out of scope, but I still don't feel entirely confident in this since it doesn't seem all that explicit.

Returning to my canonical working example

#[no_mangle]
pub unsafe extern "C" fn array_pointer_demo() -> *mut c_void {
    let a: Vec<i64> = vec![1, 2, 3, 4];
    let abox = a.into_boxed_slice();
    Box::leak(abox) as *mut _ as *mut _
}

I'm still very confused about what, if anything, guarantees that this won't be freed. This version has worked in every test I've attempted, however the following does not seem to work

#[no_mangle]
pub unsafe extern "C" fn array_pointer_demo() -> *mut c_void {
    let a: Vec<i64> = vec![1, 2, 3, 4];
    let abox = Box::new(a);
    Box::leak(abox) as *mut _ as *mut _
}

I don't think I understand why these examples would be different.

Is there a way to explicitly tell the compiler "please don't free this"? (Btw this is what I mean when I had previously talked about "coercing" the lifetime to static which I now realize is a much more confusing way of phrasing it.)

The guarantees about leaking are in the documentation. That is the idiomatic way to leak things. There are other ways to not run a destructor, like ManuallyDrop or forget.
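
Here is a sketch that counts destructor calls to show the difference. Noisy and drop_counts are names I made up for the demonstration.

```rust
use std::cell::Cell;
use std::mem::{self, ManuallyDrop};

/// Bumps a shared counter every time its destructor runs.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the number of destructor runs seen after each step.
pub fn drop_counts() -> (u32, u32, u32) {
    let drops = Cell::new(0);

    {
        let _normal = Noisy(&drops);
    } // ordinary value: destructor runs at the end of this scope
    let after_scope = drops.get();

    mem::forget(Noisy(&drops)); // destructor is skipped entirely
    let after_forget = drops.get();

    let mut kept = ManuallyDrop::new(Noisy(&drops));
    // Nothing would run when `kept` goes out of scope; we opt in explicitly.
    unsafe { ManuallyDrop::drop(&mut kept) };
    let after_manual = drops.get();

    (after_scope, after_forget, after_manual)
}
```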


Turning a Vec<T> into a boxed slice gives you a Box<[T]>. This consists of a pointer to data on the heap and a length. It cannot dynamically grow or shrink so there is no capacity. When you leak this, you get a pointer to the data -- to the [T]. It's a wide pointer consisting of a pointer to the data and the length, just like the Box<[T]> did. When you convert it to *mut c_void it turns into a normal/thin pointer to just the data.

Box::leak(abox) // Box<[T]> -> &mut [T]
   as *mut _    // -> *mut [T]    (ptr to data, length)
   as *mut _    // -> *mut c_void (ptr to data)

A Vec<T> consists of a pointer to data on the heap, a usize for the current capacity, and a usize for the current length. So three usizes of data, regardless of the capacity. The Vec dynamically manages the allocation to hold the Ts.

A Box<Vec<T>> puts those three usizes on the heap. You have an extra layer of indirection compared to Box<[T]>. When you leak them, the Vec still won't drop. But the return from leak is a reference to the Vec<T>, not to the T themselves. You're treating the three usizes like they were your data. You still have an unwanted indirection.

Box::leak(abox) // Box<Vec<T>> -> &mut Vec<T>
    as *mut _   // *mut Vec<T> (ptr to (ptr to data, length, capacity))
    as *mut _   // *mut c_void (ptr to (ptr to data, length, capacity))

It's a bit out of date (e.g. I don't think Mutex allocates anymore; anyway that's an implementation detail), but this diagram may aid in understanding. The exact layout of the types (e.g. order of the pointer, length, and capacity) are also implementation details (so you can't count on the order in your C code, say).
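
If you want to poke at this yourself, here is a sketch that checks the sizes and compares pointers. The function names are made up, and the "three usizes" size of Vec is what current std does rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Sizes of a Vec, a boxed slice, and a boxed Vec, in bytes.
pub fn handle_sizes() -> (usize, usize, usize) {
    (
        size_of::<Vec<i64>>(),      // pointer + capacity + length (in practice)
        size_of::<Box<[i64]>>(),    // wide pointer: pointer + length
        size_of::<Box<Vec<i64>>>(), // thin pointer to the Vec header
    )
}

/// True when the thin pointer from a boxed slice is the data pointer itself.
pub fn boxed_slice_points_at_data() -> bool {
    let boxed: Box<[i64]> = vec![1i64, 2, 3, 4].into_boxed_slice();
    let data: *const i64 = boxed.as_ptr();
    let raw: *mut [i64] = Box::into_raw(boxed);
    let same = raw as *mut i64 as *const i64 == data;
    drop(unsafe { Box::from_raw(raw) }); // reclaim so nothing leaks
    same
}

/// True when the pointer from a boxed Vec is NOT the data pointer:
/// it points at the Vec header, which lives in its own allocation.
pub fn boxed_vec_points_at_header() -> bool {
    let boxed: Box<Vec<i64>> = Box::new(vec![1i64, 2, 3, 4]);
    let data: *const i64 = boxed.as_slice().as_ptr();
    let raw: *mut Vec<i64> = Box::into_raw(boxed);
    let differs = raw as *const i64 != data;
    drop(unsafe { Box::from_raw(raw) });
    differs
}
```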

I think I understand that but I'm still really struggling with handling the heap-allocated data here. I can't seem to get an object handle on the C-side that I can get back as something useful in rust.

I've tried several variations on

#[no_mangle]
pub unsafe extern "C" fn putout() -> *mut Vec<i64> {
    let a: Vec<i64> = vec![1, 2, 3];
    let mut o = mem::ManuallyDrop::new(a);
    &mut *o
}

#[no_mangle]
pub unsafe extern "C" fn getback(aref: *mut Vec<i64>) {
    println!("{:?}", *aref);
}

then on the C side I get the pointer with putout and try to put it back in rust with getback. I get something that looks bogus in C from putout, but I figure that's ok because nothing is guaranteeing me that the pointer is going to point to [1,2,3]. However I never get something that works when I supply it back to getback.

By now I've read quite a lot of documentation on the subject and nothing seems to describe this issue explicitly, which has me a bit puzzled because it seems like such an obvious use case to me, so I might still be missing something major.

I found this stackoverflow post and this example works for me:

#[no_mangle]
pub unsafe extern "C" fn putout() -> *mut Vec<i64> {
    let a: Vec<i64> = vec![1, 2, 3];
    let abox = Box::new(a);
    Box::into_raw(abox)
}

#[no_mangle]
pub unsafe extern "C" fn getback(aptr: *mut Vec<i64>) {
    let abox = Box::from_raw(aptr);
    println!("{:?}", *abox);
}

I think I have it figured out now, however Box still seems pretty mysterious to me. Absent any new information, I would have thought that the above would not be guaranteed to elide the destructor, however the documentation for into_raw indicates that this will not happen.

I'm also less than 100% confident that Box::new in the above example did not just copy a which is definitely not what I want.
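
One thing worth noting about the getback above is that Box::from_raw takes ownership, so the Vec is freed when getback returns and the handle can't be used again. A sketch of how the handle pattern might be split into "borrow" and "free" functions follows; all names are hypothetical, and I've used an opaque c_void handle here rather than *mut Vec<i64>.

```rust
use std::ffi::c_void;

/// Creates a Vec on the heap and returns an opaque handle to it.
pub extern "C" fn handle_new() -> *mut c_void {
    Box::into_raw(Box::new(vec![1i64, 2, 3])) as *mut c_void
}

/// Borrows the Vec behind the handle without taking ownership,
/// so the handle stays valid after the call.
/// SAFETY: `h` must be a live handle from `handle_new`.
pub unsafe extern "C" fn handle_sum(h: *const c_void) -> i64 {
    let v: &Vec<i64> = unsafe { &*(h as *const Vec<i64>) };
    v.iter().sum()
}

/// Takes ownership back and drops the Vec; the handle is dead afterwards.
/// SAFETY: `h` must come from `handle_new` and be freed only once.
pub unsafe extern "C" fn handle_free(h: *mut c_void) {
    if !h.is_null() {
        drop(unsafe { Box::from_raw(h as *mut Vec<i64>) });
    }
}

/// Simulates a C caller that uses the handle twice before freeing it.
pub fn use_handle_twice() -> (i64, i64) {
    let h = handle_new();
    let first = unsafe { handle_sum(h) };
    let second = unsafe { handle_sum(h) };
    unsafe { handle_free(h) };
    (first, second)
}
```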

The ManuallyDrop version returns a pointer to the Vec structure on the stack, not to the data the Vec contains (which is on the heap). The stack data will definitely be garbage after the function returns.

The Box::into_raw version works because you allocate a place for the Vec structure on the heap and then return a pointer to that. You are copying the three-pointer-wide Vec structure into the box's allocation, but you aren't making a copy of the Vec's contents. If you want to avoid that extra allocation, you essentially need an FFI-safe copy of the data that Vec keeps directly on the stack in your example (pointer, length, and capacity).


fn main() {
    unsafe {
        let ffi = putout();

        getback(ffi);
    }
}

#[repr(C)]
struct FfiVec {
    ptr: *mut i64,
    len: usize,
    capacity: usize,
}

impl FfiVec {
    fn from_vec(mut vec: Vec<i64>) -> Self {
        let ffi = FfiVec {
            ptr: vec.as_mut_ptr(),
            len: vec.len(),
            capacity: vec.capacity(),
        };

        std::mem::forget(vec);

        ffi
    }

    fn into_vec(self) -> Vec<i64> {
        unsafe { Vec::from_raw_parts(self.ptr, self.len, self.capacity) }
    }
}

#[no_mangle]
unsafe extern "C" fn putout() -> FfiVec {
    let a: Vec<i64> = vec![1, 2, 3];
    FfiVec::from_vec(a)
}

#[no_mangle]
unsafe extern "C" fn getback(ffi: FfiVec) {
    let vec = ffi.into_vec();
    println!("{:?}", vec);
}

The std::mem::forget(vec) call skips running the destructor for Vec, which is important since we want the heap data to still exist when we reassemble the Vec in getback.
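
As a variation, the std docs for mem::forget suggest ManuallyDrop for this kind of hand-off, since the Vec is wrapped before its parts are read. A sketch under that assumption, with made-up names (RawParts, into_parts, from_parts), which also checks that the rebuilt Vec reuses the original heap allocation:

```rust
use std::mem::ManuallyDrop;

/// Hypothetical FFI-safe view of a Vec<i64>: pointer, length, capacity.
#[repr(C)]
pub struct RawParts {
    ptr: *mut i64,
    len: usize,
    capacity: usize,
}

/// Splits a Vec into its raw parts without running its destructor.
pub fn into_parts(vec: Vec<i64>) -> RawParts {
    // Wrapping first means the Vec can never be dropped by accident.
    let mut vec = ManuallyDrop::new(vec);
    RawParts {
        ptr: vec.as_mut_ptr(),
        len: vec.len(),
        capacity: vec.capacity(),
    }
}

/// Reassembles the Vec so that it can be used and eventually dropped.
/// SAFETY: `parts` must come from `into_parts` and be used only once.
pub unsafe fn from_parts(parts: RawParts) -> Vec<i64> {
    unsafe { Vec::from_raw_parts(parts.ptr, parts.len, parts.capacity) }
}

/// Returns whether the data pointer was preserved, plus the rebuilt Vec.
pub fn parts_round_trip() -> (bool, Vec<i64>) {
    let original = vec![1i64, 2, 3];
    let data = original.as_ptr();
    let parts = into_parts(original);
    let rebuilt = unsafe { from_parts(parts) };
    (rebuilt.as_ptr() == data, rebuilt)
}
```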


Thanks, this seems helpful but I'll have to mess around with it a bit to really absorb it I think.

I thought that Vec was always on the heap, I'll have to go read the docs for it.

A Vec's contents are on the heap. The information about how large the allocation is, and how many elements are in the Vec are stored directly inside the Vec structure. Next to the pointer to the heap allocation.


Ok, I think I've figured it out, the Box::into_raw, Box::from_raw pattern seems reliable. If I don't want it to copy I'm probably going to have to ensure the entire object is just heap allocated in the first place with Box::new, otherwise I think the pointer copying is just going to be an acceptable and necessary overhead.

This is probably a dumb question, but just to be extra sure, if I do Box::new on a Vec, it's only copying the pointer and offset data from the stack right? How about Box::from_raw (in that case the docs seem to indicate that it is not copying the heap contents)? It's not also copying the heap contents? Because that's the thing I really need to avoid.

Box::new doesn't do anything to the contents of the value you pass, it just copies the value passed to the new location on the heap. So yes, it will only be copying the 3 integer values (pointer, capacity, and length) from the stack to the new heap location.

Sorry, I asked the wrong question. Forget Vec, suppose I have some other object that consists of a bunch of pointers to heap data. Will Box::new or Box::from_raw copy just the pointers, or will it make a deep copy of everything they refer to?

You can tell new doesn't do anything close to a deep copy because the method doesn't have any trait bounds on T, and neither does the impl block it's inside. So new basically can't do anything with the value other than move it.

Rust doesn't really have a deep copy trait in the first place. Clone makes a new copy of the value without invalidating the old one, but exactly what that means is deliberately not part of the trait contract.
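
If you want to convince yourself, here is a sketch (function name made up) that compares the Vec's data pointer before and after Box::new and a Box::into_raw / Box::from_raw round trip, and contrasts it with an explicit clone:

```rust
/// Returns (data pointer unchanged by boxing and round trip,
///          clone got its own separate allocation).
pub fn data_pointer_survives_boxing() -> (bool, bool) {
    let v = vec![1i64, 2, 3];
    let data = v.as_ptr();

    // Box::new moves the Vec header to the heap; the elements stay put.
    let boxed: Box<Vec<i64>> = Box::new(v);
    let same_after_new = boxed.as_slice().as_ptr() == data;

    // into_raw and from_raw only pass a pointer around.
    let raw = Box::into_raw(boxed);
    let back = unsafe { Box::from_raw(raw) };
    let same_after_round_trip = back.as_slice().as_ptr() == data;

    // An explicit clone is what allocates and copies the elements for Vec.
    let cloned = back.clone();
    let clone_is_separate = cloned.as_slice().as_ptr() != data;

    (same_after_new && same_after_round_trip, clone_is_separate)
}
```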