[FFI] How to copy a Vec<u8> into a *mut u8?

Hi,

I'm exposing a Rust crate to C and I'm trying to figure out how to copy a Vec<u8> into a *mut u8 given by the C world as a parameter like this:

pub extern "C" fn c_create_data(data: *mut u8) -> c_int {
    let v: Vec<u8> = some_data_gen_fn();
    // how to put the content of the Vec so it's available to C?
    v.len()
}

Thanks :slight_smile:

How do you know the capacity of data that's safe to write to?

Sorry, my function def is not complete in my example:

pub extern "C" fn c_create_data(data: *mut u8, data_len: c_uint) -> c_int {
    let v: Vec<u8> = some_data_gen_fn();
    if v.len() > data_len {
        return -1;
    }
    // how to put the content of the Vec so it's available to C?
    v.len()
}

I'm checking that the declared buffer size is enough of course :slight_smile:

Hmm, one possibility is to use std::ptr::slice_from_raw_parts_mut to get a slice from the pointer. Then use copy_from_slice to copy the Vec.

For example (I haven't tested this):

let mut array = std::ptr::slice_from_raw_parts_mut(data, v.len());
array.copy_from_slice(&v);

Only if data doesn’t point to uninitialized memory.

copy_from_slice doesn't exist on *mut [u8] :confused:

Of course, it's the responsibility of the caller to provide a correct buffer :slight_smile:

Here's something that uses ptr::copy instead, which I think should handle an uninitialized buffer:

std::ptr::copy(v.as_ptr(), data, v.len());

Just stumbled upon it as well :smiley:

@chrisd was faster, but I was going to suggest

unsafe {
    std::ptr::copy_nonoverlapping(v.as_ptr(), data, v.len());
}

as well after finding nothing better. Note that the regions can never be overlapping in this case.


I've seen this one as well, how can I be sure that the regions don't overlap?

Oh, easy. A Vec owns its memory, and we own the Vec here, which means there can be absolutely nothing else pointing to or accessing the same region as v.as_ptr() with length v.len() (unless something has already gone horribly wrong somewhere else, but that's why such things are "undefined behavior").

The biggest problem with translating this is converting the integers without panicking (panicking across FFI boundaries is UB).

Here's what I came up with so far:

pub unsafe extern "C" fn c_create_data(data: *mut u8, data_len: c_uint) -> c_int {
    let v: Vec<u8> = some_data_gen_fn();
    if usize::try_from(data_len)
        .map(|len| v.len() > len)
        .unwrap_or(false) // if data_len can't be converted to usize, that's fine, we just won't use the whole thing
    {
        return -1;
    }
    std::ptr::copy_nonoverlapping(v.as_ptr(), data, v.len());
    c_int::try_from(v.len()).unwrap_or(-1) // note: I don't think this can ever actually return -1
}

I believe that slice::from_raw_parts_mut is technically OK here due to Rust and C not really sharing a notion of "initializedness", but ptr::copy_nonoverlapping is shorter, anyway.
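For comparison, here is a minimal sketch of the slice-based variant. The helper name copy_via_slice is made up for illustration, and the sketch assumes the caller's buffer really is valid for dst_len bytes. Note that it uses std::slice::from_raw_parts_mut, which returns a &mut [u8], rather than std::ptr::slice_from_raw_parts_mut, which returns a *mut [u8] and so has no copy_from_slice.

```rust
use std::slice;

/// Hypothetical helper (not from the thread): copies `src` into the
/// `dst_len`-byte buffer at `dst`. Returns the number of bytes written,
/// or `None` if the pointer is null or the buffer is too small.
///
/// # Safety
/// `dst` must be valid for writes of `dst_len` bytes, and nothing else may
/// access that memory during the call.
pub unsafe fn copy_via_slice(src: &[u8], dst: *mut u8, dst_len: usize) -> Option<usize> {
    if dst.is_null() || src.len() > dst_len {
        return None;
    }
    // Build a `&mut [u8]` over exactly the bytes we are about to overwrite.
    let out: &mut [u8] = unsafe { slice::from_raw_parts_mut(dst, src.len()) };
    // copy_from_slice requires both slices to have the same length, which
    // holds here because `out` was created with `src.len()`.
    out.copy_from_slice(src);
    Some(src.len())
}
```

Whether forming a &mut [u8] over memory that C never initialized is strictly fine is the open question raised above, so ptr::copy_nonoverlapping remains the more conservative choice.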

The extra complexity is mostly because of the width mismatch between c_uint and usize. It would be cleaner if the function can be rewritten to take size_t instead.

Note also that the whole function needs to be marked unsafe, because it's possible to create raw pointers in safe code.


What if v.len() == INT_MAX + 1?


Ah, yep, that would do it.

No idea what the behavior should be in that case.

This problem also goes away if you change c_create_data to take usize and return isize (aka size_t and off_t, respectively), because a Vec never allocates more than isize::MAX bytes.

(Although Vec guarantees that by panicking in those cases, which could be bad if it happens in some_data_gen_fn.)
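A rough sketch of what the usize/isize version might look like. The body of some_data_gen_fn is a made-up stand-in, the null check and the catch_unwind guard are additions that were not discussed above, and an export attribute such as #[no_mangle] is left out for brevity. On the C side the return type would presumably be a pointer-sized signed integer such as ptrdiff_t.

```rust
use std::panic;
use std::ptr;

// Hypothetical stand-in for the thread's `some_data_gen_fn`.
fn some_data_gen_fn() -> Vec<u8> {
    vec![0xde, 0xad, 0xbe, 0xef]
}

/// Writes the generated bytes into `data` and returns how many were written,
/// or -1 on failure.
///
/// # Safety
/// `data` must be valid for writes of `data_len` bytes.
pub unsafe extern "C" fn c_create_data(data: *mut u8, data_len: usize) -> isize {
    // Catch a panic from the generator so it does not unwind into C.
    let v = match panic::catch_unwind(some_data_gen_fn) {
        Ok(v) => v,
        Err(_) => return -1,
    };
    if data.is_null() || v.len() > data_len {
        return -1;
    }
    // `v` owns its allocation, so it cannot overlap the caller's buffer.
    unsafe { ptr::copy_nonoverlapping(v.as_ptr(), data, v.len()) };
    // A Vec never holds more than isize::MAX bytes, so this cast cannot wrap.
    v.len() as isize
}
```

With both sizes pointer-width, the try_from conversions disappear and -1 is left to signal only a real failure.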


Thanks! I can change the signature, since I'm the one defining it, so I'll move to size_t and off_t. I didn't know the latter until today!

I can't find size_t in std::os::raw :-/

They are available in the ::libc crate: ::libc::size_t.

That being said, since Rust "only" supports platforms where uintptr_t == size_t, you can use usize as size_t. off_t is more subtle, on the other hand, so should you need it, use the one from ::libc.


I don't understand this part: why does the whole function need to be unsafe?

For the moment, I have some unsafe blocks in it (I have more parameters than shown in my example, and I convert the pointers to CStr and &[u8]).

It has to be unsafe because otherwise you could call it with a null pointer in safe code and trigger undefined behavior.
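To make that concrete, here is a small sketch. The names fill_buffer and make_null are hypothetical, and the payload is hard-coded for illustration. Producing a bad pointer takes no unsafe at all, and a null check inside the function only catches one kind of bad input, so the validity requirement still has to be pushed onto the caller via unsafe fn.

```rust
use std::ptr;

/// Hypothetical function with the same shape as `c_create_data`.
///
/// # Safety
/// `data` must be valid for writes of `data_len` bytes.
pub unsafe extern "C" fn fill_buffer(data: *mut u8, data_len: usize) -> isize {
    let v: Vec<u8> = vec![1, 2, 3];
    // The null check catches one bad input; a dangling pointer or a
    // `data_len` larger than the real buffer cannot be detected here.
    if data.is_null() || v.len() > data_len {
        return -1;
    }
    unsafe { ptr::copy_nonoverlapping(v.as_ptr(), data, v.len()) };
    v.len() as isize
}

/// Entirely safe code: creating a raw pointer needs no `unsafe`, only
/// dereferencing or writing through it does.
pub fn make_null() -> *mut u8 {
    ptr::null_mut()
}
```

If fill_buffer were a safe fn, safe Rust could pass it any such pointer together with an arbitrary length, and the write inside would be undefined behavior with no unsafe in sight at the call site.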
