Zero-copy C function wrapper

I am writing a Rust wrapper for a C library with a function which involves transferring buffers to a hardware device. The buffers are potentially very large (multiple gigabytes) so the library uses a potentially zero-copy method for transferring data from host to device. I am having difficulty determining the best interface for exposing this function in a memory safe manner in Rust. Any suggestions or examples from existing libraries would be greatly appreciated.

The interface looks roughly like this (simplified for clarity):

unsafe extern "C" fn create_device_buffer(
	device: *mut Device,
	data: *const c_void,
	len: usize,
	other_args: OtherArgs,
	buffer: *mut Buffer,       // out
	transfer_done: *mut Event, // out, indicates when data is safe to be freed
) -> ErrorCode;

unsafe extern "C" fn event_is_ready(event: *mut Event) -> bool;
unsafe extern "C" fn event_block_until_ready(event: *mut Event);
unsafe extern "C" fn event_get_error(event: *mut Event) -> ErrorCode; // only valid after event is ready
unsafe extern "C" fn event_destroy(event: *mut Event);

pub type OnEventReadyCallback = unsafe extern "C" fn(error: ErrorCode, user_arg: *mut c_void);
unsafe extern "C" fn register_event_on_ready_callback(event: *mut Event, callback: OnEventReadyCallback, user_arg: *mut c_void);

unsafe extern "C" fn buffer_destroy(buffer: *mut Buffer) -> ErrorCode;

To ensure memory safety, the data must remain unmodified from the create_device_buffer call until event_is_ready(transfer_done) returns true. Depending on the OtherArgs, the size of the data, and the usage of the Buffer, the event may become ready immediately after the call, or it may stay pending for the entire lifetime of the Buffer, until buffer_destroy is called.

I have some incomplete code sketched out with a few ideas that I had.

The first idea is to express in lifetimes that the host buffer outlives the transfer, by returning a guard similar to a mutex guard. The main issue that I see with this design is that something needs to enforce that the guard can't be dropped before wait is called.

fn create_device_buffer_wrapper<'host>(dev: Device, buf: &'host [u8], other_args: OtherArgs) -> Result<(Buffer, BufferGuard<'host>), Error>;

struct BufferGuard<'host> {
	transfer_done: *mut Event,
	buf: &'host [u8],
	// ...
}

impl<'host> BufferGuard<'host> {
	pub fn wait(&self) -> Result<(), Error> {
		// the FFI functions are unsafe, so every call needs an unsafe block
		unsafe {
			event_block_until_ready(self.transfer_done);
			wrap_error(event_get_error(self.transfer_done))
		}
	}
	pub async fn wait_async(&self) -> Result<(), Error> {
		// TODO: register_event_on_ready_callback with waker
		// TODO: poll the event with event_is_ready until it's ready
		wrap_error(unsafe { event_get_error(self.transfer_done) })
	}
}

impl<'host> Drop for BufferGuard<'host> {
	// Drop::drop takes &mut self, not &self
	fn drop(&mut self) {
		unsafe {
			// need to ensure the guard wasn't dropped before the transfer has completed
			if !event_is_ready(self.transfer_done) {
				panic!("guard dropped before ready");
			}
			// QUESTION: maybe it's better to block instead? this could create difficult to debug async bugs
			event_destroy(self.transfer_done);
		}
	}
}

Another idea is to have the guard take ownership of the buffer. The goal would be to avoid the need to enforce that the guard can't be dropped prematurely, but this may be more awkward to use and could force the caller to copy if they need to read the buffer concurrently.

fn create_device_buffer_wrapper(dev: Device, buf: Vec<u8>, other_args: OtherArgs) -> Result<(Buffer, BufferGuard), Error>;

struct BufferGuard {
	transfer_done: *mut Event,
	buf: Vec<u8>,
	// ...
}

impl BufferGuard {
	// consume the guard and return ownership of vector
	// QUESTION: does drop still run if the guard is consumed?
	pub fn wait(self) -> Result<Vec<u8>, Error> {
		unsafe {
			event_block_until_ready(self.transfer_done);
			wrap_error(event_get_error(self.transfer_done))?;
		}
		Ok(self.buf)
	}

	pub async fn wait_async(self) -> Result<Vec<u8>, Error> {
		// TODO: register_event_on_ready_callback with waker
		// TODO: poll the event with event_is_ready until it's ready
		wrap_error(unsafe { event_get_error(self.transfer_done) })?;
		Ok(self.buf)
	}
}

impl Drop for BufferGuard {
	// Drop::drop takes &mut self, not &self
	fn drop(&mut self) {
		// handle the case that wait is never called
		// TODO: need to forget the buffer and register a callback to free it
	}
}

The last idea is to require the caller to wrap the buffer in a mutex. This design avoids the need for the guard, but I am not clear on the implications for supporting async. Also, there would need to be a different mechanism to pass a transfer error back to the caller.

fn create_device_buffer_wrapper(dev: Device, buf: Arc<Mutex<&[u8]>>, other_args: OtherArgs) -> Result<(), Error> {
	let mut guard = buf.lock().unwrap();
	create_device_buffer(
		// ...
	);
	// QUESTION: how can I pass the guard to the callback? MutexGuard is !Send but I am not clear why
	register_event_on_ready_callback(transfer_done, callback, &mut guard as *mut _ as *mut c_void);
}

unsafe extern "C" fn callback(error: ErrorCode, user_arg: *mut c_void) {
	let guard = *(user_arg as *mut MutexGuard<&[u8]>);
	drop(guard); // MutexGuard has no unlock method; dropping it releases the lock
	// TODO: do something with the error
}

I haven't checked the entire implications of this, but how about passing the data in as Arc<Vec<u8>>? This way,

  • the wrapper code can make a clone of the Arc that it owns, and thereby ensure the Vec lives long enough on its own authority
  • the calling code can still read the Vec any time it wants
  • the calling code can mutate the Vec when it is no longer in use via Arc::get_mut()

(The reason I say Arc<Vec<T>> rather than Arc<[T]> is to ensure it is a zero-copy operation to create the Arc<Vec<T>> from a Vec created from who knows where.)
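
To make the three points above concrete, here is a minimal sketch. InFlight, start_transfer and caller_can_mutate are hypothetical names invented for illustration, and the FFI call itself is left out; the sketch only shows how the wrapper's clone of the Arc blocks Arc::get_mut() until it is released.

```rust
use std::sync::Arc;

/// Hypothetical handle returned by the wrapper. It holds its own clone of the
/// Arc for as long as the device may still be reading the data.
pub struct InFlight {
    _data: Arc<Vec<u8>>,
}

/// Hypothetical wrapper entry point. A real one would also pass
/// data.as_ptr() and data.len() to create_device_buffer.
pub fn start_transfer(data: &Arc<Vec<u8>>) -> InFlight {
    InFlight { _data: Arc::clone(data) }
}

/// Returns true if the caller currently has exclusive access to the Vec,
/// which is the case only when no other clone of the Arc exists.
pub fn caller_can_mutate(data: &mut Arc<Vec<u8>>) -> bool {
    Arc::get_mut(data).is_some()
}
```

While an InFlight value is alive the caller can still read through its own Arc, but caller_can_mutate returns false; once the wrapper drops its clone, Arc::get_mut() succeeds again.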


Is the caller able to know when the Vec is no longer in use? I was thinking that Arc<Mutex<&[u8]>> would be needed so that the caller can wait until the callback has unlocked. The only way I can think of to do this without the Mutex is polling Arc::get_mut(), which doesn't seem ideal.

This is problematic if you want to borrow a slice. Your check in Drop::drop is not gonna guarantee this for two reasons:

  • it's a check followed by a panic, not an actual blocking wait, and panics execute code (other destructors, which can touch the buffer) and can also be caught.

  • even if it blocked, there's no guarantee that drop will be called, since leaking is safe.
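
The second point can be demonstrated in entirely safe code. This is a small sketch with a made-up Guard type (not the BufferGuard from the question) that records whether its destructor ran:

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard that records in `dropped` whether its destructor ran.
pub struct Guard<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Leaks a guard and reports whether its destructor ran.
pub fn destructor_ran_after_forget() -> bool {
    let dropped = Cell::new(false);
    let guard = Guard { dropped: &dropped };
    // mem::forget is a safe function: no `unsafe` block is needed to skip Drop.
    mem::forget(guard);
    dropped.get()
}
```

Because mem::forget is safe, any soundness argument that relies on a destructor running before the borrow of the host slice ends does not hold.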

Your example where the buffer is held in the struct looks sound though, assuming you implement the TODO in Drop. Your problem looks awfully similar to the io-uring problem (see for example Notes on io-uring), and AFAIK the solution there was also to keep an owned buffer.
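
A rough sketch of what the Drop TODO could look like, under loud assumptions: MockEvent is a stand-in for the C Event (an atomic flag, not the real API), and the sketch simply leaks the allocation if the transfer is still pending. A real wrapper would more likely hand the Vec to the on-ready callback so it is freed once the device is done.

```rust
use std::mem;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the C `Event`; `true` means the transfer is done.
pub type MockEvent = Arc<AtomicBool>;

/// Owns the host buffer for as long as the device may read it.
pub struct BufferGuard {
    transfer_done: MockEvent,
    buf: Vec<u8>,
}

impl BufferGuard {
    pub fn new(buf: Vec<u8>, transfer_done: MockEvent) -> Self {
        BufferGuard { transfer_done, buf }
    }
}

impl Drop for BufferGuard {
    fn drop(&mut self) {
        if !self.transfer_done.load(Ordering::Acquire) {
            // The device may still be reading, so the allocation must not be
            // freed here. This sketch leaks it by forgetting the Vec.
            mem::forget(mem::take(&mut self.buf));
        }
        // If the transfer is done, `buf` is freed normally after this returns.
    }
}
```

Even if the caller forgets the guard itself, the buffer is leaked rather than freed, so the device never reads freed memory.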

Yes, you have to "leak" it in order to not do that. It's usually suggested to use ManuallyDrop for that.
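
For example, a consuming method can wrap self in ManuallyDrop and move the buffer out. This is only a sketch: into_buf and the DROPS counter are invented for illustration, and the event handling a real wait would do is omitted.

```rust
use std::mem::{self, ManuallyDrop};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how often the destructor ran; exists only to make the effect visible.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct BufferGuard {
    buf: Vec<u8>,
}

impl BufferGuard {
    pub fn new(buf: Vec<u8>) -> Self {
        BufferGuard { buf }
    }

    /// Consumes the guard and hands the buffer back without running `Drop`.
    /// A real `wait` would block on the event and destroy it before this point.
    pub fn into_buf(self) -> Vec<u8> {
        let mut this = ManuallyDrop::new(self);
        // Swap an empty Vec in. Nothing is leaked, because an empty Vec owns
        // no heap allocation.
        mem::take(&mut this.buf)
    }
}

impl Drop for BufferGuard {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}
```

Without the ManuallyDrop wrapper, the destructor would run at the end of the consuming method, and moving self.buf out directly would not compile for a type that implements Drop.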

This suffers from the same problem as the borrowed one, since you're holding a &[u8] in the Arc (also, it doesn't really make sense to do that).

Locks can only be released on the thread that acquired them, it's a limitation of some OS APIs. If it was Send then this restriction would be violated.


To reply to the same question with further explanation: this is the only rational way to implement the transferring of ownership. If dropping didn't occur just because you passed a value to another function, then we would be leaking memory all over the place. The very point of ownership is this. Whoever owns a value will be responsible for cleaning it up; if you pass something by value to a function, now that function is responsible for dropping it (or passing it along).
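
A tiny illustration of that rule, with made-up names (Noisy, consume, pass_along):

```rust
use std::cell::Cell;

/// Hypothetical type that counts how many times it has been dropped.
pub struct Noisy<'a> {
    pub drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership and does nothing else, so the value is dropped on return.
pub fn consume(_value: Noisy<'_>) {}

/// Takes ownership and passes it along, so the caller is responsible again.
pub fn pass_along<'a>(value: Noisy<'a>) -> Noisy<'a> {
    value
}
```

Calling consume bumps the counter as soon as the function returns, while pass_along leaves it untouched until the returned value itself goes out of scope.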


I figured that the transfer_done event would also be exposed to the caller, and would suffice for that.


I wasn't aware that leaking is safe, but I guess it makes sense that memory leaks aren't actually a memory safety issue. I see why this prevents borrowing a slice.

This is a really helpful post, thanks for linking it. It makes a compelling argument that taking ownership of the buffer is the way to go.

Yeah, in hindsight it doesn't make a lot of sense. If the signature was changed to take an Arc<Mutex<Vec<u8>>> instead of &[u8] it seems that it could work? Unfortunately, because I can't guarantee the callback will run in the same thread as the caller and MutexGuard is !Send, I would need to do something like acquiring the Mutex in a thread and using another synchronization primitive (which is Send) to signal to it when the callback is run. An alternative would be Arc<Vec<u8>> along with a method to wait as @kpreid suggested.
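
As a sketch of the "primitive which is Send" idea: a boxed mpsc::Sender can travel through user_arg, and the callback can report the result on it from whatever thread it runs on. The assumptions here are that ErrorCode is a plain integer and that the library invokes the callback exactly once; start is a hypothetical stand-in that fakes the library by firing the callback from a spawned thread.

```rust
use std::os::raw::c_void;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Assumption: the library's error code is a plain integer.
pub type ErrorCode = i32;

/// Has the same shape as OnEventReadyCallback. It reclaims the boxed Sender
/// from `user_arg` and reports the transfer result on it.
pub unsafe extern "C" fn on_ready(error: ErrorCode, user_arg: *mut c_void) {
    // SAFETY: `user_arg` came from Box::into_raw in `start` and the callback
    // is assumed to run exactly once, so the Box is reclaimed exactly once.
    let tx = unsafe { Box::from_raw(user_arg as *mut Sender<ErrorCode>) };
    // Ignore the case where the receiver has already been dropped.
    let _ = tx.send(error);
}

/// Hypothetical stand-in for register_event_on_ready_callback: it fires the
/// callback from another thread, as the real library might.
pub fn start(result: ErrorCode) -> Receiver<ErrorCode> {
    let (tx, rx) = mpsc::channel();
    // Carry the pointer as usize so the closure is Send.
    let user_arg = Box::into_raw(Box::new(tx)) as usize;
    thread::spawn(move || unsafe { on_ready(result, user_arg as *mut c_void) });
    rx
}
```

The caller can then block on the returned Receiver with recv(), which also gives the transfer error a path back to the caller.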

I am leaning now towards implementing it using the owned buffer approach.

This makes sense, thanks for the explanation.

Ah, I understand what you mean now.

Thanks for all the help!
