Deallocating Box::from_raw(*mut u8)

Hi everyone,

I'm opening this topic because I'm struggling to understand a solution proposed in this old topic, which has since been closed.

The proposed solution is:

#[repr(C)]
struct Buffer {
    data: *mut u8,
    len: usize,
}

extern "C" fn generate_data() -> Buffer {
    // alloc 512 bytes in the heap
    let mut buf = vec![0; 512].into_boxed_slice();
    // get a pointer to the first byte 
    let data = buf.as_mut_ptr();
    let len = buf.len();
    // tell rustc to not deallocate the above allocated bytes
    std::mem::forget(buf);
    Buffer { data, len }
}

extern "C" fn free_buf(buf: Buffer) {
    // create a slice to the above allocated bytes
    let s = unsafe { std::slice::from_raw_parts_mut(buf.data, buf.len) };
    // get a pointer to the first byte 
    let s = s.as_mut_ptr();
    // create a box that points to the first byte and drop it
    unsafe {
        let _: Box<u8> = Box::from_raw(s);
    }
}

I annotated the proposed solution with comments describing what I think the code is doing.

Now my questions:

  • Shouldn't the last unsafe block only drop the first byte?
  • Can you explain how rustc is going to free the memory?
  • Is the function below equivalent to free_buf?
extern "C" fn free_buf(buf: Buffer) {
    unsafe {
        let _: Box<u8> = Box::from_raw(buf.data);
    }
}

Ty

If you create a Box<u8>, that is indeed wrong. You need to create a Box<[u8]>. You can do that with std::ptr::slice_from_raw_parts_mut, which returns a *mut [u8] that you can then pass to Box::from_raw.
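For illustration, here is a minimal sketch of what that could look like, assuming the same Buffer layout as in the question. Using Box::into_raw on the producing side is my own choice for the sketch, not something the original solution did; the part that matters is that free_buf rebuilds a *mut [u8] carrying the full length before handing it to Box::from_raw.

```rust
use std::ptr;

#[repr(C)]
pub struct Buffer {
    pub data: *mut u8,
    pub len: usize,
}

pub extern "C" fn generate_data() -> Buffer {
    // Allocate 512 zeroed bytes on the heap as a boxed slice.
    let boxed: Box<[u8]> = vec![0_u8; 512].into_boxed_slice();
    let len = boxed.len();
    // Give up ownership without running the destructor, then drop the
    // length part of the fat pointer to get a thin *mut u8.
    let data = Box::into_raw(boxed) as *mut u8;
    Buffer { data, len }
}

pub extern "C" fn free_buf(buf: Buffer) {
    // Rebuild the fat pointer so the box knows how many bytes it owns.
    let slice: *mut [u8] = ptr::slice_from_raw_parts_mut(buf.data, buf.len);
    // SAFETY (assumed by this sketch): `buf` came from `generate_data`
    // and has not been freed already.
    unsafe { drop(Box::from_raw(slice)) };
}

fn main() {
    let buf = generate_data();
    assert_eq!(buf.len, 512);
    free_buf(buf);
}
```

With Box<[u8]>, the deallocation is done with the layout of the whole 512-byte slice rather than the layout of a single u8.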


This seems quite extraordinarily complicated. I'd do the following:

fn generate_data() -> Buffer {
    let mut buf = vec![0_u8; 512];

    // either:
    let data = buf.as_mut_ptr();
    let len = buf.len();
    std::mem::forget(buf);

    // or, if you can use nightly, forget `forget` and replace the
    // three lines above with:
    // let (data, len, _capacity) = buf.into_raw_parts();

    Buffer { data, len }
}

extern "C" fn free_buf(buf: Buffer) {
    // vec![] guarantees length == capacity
    let _: Vec<u8> = unsafe {
        Vec::from_raw_parts(buf.data, buf.len, buf.len)
    };
}

I'm a bit paranoid when it comes to UB. I'd write the stable version as follows:

fn generate_data() -> Buffer {
    // I prefer this over `forget`, because it's less error prone.
    // Fewer necessary safety checks = happier developer
    let mut buf = std::mem::ManuallyDrop::new(vec![0_u8; 512]);

    // Get the length first, then the pointer. Doing it the other way
    // around **currently** doesn't cause UB, but it may be unsound due to
    // unclear (to me, at least) guarantees of the std lib.
    let len = buf.len();
    let data = buf.as_mut_ptr();

    Buffer { data, len }
}

extern "C" fn free_buf(buf: Buffer) { /* unchanged */ }
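Putting the two halves together, a self-contained version of the stable approach might look like the sketch below. It assumes the Buffer struct from the question; the debug_assert_eq! is my addition, there only to make the len == capacity assumption visible.

```rust
use std::mem::ManuallyDrop;

#[repr(C)]
pub struct Buffer {
    pub data: *mut u8,
    pub len: usize,
}

pub extern "C" fn generate_data() -> Buffer {
    // ManuallyDrop prevents the Vec's destructor from running at scope end.
    let mut buf = ManuallyDrop::new(vec![0_u8; 512]);
    // free_buf below relies on capacity being equal to length.
    debug_assert_eq!(buf.len(), buf.capacity());

    // Shared access (len) first, exclusive access (as_mut_ptr) last.
    let len = buf.len();
    let data = buf.as_mut_ptr();

    Buffer { data, len }
}

pub extern "C" fn free_buf(buf: Buffer) {
    // SAFETY (assumed by this sketch): `buf` came from `generate_data`,
    // so length and capacity are both `buf.len`, and it is freed only once.
    unsafe { drop(Vec::from_raw_parts(buf.data, buf.len, buf.len)) };
}

fn main() {
    let buf = generate_data();
    assert_eq!(buf.len, 512);
    free_buf(buf);
}
```

If the Vec could ever end up with spare capacity, the capacity would have to travel in Buffer as well, since Vec::from_raw_parts needs the exact value.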

Sorry, I don't really see how the opposite order could go wrong. What did you have in mind? I.e. what guarantees doesn't the stdlib make that could result in UB by getting the pointer first and the length last?

.as_mut_ptr() takes a &mut Vec<T> and returns a *mut T. But .len() requires a &Vec<T>, which may invalidate the previously taken &mut Vec<T> and any pointers derived from it.


Oh, you are right, I missed that.

Hi, I still don't understand how "*mut T" invalidates "&T". Can you give an example?

Does this behavior rely on the implementation of Vec? If it is not documented, the assumption could fail in the future.

It is documented:

If len == capacity, (as is the case for the vec! macro)


Well, it's hard to give just one example, since this isn't really about what happens at runtime. It's about the Rust memory model (proposed, but not yet formally specified), that is, how the theoretical abstract Rust machine defines what pointers and references mean. If you want to dig deeper, it would be helpful to read this blog post and the other posts on that blog.
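One way to build some intuition, offered only as an analogy: the borrow checker enforces a similar ordering rule for ordinary references at compile time. The sketch below is not the raw-pointer case from this thread, and the function name is made up for illustration. As I understand it, raw pointers opt out of the compile-time check, but the proposed aliasing models still reason about them in a broadly similar "later access may invalidate earlier borrows" way.

```rust
/// Shared borrow first (`len`), exclusive borrow second: accepted, because
/// the shared borrow has ended before the exclusive one begins.
/// Panics if `v` is empty, since it indexes element 0.
pub fn len_then_write(v: &mut Vec<u8>) -> usize {
    let len = v.len();
    let first = &mut v[0];
    *first = 1;
    len
}

// The reverse order is rejected by the compiler (error E0502), because the
// exclusive borrow `first` is still in use when `v.len()` needs shared access:
//
// pub fn write_then_len(v: &mut Vec<u8>) -> usize {
//     let first = &mut v[0];
//     let len = v.len();
//     *first = 1;
//     len
// }

fn main() {
    let mut v = vec![0_u8; 4];
    assert_eq!(len_then_write(&mut v), 4);
    assert_eq!(v[0], 1);
}
```

Taking the length before the pointer, as suggested earlier in the thread, follows the same ordering as the accepted function.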


Heh, I find it to be a very apt coincidence here, that the detailed unsafe solution of that post was wrong (freeing a Box<u8>): using unsafe is dangerous and error-prone, typos such as Box<u8> instead of Box<[u8]> suffice to cause UB without necessarily triggering compiler errors!

I must insist that when non-unsafe / safe and sound abstractions are available, such as the one showcased in that very thread at the end, those should be preferred to rolling your own unsafe.

Indeed, another example is what @Phlopsi mentioned: even if you get the Box<[u8]> part right, you'd still have the potential issue, w.r.t. aliasing, of using mem::forget instead of a preemptive ManuallyDrop.
