(small code snippet) Did I succeed in creating a safe API for my heap based array?

use std::alloc::Layout;

pub struct HeapArray<T: Sized> {
    pub size: usize,
    layout: Layout,
    location: *mut [T]
}

impl<T: Sized + Default> HeapArray<T> {
    pub fn new(size: usize) -> HeapArray<T> {
        let struct_size = std::mem::size_of::<T>();
        let size_to_alloc = struct_size * size;
        let layout = std::alloc::Layout::from_size_align(size_to_alloc, 1).unwrap();
        let location =
            unsafe {
                let location = std::alloc::alloc(layout) as *mut T;
                let array = std::slice::from_raw_parts_mut::<T>(location, size) as *mut [T];
                let array_iter = &mut *array;
                for i in 0..size {
                    array_iter[i] = T::default();
                }
                array
            };
        HeapArray {
            size,
            location,
            layout
        }
    }
}

impl<T: Sized> Drop for HeapArray<T> {
    fn drop(&mut self) {
        unsafe {
            std::alloc::dealloc(self.location as *mut u8, self.layout)
        }
    }
}

Edit : as you can see, it's still work in progress

I’m not familiar enough with unsafe code to do a proper review, but I do have a couple of concerns:

  • You’re casting a pointer to uninitialized memory to a reference, which I believe is UB.
  • You don’t call the destructor of any held items, so you might have a resource leak. This isn’t strictly unsafe, but probably not what you want.

But I initialize the unitialized memory right away with ::default() calls, doesn't that fix the problem ? Or am I missing something ?

Also, thanks for telling me about the destructors of held items

This is where I’ll need to defer to others. I think you might need to initialize the memory with pointer copies first or cast to MaybeUninit instead of the final type.

Also, how do I call the destructor of those objects ? drop(obj) doesn't look right

I think it’s drop_in_place, which you can guard with a needs_drop check to speed things up when the destructor won’t be doing anything.

1 Like

Does this look better ?

impl<T: Sized> Drop for HeapArray<T> {
    fn drop(&mut self) {
        unsafe {
            let array_iter = &mut *self.location;
            if needs_drop::<T>() {
                for i in 0..self.size {
                    drop_in_place(&mut array_iter[i] as *mut T);
                }
            }
            std::alloc::dealloc(self.location as *mut u8, self.layout)
        }
    }
}
1 Like

The funny thing is that this whole thing is only needed (for now) so I can store faces for polyhedra in my math library without using any vecs

You’ve essentially re-written Vec but without the capacity field. That’s a savings of 64 bits on a multi-kilobyte datastructure (if you remove the Layout storage).

1 Like

Here, I've rewritten it to be without layout storage

use std::alloc::Layout;
use std::intrinsics::needs_drop;
use std::ptr::drop_in_place;

pub struct HeapArray<T: Sized> {
    pub size: usize,
    location: *mut [T]
}

impl<T: Sized + Default> HeapArray<T> {
    pub fn new(size: usize) -> HeapArray<T> {
        let struct_size = std::mem::size_of::<T>();
        let size_to_alloc = struct_size * size;
        let layout = std::alloc::Layout::from_size_align(size_to_alloc, 1).unwrap();
        let location =
            unsafe {
                let location = std::alloc::alloc(layout) as *mut T;
                let array = std::slice::from_raw_parts_mut::<T>(location, size) as *mut [T];
                let array_iter = &mut *array;
                for i in 0..size {
                    array_iter[i] = T::default();
                }
                array
            };
        HeapArray {
            size,
            location
        }
    }
}

impl<T: Sized> Drop for HeapArray<T> {
    fn drop(&mut self) {
        unsafe {
            let array_iter = &mut *self.location;
            if needs_drop::<T>() {
                for i in 0..self.size {
                    drop_in_place(&mut array_iter[i] as *mut T);
                }
            }
            let struct_size = std::mem::size_of::<T>();
            let size_to_dealloc = struct_size * self.size;
            let layout = Layout::from_size_align(size_to_dealloc, 1).unwrap();
            std::alloc::dealloc(self.location as *mut u8, layout)
        }
    }
}

You probably want to correctly align your allocated memory:

let layout = std::alloc::Layout::from_size_align(size_to_alloc, std::mem::align_of::<T>()).unwrap();

Also you cannot convert your raw pointer to a slice and then assign to it. Rust assume that it is initialized memory and will drop the value T. This is UB, especially if T is a Box or another type with a drop implementation or dynamically allocated resources. You need to use MaybeUninitor std::ptr::write

for i in 0..size {
    array_iter[i] = T::default();  // value of array_iter[i] dropped, a Box will dealloc random memory
}

You might want to read https://doc.rust-lang.org/core/mem/union.MaybeUninit.html as they have an example specifically for your use-case

3 Likes
  • It's suspicious to call std::slice::from_raw_parts_mut on a pointer to uninitalized data. You should really be using MaybeUninit<[T]> for stuff like this.
  • You can't assume that location is correctly aligned for T because you created it with an alignment of 1. You need to pass std::mem::align_of::<T>() to Layout::from_size_align (or better yet, use Layout::array which takes care of alignment and multiplying by size).
  • Since *mut [T] is a fat pointer that stores the size, you end up storing the size twice.
  • It's unsound for size to be marked pub because that means it could be modified by anyone with a &mut HeapArray<T>, and size has to be correct for drop to be safe.
  • Since the layout can be derived from the type T and the size, you don't need to store it at all.
  • This is just an extra complicated way of writing Box<[T]>, right?
pub struct HeapArray<T> { /* note: Sized is a default bound */
    inner: Box<[T]>,
}

impl<T: Default> HeapArray<T> {
    pub fn new(size: usize) -> HeapArray<T> {
        let mut inner = Vec::new();
        inner.resize_with(size, Default::default);
        HeapArray {
            inner: inner.into_boxed_slice(),
        }
    }
}

impl<T> HeapArray<T> {
    pub fn size(&self) -> usize {
        self.inner.len()
    }
}

/* no need to write a Drop impl, Box takes care of it */

Why are you avoiding Vec?

4 Likes

Thanks for that ! Didn't think about using Box tbh...

I'm not using Vec because :

  • I want to save some of the space from it.
  • Because I wanted to understand better how memory managment works in rust
  • Because I needed what basically is a non-resizable Vec

One last problem :
I want to implement Deref and DerefMut for my array :

use std::ops::{Deref, DerefMut};

pub struct HeapArray<T> {
    inner: Box<[T]>
}

impl<T: Default> HeapArray<T> {
    pub fn new(size: usize) -> HeapArray<T> {
        let mut inner = Vec::new();
        inner.resize_with(size, Default::default);
        HeapArray {
            inner: inner.into_boxed_slice(),
        }
    }

    pub fn size(&self) -> usize {
        self.inner.len()
    }
}

impl<T> Deref for HeapArray<T> {
    type Target = &[T];

    fn deref(&self) -> Self::Target {
        self.inner.deref()
    }
}

but it tells me it needs a lifetime here

type Target = &[T];
               ^
               |

If I try to add that lifetime to the impl like this

impl<'a, T> 

It tells me

   |
21 | impl<'a, T> Deref for HeapArray<T> {
   |      ^^ unconstrained lifetime parameter

I’m pretty sure that should be (untested):

impl<T> Deref for HeapArray<T> {
    type Target = [T];

    fn deref(&self) -> &Self::Target {
        self.inner.deref()
    }
}
1 Like

Oh yeah right thanks

No.

The default() method is implemented by a user and may panic midway through, leaving you with a half-initialized buffer. Because the items are considered owned, their destructors will be executed so we'll try to destroy garbage.

Something to keep in mind with unsafe code is that it's not good enough to have UB only for a "short" time (i.e. having uninitialized data in something that is meant to be initialized, then initializing it in the next line). Once you trigger UB your program is broken and may cause the optimiser/code generator to make bad assumptions leading to unexpected behaviour.

You can't use assignment for initializing the array element. When you move an object into an array it'll call the destructor for the original. Instead, you should call write() on a raw pointer (e.g. array_iter.as_ptr().offset(i).write(T::default())).

I also prefer to use array.as_ptr().offset(i) for retrieving a raw pointer to the i'th item instead of &mut array[i] as *mut T. The &mut array[i] bit means you're technically taking a mutable reference to uninitialized data, and references assume the data they point to is initialized. It still works and miri probably won't complain, but feels a little wrong to me.

Have a look at the alloc::raw_vec::RawVec from the standard library. It's the resizable buffer type that Vec<T> builds on.

I'd also recommend reading the Example: Implementing Vec chapters from The Nomicon. They go through all the steps you'd take to implement your own Vec<T>, including managing the memory, resizing, and implementing useful patterns (e.g. drain() and Deref).

I'm curious to hear how you got into this situation. As has already been mentioned, you aren't really saving any memory by coding your own Vec<T>. Even ignoring that you are storing the size twice (size field, and the size built into the slice's fat pointer).

Well, right now, my struct is 2 * 8 bytes (a box with a slice in it). When I looked into the Vec definition, it had a bit more.

A Vec<T> should be 3 words in size, which is one usize (the capacity) more than the boxed slice.

fn main() {
    let word = mem::size_of::<usize>();

    assert_eq!(mem::size_of::<Vec<u8>>(), 3 * word);
    assert_eq!(mem::size_of::<Box<[u8]>>(), 2 * word);
    assert_eq!(mem::size_of::<HeapArray<u8>>(), 2 * word);
    assert_eq!(mem::size_of::<InitialHeapArray<u8>>(), 3 * word);
    let layout_plus_padding = word;
    assert_eq!(
        mem::size_of::<InitialHeapArrayWithLayout<u8>>(),
        4 * word + layout_plus_padding
    );

    // size doesn't change depending on T's size

    assert_eq!(mem::size_of::<Vec<String>>(), 3 * word);
    assert_eq!(mem::size_of::<Box<[String]>>(), 2 * word);
    assert_eq!(mem::size_of::<HeapArray<String>>(), 2 * word);
    assert_eq!(mem::size_of::<InitialHeapArray<String>>(), 3 * word);
    assert_eq!(
        mem::size_of::<InitialHeapArrayWithLayout<String>>(),
        4 * word + layout_plus_padding
    );
}

pub struct InitialHeapArray<T: Sized> {
    pub size: usize,
    location: *mut [T],
}

pub struct InitialHeapArrayWithLayout<T: Sized> {
    pub size: usize,
    layout: Layout,
    location: *mut [T],
}

pub struct HeapArray<T> {
    inner: Box<[T]>,
}

(playground)