Allocate a boxed array of MaybeUninit

For a code generator, I would like to allocate an array filled with MaybeUninit<T> for a given T.
I asked ChatGPT and it reluctantly provided me with the solution below:

    pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
        // Calculate the layout for the array
        let layout = Layout::array::<MaybeUninit<T>>(n_usize).unwrap();

        // Allocate memory for the array without initializing it
        let ptr = unsafe { std::alloc::alloc(layout) } as *mut MaybeUninit<T>;

        // Convert the raw pointer into a boxed slice
        unsafe { Box::from_raw(std::slice::from_raw_parts_mut(ptr, n_usize) as *mut [MaybeUninit<T>]) }
    }

Is there a better way?

Your code is wrong because it's missing a check for allocation failures. It also handles zero-sized types (and n_usize == 0) incorrectly, since alloc requires the layout to have a non-zero size.
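
To make that concrete, here is roughly what the raw-alloc version would need to check, keeping the same structure (just a sketch):

use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    // Layout computation can fail for huge n; unwrap keeps the sketch short.
    let layout = Layout::array::<MaybeUninit<T>>(n_usize).unwrap();

    if layout.size() == 0 {
        // Covers both n_usize == 0 and zero-sized T: `alloc` must not be called
        // with a zero-sized layout, so hand Box an aligned dangling pointer
        // instead (Box never deallocates a zero-sized slice).
        let ptr = std::ptr::NonNull::<MaybeUninit<T>>::dangling().as_ptr();
        return unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n_usize)) };
    }

    let ptr = unsafe { alloc(layout) } as *mut MaybeUninit<T>;
    if ptr.is_null() {
        // Report the failed allocation instead of building a Box from a null pointer.
        handle_alloc_error(layout);
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n_usize)) }
}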

Anyway, I might go for this:

use std::mem::MaybeUninit;

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    let mut v = Vec::<MaybeUninit<T>>::with_capacity(n_usize);
    // SAFETY: Uninitialized memory is valid for the `MaybeUninit<T>` type,
    // and the capacity is at least n_usize.
    unsafe { v.set_len(n_usize) };
    v.into_boxed_slice()
}

It's possible that Vec::resize compiles down to the same code as set_len, since the compiler might be able to tell that writing the uninitialized values is a no-op, but I haven't checked.

4 Likes

As wild as it is, the safe version using resize/resize_with generates better code than set_len.
In fact, resize/resize_with generate code identical to the manual alloc version (once the latter takes care of zero-sized layouts).

old godbolt comparison

Nvm, it looks like there's an optimization in nightly, so the set_len case gets longer, but there is now no stack usage, while the naive raw alloc code does have some stack usage. Other than that, they all generate essentially the same code.

godbolt comparison (stable)
godbolt comparison (nightly)

use std::{mem::MaybeUninit, alloc::{Layout, handle_alloc_error, alloc}};

pub fn v0(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    unsafe { v.set_len(n) }
    v.into_boxed_slice()
}

pub fn v1(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize_with(n, MaybeUninit::uninit);
    v.into_boxed_slice()
}

pub fn v2(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, MaybeUninit::uninit());
    v.into_boxed_slice()
}

pub fn v3(n: usize) -> Box<[MaybeUninit<u32>]> {
    if n == 0 {
        return Box::new([])
    }

    let layout = Layout::array::<u32>(n).unwrap_or_else(|_| {
        capacity_overflow()
    });

    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout)
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn capacity_overflow() -> ! {
    panic!()
}

After some more code-golf, this produces better code than any of the other solutions, and also has no stack usage. Not sure what's stopping the other versions from doing the same.
I also made it generic, to fit the question better.

use std::{mem::MaybeUninit, alloc::{Layout, handle_alloc_error, alloc}};

pub fn v4<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    if core::mem::size_of::<T>() == 0 {
        let ptr = std::ptr::NonNull::<MaybeUninit<T>>::dangling().as_ptr();
        return unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n)) }
    }

    if n == 0 {
        return Box::new([])
    }

    let layout = Layout::array::<T>(n).unwrap_or_else(|_| {
        handle_error(None)
    });

    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_error(Some(layout))
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn handle_error(_layout: Option<Layout>) -> ! {
    panic!()
}
1 Like

The answer for how you make a boxed slice is essentially always to go through Vec. (Or .collect(), which uses Vec internally.)

So you might want to just go for something simple like

use std::mem::MaybeUninit;

fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}

Which looks like it optimizes plenty well enough: https://rust.godbolt.org/z/85sGnc75v
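
For completeness, here's a minimal usage sketch (assuming every element gets written before anything is read back, as a code generator would) showing one way to turn the result into a Box<[T]> on stable:

use std::mem::MaybeUninit;

fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}

fn main() {
    let mut buf = make_uninit::<u32>(4);
    // Fill every slot; `MaybeUninit::write` initializes without dropping.
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u32 * 10);
    }
    // SAFETY: every element was initialized in the loop above, and
    // MaybeUninit<u32> has the same layout as u32.
    let filled: Box<[u32]> = unsafe { Box::from_raw(Box::into_raw(buf) as *mut [u32]) };
    assert_eq!(&*filled, &[0, 10, 20, 30]);
}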

7 Likes

+1, this should be the answer. No unnecessary unsafe, simple, communicates the intent, and the generated code seems reasonable, too (no bloat, the assembly is actually shorter than the elaborate unsafe variants above).

1 Like

I am not fluent in assembly; can you confirm that this solution is optimized to the point that there is no unnecessary loop? If so, I'll mark it as the accepted answer.

All of the jumps are forward jumps (jump instructions begin with a j, like jne or jmp, and are followed by the destination label to jump to). Since there are no backwards jumps, there are no loops at all.

4 Likes

Another way to see it:

If you change the demo to https://rust.godbolt.org/z/7xrjPK461

use std::mem::MaybeUninit;
fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}
//                          vvvvvvvvvvvvvvvvvvvv
#[no_mangle] pub fn demo(n: std::num::NonZeroU16) -> Box<[MaybeUninit<i32>]> {
    make_uninit(n.get().into())
}

Then that length restriction from the NonZeroU16 means that it doesn't need to worry about n: 0 (where it can't call alloc at all) nor things like n: usize::MAX (where the calculations overflow), and thus it ends up as just

demo:
        push    r14
        push    rbx
        push    rax
        movzx   ebx, di
        lea     r14d, [4*rbx]
        mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   eax, byte ptr [rax]
        mov     esi, 4
        mov     rdi, r14
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_2                ; <-- the only branch
        mov     rdx, rbx
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB0_2:
        mov     edi, 4
        mov     rsi, r14
        call    qword ptr [rip + alloc::raw_vec::handle_error::h60079937aa96c3ee@GOTPCREL]

where the only branch is checking whether the allocation failed -- definitely no loops.

5 Likes