Allocate a boxed array of MaybeUninit

For a code generator, I would like to allocate an array filled with MaybeUninit<T> for a given T.
I asked ChatGPT and it reluctantly provided me with the solution below:

    pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
        // Calculate the layout for the array
        let layout = Layout::array::<MaybeUninit<T>>(n_usize).unwrap();

        // Allocate memory for the array without initializing it
        let ptr = unsafe { std::alloc::alloc(layout) } as *mut MaybeUninit<T>;

        // Convert the raw pointer into a boxed slice
        unsafe { Box::from_raw(std::slice::from_raw_parts_mut(ptr, n_usize) as *mut [MaybeUninit<T>]) }
    }

Is there a better way?

Your code is wrong because it's missing a check for allocation failures. It also handles zero-sized types (and n_usize == 0) incorrectly, since alloc requires the layout to have a non-zero size.
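
To make that concrete, here is roughly what the raw-alloc version would need to check, keeping the same structure (just a sketch):

use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    // Layout computation can fail for huge n; unwrap keeps the sketch short.
    let layout = Layout::array::<MaybeUninit<T>>(n_usize).unwrap();

    if layout.size() == 0 {
        // Covers both n_usize == 0 and zero-sized T: `alloc` must not be called
        // with a zero-sized layout, so hand Box an aligned dangling pointer
        // instead (Box never deallocates a zero-sized slice).
        let ptr = std::ptr::NonNull::<MaybeUninit<T>>::dangling().as_ptr();
        return unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n_usize)) };
    }

    let ptr = unsafe { alloc(layout) } as *mut MaybeUninit<T>;
    if ptr.is_null() {
        // Report the failed allocation instead of building a Box from a null pointer.
        handle_alloc_error(layout);
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n_usize)) }
}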

Anyway, I might go for this:

use std::mem::MaybeUninit;

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    let mut v = Vec::<MaybeUninit<T>>::with_capacity(n_usize);
    // SAFETY: Uninitialized memory is valid for the `MaybeUninit<T>` type,
    // and the capacity is at least n_usize.
    unsafe { v.set_len(n_usize) };
    v.into_boxed_slice()
}

It's possible that Vec::resize compiles down to the same code as set_len, since the compiler might be able to tell that writing the uninitialized values is a no-op, but I haven't checked.

4 Likes

As wild as it is, the safe version using resize/resize_with generates better code than set_len.
In fact, resize/resize_with generate code identical to the manual alloc version (once the latter takes care of zero-sized layouts).

old godbolt comparison

Nvm, it looks like there's an optimization in nightly, so the set_len case gets longer, but there is now no stack usage, while the naive raw alloc code does have some stack usage. Other than that, they all generate essentially the same code.

godbolt comparison (stable)
godbolt comparison (nightly)

use std::{mem::MaybeUninit, alloc::{Layout, handle_alloc_error, alloc}};

pub fn v0(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    unsafe { v.set_len(n) }
    v.into_boxed_slice()
}

pub fn v1(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize_with(n, MaybeUninit::uninit);
    v.into_boxed_slice()
}

pub fn v2(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, MaybeUninit::uninit());
    v.into_boxed_slice()
}

pub fn v3(n: usize) -> Box<[MaybeUninit<u32>]> {
    if n == 0 {
        return Box::new([])
    }

    let layout = Layout::array::<u32>(n).unwrap_or_else(|_| {
        capacity_overflow()
    });

    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout)
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn capacity_overflow() -> ! {
    panic!()
}

After some more code-golf, this produces better code than any of the other solutions, and also has no stack usage. Not sure what's stopping the other versions from doing the same.
I also made it generic, to fit the question better.

use std::{mem::MaybeUninit, alloc::{Layout, handle_alloc_error, alloc}};

pub fn v4<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    if core::mem::size_of::<T>() == 0 {
        let ptr = std::ptr::NonNull::<MaybeUninit<T>>::dangling().as_ptr();
        return unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n)) }
    }

    if n == 0 {
        return Box::new([])
    }

    let layout = Layout::array::<T>(n).unwrap_or_else(|_| {
        handle_error(None)
    });

    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_error(Some(layout))
    }

    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn handle_error(_layout: Option<Layout>) -> ! {
    panic!()
}
1 Like

The answer for how you make a boxed slice is essentially always to go through Vec. (Or .collect(), which uses Vec internally.)

So you might want to just go for something simple like

use std::mem::MaybeUninit;

fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}

Which looks like it optimizes plenty well enough: https://rust.godbolt.org/z/85sGnc75v
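
For completeness, here's a minimal usage sketch (assuming every element gets written before anything is read back, as a code generator would) showing one way to turn the result into a Box<[T]> on stable:

use std::mem::MaybeUninit;

fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}

fn main() {
    let mut buf = make_uninit::<u32>(4);
    // Fill every slot; `MaybeUninit::write` initializes without dropping.
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u32 * 10);
    }
    // SAFETY: every element was initialized in the loop above, and
    // MaybeUninit<u32> has the same layout as u32.
    let filled: Box<[u32]> = unsafe { Box::from_raw(Box::into_raw(buf) as *mut [u32]) };
    assert_eq!(&*filled, &[0, 10, 20, 30]);
}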

7 Likes

+1, this should be the answer. No unnecessary unsafe, simple, communicates the intent, and the generated code seems reasonable, too (no bloat, the assembly is actually shorter than the elaborate unsafe variants above).

1 Like

I am not fluent in assembly; can you confirm that this solution is optimized to the point that there is no unnecessary loop? If so, I'll mark it as the accepted answer.

All of the jumps are forward jumps (jump instructions begin with a j, like jne or jmp, and are followed by the destination label to jump to). Since there are no backwards jumps, there are no loops at all.

4 Likes

Another way to see it:

If you change the demo to https://rust.godbolt.org/z/7xrjPK461

use std::mem::MaybeUninit;
fn make_uninit<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    std::iter::repeat_with(MaybeUninit::uninit)
        .take(n)
        .collect()
}
//                          vvvvvvvvvvvvvvvvvvvv
#[no_mangle] pub fn demo(n: std::num::NonZeroU16) -> Box<[MaybeUninit<i32>]> {
    make_uninit(n.get().into())
}

Then that length restriction from the NonZeroU16 means that it doesn't need to worry about n: 0 (where it can't call alloc at all) nor things like n: usize::MAX (where the calculations overflow), and thus it ends up as just

demo:
        push    r14
        push    rbx
        push    rax
        movzx   ebx, di
        lea     r14d, [4*rbx]
        mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   eax, byte ptr [rax]
        mov     esi, 4
        mov     rdi, r14
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB0_2                ; <-- the only branch
        mov     rdx, rbx
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB0_2:
        mov     edi, 4
        mov     rsi, r14
        call    qword ptr [rip + alloc::raw_vec::handle_error::h60079937aa96c3ee@GOTPCREL]

where the only branch is checking whether the allocation failed -- definitely no loops.

5 Likes