For a code generator, I would like to allocate an array filled with MaybeUninit<T> for a given T.
I asked ChatGPT and it reluctantly provided me with the solution below:
use std::{alloc::Layout, mem::MaybeUninit};

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    // Calculate the layout for the array
    let layout = Layout::array::<MaybeUninit<T>>(n_usize).unwrap();
    // Allocate memory for the array without initializing it
    let ptr = unsafe { std::alloc::alloc(layout) } as *mut MaybeUninit<T>;
    // Convert the raw pointer into a boxed slice
    unsafe { Box::from_raw(std::slice::from_raw_parts_mut(ptr, n_usize) as *mut [MaybeUninit<T>]) }
}
Your code is wrong because it's missing a check for allocation failure. It also handles zero-sized types (and n_usize == 0) incorrectly, since alloc requires the layout to have a non-zero size.
Anyway, I might go for this:
use std::mem::MaybeUninit;

pub fn placebos_box_usize<T>(n_usize: usize) -> Box<[MaybeUninit<T>]> {
    let mut v = Vec::<MaybeUninit<T>>::with_capacity(n_usize);
    // SAFETY: Uninitialized memory is valid for the `MaybeUninit<T>` type,
    // and the capacity is at least `n_usize`.
    unsafe { v.set_len(n_usize) };
    v.into_boxed_slice()
}
It's possible that Vec::resize compiles down to the same as set_len since the compiler might be able to tell that writing uninitialized bytes is a no-op, but I haven't checked.
As wild as it is, the safe version using resize/resize_with generates better code than set_len.
In fact, resize/resize_with generate identical code to the manual alloc version (when the latter takes care of zero-sized layouts).
Nvm, it looks like there's an optimization in nightly: the set_len case now compiles to longer code but no longer uses the stack, while the naive raw alloc code does still have some stack usage. Otherwise they all generate essentially the same code.
use std::{mem::MaybeUninit, alloc::{Layout, handle_alloc_error, alloc}};

pub fn v0(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    unsafe { v.set_len(n) };
    v.into_boxed_slice()
}

pub fn v1(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize_with(n, MaybeUninit::uninit);
    v.into_boxed_slice()
}

pub fn v2(n: usize) -> Box<[MaybeUninit<u32>]> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, MaybeUninit::uninit());
    v.into_boxed_slice()
}

pub fn v3(n: usize) -> Box<[MaybeUninit<u32>]> {
    if n == 0 {
        return Box::new([]);
    }
    // Note: `capacity_overflow` is a safe function, so no `unsafe` block is needed here.
    let layout = Layout::array::<u32>(n).unwrap_or_else(|_| capacity_overflow());
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn capacity_overflow() -> ! {
    panic!("capacity overflow")
}
After some more code-golf, this produces better code than any of the other solutions, and also has no stack usage. Not sure what's stopping the other versions.
I also made it generic, to fit the question better.
use std::{mem::MaybeUninit, alloc::{Layout, alloc}};

pub fn v4<T>(n: usize) -> Box<[MaybeUninit<T>]> {
    if core::mem::size_of::<T>() == 0 {
        // Zero-sized elements need no allocation; any well-aligned dangling
        // pointer works. It must be typed as `MaybeUninit<T>` (not `T`) so
        // the resulting box matches the return type.
        let ptr = std::ptr::NonNull::<MaybeUninit<T>>::dangling().as_ptr();
        return unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, n)) };
    }
    if n == 0 {
        return Box::new([]);
    }
    let layout = Layout::array::<T>(n).unwrap_or_else(|_| handle_error(None));
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_error(Some(layout));
    }
    unsafe { Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr.cast(), n)) }
}

#[cold]
#[inline(never)]
fn handle_error(_layout: Option<Layout>) -> ! {
    panic!("allocation failed or layout overflowed")
}
+1, this should be the answer. No unnecessary unsafe, simple, communicates the intent, and the generated code seems reasonable, too (no bloat, the assembly is actually shorter than the elaborate unsafe variants above).
I am not fluent in assembly; can you confirm that this solution is optimized to the point that there is no unnecessary loop? If so, I'll mark it as the accepted answer.
All of the jumps are forward jumps (jump instructions begin with a j, like jne or jmp, and are followed by the destination label to jump to). Since there are no backwards jumps, there are no loops at all.
Then that length restriction from the NonZeroU16 means that it doesn't need to worry about n == 0 (where it can't call alloc at all) nor things like n == usize::MAX (where the size calculation would overflow), and thus it ends up as just
demo:
        push r14
        push rbx
        push rax
        movzx ebx, di
        lea r14d, [4*rbx]
        mov rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx eax, byte ptr [rax]
        mov esi, 4
        mov rdi, r14
        call qword ptr [rip + __rust_alloc@GOTPCREL]
        test rax, rax
        je .LBB0_2 ; <-- the only branch
        mov rdx, rbx
        add rsp, 8
        pop rbx
        pop r14
        ret
.LBB0_2:
        mov edi, 4
        mov rsi, r14
        call qword ptr [rip + alloc::raw_vec::handle_error::h60079937aa96c3ee@GOTPCREL]
where the only branch is checking whether the allocation failed -- definitely no loops.
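For reference, a source shape that would plausibly compile down to the assembly above might look like this. This is a hypothetical reconstruction (the thread doesn't show the exact demo source), but it matches the calling convention seen in the listing: a 16-bit non-zero length in, a fat pointer (ptr, len) out.

```rust
use std::{mem::MaybeUninit, num::NonZeroU16};

// Hypothetical reconstruction of `demo`: because the length is a NonZeroU16,
// n can never be 0 and n * 4 can never overflow (at most 65535 * 4), so the
// optimizer can drop both the empty-slice branch and the overflow check.
pub fn demo(n: NonZeroU16) -> Box<[MaybeUninit<u32>]> {
    let n = usize::from(n.get());
    let mut v = Vec::with_capacity(n);
    v.resize_with(n, MaybeUninit::uninit);
    v.into_boxed_slice()
}
```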