This was probably already asked before, but maybe I am too dumb to find it:
I want to create an array on the stack. This is how I would implement it in C:
uint8_t buffer[4000];
It will be filled later, so it doesn't matter that it is uninitialized.
It seems that in Rust we have to initialize the array:
let x:[u8; 4000] = [0u8;4000];
But this should be slower than the C solution, right? So I found this:
let x:[u8; 4000] = unsafe{std::mem::uninitialized()};
But this is even slower than the previous solution.
What is the fastest way to create an uninitialized array in Rust?
The performance will probably seem identical because the code runs in less time than the resolution of your PC's clock, and in a lot of cases LLVM will see that the buffer is never used and optimise it out entirely. Proper benchmarking frameworks have ways to deal with these sorts of things.
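To illustrate the point about the optimiser, here is a minimal sketch of a hand-rolled timing loop that uses `std::hint::black_box` so the buffer counts as "used", and that averages over many iterations so clock resolution matters less. The names `fill_zeroed` and `average_nanos` are made up for this example, and a real benchmarking framework does considerably more (warm-up, statistics, outlier detection).

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical function under test: fills a zero-initialized stack buffer.
fn fill_zeroed() -> [u8; 4000] {
    let mut buffer = [0u8; 4000];
    for (i, byte) in buffer.iter_mut().enumerate() {
        *byte = i as u8; // wraps every 256 elements
    }
    buffer
}

/// Calls `f` `iterations` times and returns the average time per call in nanoseconds.
/// `black_box` makes the optimiser treat each result as used, so the work is not elided.
fn average_nanos<T>(iterations: u32, mut f: impl FnMut() -> T) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

fn main() {
    let nanos = average_nanos(100_000, fill_zeroed);
    println!("fill_zeroed: {nanos:.1} ns per call");
}
```

Build with `--release` before reading anything into the numbers; debug builds measure mostly unoptimised overhead.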
By looking at the generated assembly I noticed that LLVM won't elide the zeroing even when compiling in release mode (cargo build --release), and had to reach for MaybeUninit to create a buffer of uninitialized data.
use std::mem::MaybeUninit;

pub fn squares_zeroed() -> [usize; 1024] {
    let mut buffer = [0; 1024];
    for i in 0..buffer.len() {
        buffer[i] = i * i;
    }
    buffer
}

pub fn squares_uninit() -> [usize; 1024] {
    unsafe {
        let mut buffer = MaybeUninit::uninit_array::<1024>();
        for i in 0..buffer.len() {
            buffer[i].write(i * i);
        }
        // SAFETY: The buffer is now initialized
        std::mem::transmute(buffer)
    }
}
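For readers on stable Rust: `MaybeUninit::uninit_array` was a nightly-only API, and `std::mem::uninitialized` from the question is deprecated. Below is a sketch of the same idea using the pattern shown in the `MaybeUninit` documentation for initializing an array element by element. The function name `squares_uninit_stable` is made up for this example; whether it is actually faster than the zeroed version still has to be measured.

```rust
use std::mem::MaybeUninit;

pub fn squares_uninit_stable() -> [usize; 1024] {
    // SAFETY: an array of `MaybeUninit<usize>` has no validity requirements,
    // so "assuming init" on the outer `MaybeUninit` is sound.
    let mut buffer: [MaybeUninit<usize>; 1024] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Write every element exactly once; nothing is read before it is written.
    for (i, slot) in buffer.iter_mut().enumerate() {
        slot.write(i * i);
    }

    // SAFETY: all 1024 elements were written above, and `MaybeUninit<usize>`
    // has the same size and layout as `usize`.
    unsafe { std::mem::transmute::<[MaybeUninit<usize>; 1024], [usize; 1024]>(buffer) }
}

fn main() {
    let squares = squares_uninit_stable();
    println!("{} {}", squares[3], squares[1023]);
}
```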
The main difference between the two is that squares_zeroed() calls memset() before the loop starts.
squares_zeroed calls memset and squares_uninit calls memcpy. That's why I would expect squares_uninit to be slower. I will try to run some usable benchmarks today. So far I have only run it in debug mode.
Note that even with a good library like criterion, this is such a small thing that there's really nothing to measure. I would strongly suggest benchmarking whatever's using the 0-initialized or uninitialized array instead, since that's where the speed is relevant.
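As a rough sketch of what "benchmark whatever uses the array" could look like, the two functions below do the same hypothetical job (copy some input into a 4000-byte scratch buffer and sum it), once with a zeroed buffer and once with an uninitialized one. The workload and the names `checksum_zeroed` and `checksum_uninit` are invented for illustration; the idea is that these whole functions, not the bare array creation, are what you would hand to the benchmark.

```rust
use std::mem::MaybeUninit;

const LEN: usize = 4000;

/// Hypothetical workload using a zero-initialized scratch buffer.
fn checksum_zeroed(input: &[u8]) -> u32 {
    let mut scratch = [0u8; LEN];
    let n = input.len().min(LEN);
    scratch[..n].copy_from_slice(&input[..n]);
    scratch[..n].iter().map(|&b| u32::from(b)).sum()
}

/// The same workload using an uninitialized scratch buffer.
fn checksum_uninit(input: &[u8]) -> u32 {
    // SAFETY: an array of `MaybeUninit<u8>` needs no initialization.
    let mut scratch: [MaybeUninit<u8>; LEN] =
        unsafe { MaybeUninit::uninit().assume_init() };
    let n = input.len().min(LEN);
    // `zip` stops at the shorter side, so exactly the first `n` slots are written.
    for (slot, &byte) in scratch.iter_mut().zip(input) {
        slot.write(byte);
    }
    // SAFETY: only the first `n` slots are read, and all of them were written above.
    scratch[..n]
        .iter()
        .map(|slot| u32::from(unsafe { slot.assume_init_read() }))
        .sum()
}

fn main() {
    let input: Vec<u8> = (0..=255u8).collect();
    assert_eq!(checksum_zeroed(&input), checksum_uninit(&input));
    println!("checksum: {}", checksum_uninit(&input));
}
```

If the two versions are indistinguishable at this level, the safe zeroed buffer is the easier one to keep.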
If you compare the assembly, I think it's clear that most of the time measurement you're seeing is overhead: Compiler Explorer
// Note: MaybeUninit::uninit_array is an unstable, nightly-only API.
use std::mem::MaybeUninit;

pub fn array_init<const LEN: usize>() -> [u8; LEN] {
    let mut buffer = [0u8; LEN];
    for i in 0..buffer.len() {
        buffer[i] = (i * i) as u8;
    }
    buffer
}

pub fn array_uninit<const LEN: usize>() -> [MaybeUninit<u8>; LEN] {
    let mut buffer = MaybeUninit::uninit_array();
    for i in 0..buffer.len() {
        buffer[i].write((i * i) as u8);
    }
    buffer
}
pub fn array_custom_init() -> [u8; 1024] {
    let mut buffer: [MaybeUninit<u8>; 1024] = MaybeUninit::uninit_array();
    for i in 0..buffer.len() {
        buffer[i].write((i * i) as u8);
    }
    // SAFETY: every element was written in the loop above
    unsafe { std::mem::transmute(buffer) }
}
I ran it for more or less your example and I get this:
array init 1024 time: [808.61 ns 809.44 ns 810.53 ns]
array uninit 1024 time: [607.87 ns 608.10 ns 608.35 ns]
array custom init 1024 time: [825.10 ns 825.39 ns 825.73 ns]