Thank you all for your help. I wasn't aware that the main thread was limited in how much stack memory was available.
Since @Neutron3529 was the only one who suggested a solution that didn't involve giving up on stack-allocated arrays, I've marked his reply as the solution. Here's an updated code example that includes his solution, for those who might want a complete example:
```rust
use std::mem::size_of;
use std::time::Instant;

/// Fills `a` with the sequence a[0] = a0, a[n] = f(a[n - 1]).
fn classic_array_fill<F, T, const N: usize>(a: &mut [T; N], a0: &T, f: F)
where
    T: Copy,
    F: Fn(T) -> T,
{
    a[0] = *a0;
    for n in 1..N {
        a[n] = f(a[n - 1]);
    }
}

fn main() {
    const N: usize = 1_000_000;
    // The array alone needs size_of::<f64>() * N bytes; the extra 1 MiB is
    // headroom for the other stack frames (an arbitrary safety margin).
    const STACK_SIZE: usize = size_of::<f64>() * N + 1024 * 1024;
    std::thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(|| {
            // your code here
            let a0: f64 = 0.5;
            let before = Instant::now();
            let mut a = [0_f64; N];
            classic_array_fill(&mut a, &a0, |x| x * (1.0 - x));
            let after = Instant::now();
            println!("{:?}", a[N - 1]);
            println!("time elapsed: {:?}", after.duration_since(before));
        }).unwrap().join().unwrap();
}
```
I am really puzzled, though, by the overly verbose .unwrap().join().unwrap() that ends the thread expression. Is it really necessary?
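For reference, each call in that chain returns its own Result: Builder::spawn returns an io::Result<JoinHandle<T>> (the OS can refuse to create the thread), and JoinHandle::join returns an Err if the closure panicked. Here is a minimal sketch that handles the two cases separately instead of unwrapping them. The helper name `run_with_stack` and the String error type are my own choices for illustration, not anything from the thread:

```rust
use std::thread;

// Hypothetical helper (name and error type are illustrative assumptions):
// runs `work` on a thread with a custom stack size and reports the two
// possible failures separately instead of unwrapping them.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // First Result: spawn() fails if the OS cannot create the thread.
    let handle = thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .map_err(|e| format!("could not spawn thread: {}", e))?;

    // Second Result: join() returns Err if the closure panicked.
    handle
        .join()
        .map_err(|_| String::from("worker thread panicked"))
}

fn main() {
    match run_with_stack(1 << 20, || 6 * 7) {
        Ok(v) => println!("worker returned {}", v),
        Err(e) => eprintln!("{}", e),
    }
}
```

So the two unwraps are not redundant: the first one covers the thread failing to start, the second one covers the thread panicking while it runs.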
That said, threads add a lot of overhead, so @scottmcm's solution is the more performant one in the present context. I've tested a few different implementations against each other with a very rudimentary benchmark:
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=d22b54cdd23f618dfdcc6fd519063ba5
The numbers in the playground seem to suggest that the "zero cost abstractions" promise does hold (no real difference between for loops and iterators), but on my own laptop I've noticed some systematic differences suggesting that the performance crown goes to the traditional for loop over a pre-allocated Vec (classic_list_fill from the playground example).
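For readers who don't want to open the playground, here is a rough sketch of what a for loop over a pre-allocated Vec can look like. This is not the playground's classic_list_fill itself: the function name `vec_fill_sketch` and its signature are assumptions made for illustration only.

```rust
// Hypothetical sketch: NOT the playground's `classic_list_fill`, only an
// assumption of what a for loop over a pre-allocated Vec could look like.
fn vec_fill_sketch<T, F>(n: usize, a0: T, f: F) -> Vec<T>
where
    T: Copy,
    F: Fn(T) -> T,
{
    // The buffer lives on the heap, so its size is not bounded by the
    // thread's stack and no custom thread is needed.
    let mut a = vec![a0; n];
    for i in 1..n {
        a[i] = f(a[i - 1]);
    }
    a
}

fn main() {
    const N: usize = 1_000_000;
    let a = vec_fill_sketch(N, 0.5_f64, |x| x * (1.0 - x));
    println!("{:?}", a[N - 1]);
}
```

Since the Vec is allocated once up front with its final length, the loop body is the same indexing pattern as in the array version, just without the stack size constraint.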