The code below runs successfully in release mode but segfaults in debug mode. In debug mode, adding a print statement (see the comment) makes the segfault go away:
use std::arch::x86_64::{_mm256_add_ps, _mm256_set_ps, _mm256_store_ps};

fn main() {
    unsafe { f() };
}

unsafe fn f() {
    let mut x = [0.0; 8];
    let a = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let b = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let c = _mm256_add_ps(a, b);
    // println!("{c:?}"); // we can avoid segfault by printing here
    _mm256_store_ps(x.as_mut_ptr(), c);
    println!("{x:?}");
}
_mm256_store_ps is meant to be a low-level primitive that lets you do SIMD without dropping directly to assembly language. It is unsafe because there's a pile of preconditions you've got to comply with; in particular, the destination pointer must be aligned on a 32-byte boundary, which a plain [f32; 8] (4-byte alignment) does not guarantee. As long as you meet the preconditions, it's supposed to work with all sorts of interesting data structures.
The type-safe SIMD stuff lives in std::simd, but it's not yet stable.
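For the store in the question, the precondition that matters is alignment: _mm256_store_ps expects a 32-byte-aligned destination, so whether the original code crashes depends on where x happens to land on the stack. Below is a minimal sketch of two ways to stay within the rules, assuming an x86_64 target with AVX. The Aligned wrapper and the function names sum_aligned and sum_unaligned are hypothetical, introduced only for illustration. The first version forces the buffer's alignment; the second swaps in _mm256_storeu_ps, which has no alignment requirement.

```rust
use std::arch::x86_64::{_mm256_add_ps, _mm256_set_ps, _mm256_store_ps, _mm256_storeu_ps};

/// Hypothetical wrapper (not from the original post) that forces the
/// buffer onto a 32-byte boundary.
#[repr(C, align(32))]
struct Aligned([f32; 8]);

/// Option 1: keep the aligned store and make the destination aligned.
#[target_feature(enable = "avx")]
unsafe fn sum_aligned() -> [f32; 8] {
    let mut x = Aligned([0.0; 8]);
    let a = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let b = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let c = _mm256_add_ps(a, b);
    // Valid: `Aligned` guarantees the 32-byte alignment the intrinsic needs.
    _mm256_store_ps(x.0.as_mut_ptr(), c);
    x.0
}

/// Option 2: keep the plain array and use the unaligned store instead.
#[target_feature(enable = "avx")]
unsafe fn sum_unaligned() -> [f32; 8] {
    let mut x = [0.0f32; 8];
    let a = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let b = _mm256_set_ps(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0);
    let c = _mm256_add_ps(a, b);
    // `_mm256_storeu_ps` accepts any address, so a 4-byte-aligned array is fine.
    _mm256_storeu_ps(x.as_mut_ptr(), c);
    x
}

fn main() {
    // Only call the AVX functions if the CPU actually supports AVX.
    if is_x86_feature_detected!("avx") {
        let (p, q) = unsafe { (sum_aligned(), sum_unaligned()) };
        println!("{p:?}"); // [16.0, 14.0, 12.0, 10.0, 8.0, 6.0, 4.0, 2.0]
        println!("{q:?}"); // same values
    } else {
        println!("AVX not available on this CPU");
    }
}
```

The values come out in descending order because _mm256_set_ps takes its arguments from the highest lane down to the lowest, while the store writes the lowest lane first. If you don't control how the buffer is allocated, the unaligned store is usually the simpler choice.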