Hi, newcomer here. I'm experimenting with array permutation in Rust and learning what it takes to trigger auto-vectorization. I have taken extra care to ensure that all slices are of size 8.
This code vectorizes nicely (`PERM[0]` is a `[usize; 8]`):
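```rust
// PERM[i] is a [usize; 8] permutation table defined elsewhere.
pub fn permute_avx(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    // First block of 8: gather through PERM[0], then write back.
    let arr = &mut marbles[..8];
    dst.iter_mut()
        .zip(PERM[0].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
    // Second block of 8: same, using PERM[1].
    let arr = &mut marbles[8..16];
    dst.iter_mut()
        .zip(PERM[1].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
}
```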
However, if I collapse these into a `for` loop, vectorization fails:
```rust
pub fn permute(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    for i in 0..2 {
        let arr = &mut marbles[8 * i..8 * (i + 1)];
        dst.iter_mut()
            .zip(PERM[i].iter())
            .for_each(|(d, p)| *d = arr[*p]);
        arr.copy_from_slice(&dst);
    }
}
```
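For anyone who wants to reproduce this, here is a minimal, self-contained sketch that checks the two versions really compute the same thing, so the difference is only in how the optimizer treats them. The contents of `PERM` are not given in the post, so the table below (row 0 reverses a block, row 1 rotates it left by one) is purely hypothetical; only its shape, `[[usize; 8]; 2]`, follows from the description.

```rust
// Hypothetical permutation table: the post only says PERM[i] is [usize; 8].
// Row 0 reverses a block of 8; row 1 rotates it left by one.
const PERM: [[usize; 8]; 2] = [
    [7, 6, 5, 4, 3, 2, 1, 0],
    [1, 2, 3, 4, 5, 6, 7, 0],
];

/// Unrolled version (copied from above): permutes two 8-element blocks.
pub fn permute_avx(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    let arr = &mut marbles[..8];
    dst.iter_mut()
        .zip(PERM[0].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
    let arr = &mut marbles[8..16];
    dst.iter_mut()
        .zip(PERM[1].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
}

/// Loop version (copied from above): same work, driven by `i`.
pub fn permute(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    for i in 0..2 {
        let arr = &mut marbles[8 * i..8 * (i + 1)];
        dst.iter_mut()
            .zip(PERM[i].iter())
            .for_each(|(d, p)| *d = arr[*p]);
        arr.copy_from_slice(&dst);
    }
}

fn main() {
    // 16 marbles labelled 0..16, permuted by both versions.
    let mut a: Vec<u32> = (0..16).collect();
    let mut b = a.clone();
    permute_avx(&mut a);
    permute(&mut b);
    // Both functions must agree on every element.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

With this table, both calls turn `0..16` into `[7, 6, 5, 4, 3, 2, 1, 0, 9, 10, 11, 12, 13, 14, 15, 8]`.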
The compiler reports:

```
note: optimization analysis for loop-vectorize at <source>:13:37: loop not vectorized: loop control flow is not understood by vectorizer
```
I have searched for an explanation but couldn't find one.
Also, is it possible to get these optimization notes from `cargo`? I have added `--emit asm -g -C remark="slp-vectorizer loop-vectorize"` to my `RUSTFLAGS`, but I still got nothing.
Any suggestions are highly appreciated. Thanks!