Auto-vectorization fails in a for-loop

Hi, newcomer here. I'm experimenting with array permutations in Rust and learning what it takes to trigger auto-vectorization. I have taken extra care to ensure that all slices have length 8.

This code vectorizes nicely (PERM[0] is a [usize; 8]):

// PERM (not shown) is an array of [usize; 8] rows.
pub fn permute_avx(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    // First block of 8: gather through PERM[0] into dst, then write back.
    let arr = &mut marbles[..8];
    dst.iter_mut()
        .zip(PERM[0].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);

    // Second block of 8, using PERM[1].
    let arr = &mut marbles[8..16];
    dst.iter_mut()
        .zip(PERM[1].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
}

However, if I collapse these into a for loop, vectorization fails:

pub fn permute(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    // Same two blocks as in permute_avx, but the block index i comes from a loop.
    for i in 0..2 {
        let arr = &mut marbles[8 * i..8 * (i + 1)];
        dst.iter_mut()
            .zip(PERM[i].iter())
            .for_each(|(d, p)| *d = arr[*p]);
        arr.copy_from_slice(&dst);
    }
}

Godbolt
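The two functions are meant to compute the same result, so the difference should lie only in how LLVM optimizes them. A self-contained harness to check that, assuming a made-up PERM with two valid rows (the real PERM from the Godbolt link isn't reproduced here):

```rust
// Hypothetical PERM for illustration only: two rows, each a permutation of 0..8.
const PERM: [[usize; 8]; 2] = [
    [7, 6, 5, 4, 3, 2, 1, 0],
    [1, 0, 3, 2, 5, 4, 7, 6],
];

// Unrolled version from the post (reported to vectorize).
pub fn permute_avx(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    let arr = &mut marbles[..8];
    dst.iter_mut()
        .zip(PERM[0].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);

    let arr = &mut marbles[8..16];
    dst.iter_mut()
        .zip(PERM[1].iter())
        .for_each(|(d, p)| *d = arr[*p]);
    arr.copy_from_slice(&dst);
}

// Loop version from the post (reported not to vectorize).
pub fn permute(marbles: &mut [u32]) {
    let mut dst = [0u32; 8];
    for i in 0..2 {
        let arr = &mut marbles[8 * i..8 * (i + 1)];
        dst.iter_mut()
            .zip(PERM[i].iter())
            .for_each(|(d, p)| *d = arr[*p]);
        arr.copy_from_slice(&dst);
    }
}

fn main() {
    let mut a: Vec<u32> = (0..16).collect();
    let mut b = a.clone();
    permute_avx(&mut a);
    permute(&mut b);
    // Same result either way; any difference is in the generated code.
    assert_eq!(a, b);
    println!("{:?}", a); // [7, 6, 5, 4, 3, 2, 1, 0, 9, 8, 11, 10, 13, 12, 15, 14]
}
```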

The compiler emits this note:

note: optimization analysis for loop-vectorize at <source>:13:37: loop not vectorized: loop control flow is not understood by vectorizer

I have been searching for an explanation but could not find any.

Also, is it possible to get cargo to print these optimization remarks? I have added --emit asm -g -C remark="slp-vectorizer loop-vectorize" to my RUSTFLAGS but still get nothing.

Any suggestions are highly appreciated. Thanks!

Out of curiosity, how does it do with chunks_exact_mut?

That is definitely more idiomatic, and wow, it works!

Interestingly, it only works when I'm using enumerate instead of zip to index the PERM array-of-arrays.

Godbolt
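One way the chunks_exact_mut version with enumerate might look, as a sketch assuming a hypothetical PERM with two valid rows; permute_chunks is an illustrative name, and whether it vectorizes on a given compiler version is best confirmed on Compiler Explorer:

```rust
// Hypothetical PERM for illustration only: two rows, each a permutation of 0..8.
const PERM: [[usize; 8]; 2] = [
    [7, 6, 5, 4, 3, 2, 1, 0],
    [1, 0, 3, 2, 5, 4, 7, 6],
];

/// Permutes each full 8-element block of `marbles` using the matching PERM row.
/// A trailing partial block (fewer than 8 elements) is left untouched.
/// Panics if `marbles` has more full blocks than PERM has rows (PERM[i] out of range).
pub fn permute_chunks(marbles: &mut [u32]) {
    for (i, arr) in marbles.chunks_exact_mut(8).enumerate() {
        let mut dst = [0u32; 8];
        // Gather the block through row i of PERM.
        for (d, &p) in dst.iter_mut().zip(PERM[i].iter()) {
            *d = arr[p];
        }
        arr.copy_from_slice(&dst);
    }
}

fn main() {
    let mut v: Vec<u32> = (0..16).collect();
    permute_chunks(&mut v);
    println!("{:?}", v); // [7, 6, 5, 4, 3, 2, 1, 0, 9, 8, 11, 10, 13, 12, 15, 14]
}
```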

Your PERM constant looks odd. The second row contains the index 8 twice, and since the valid indices into an 8-element slice are 0 to 7, that will always panic. Fixing it (note also that a row should not repeat any index) and making sure the code contains no panics (not necessary here, but I like that it removes unnecessary noise from the assembly output) results in the desired assembly: Compiler Explorer
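Both problems with PERM mentioned above (an out-of-range 8 and repeated indices) can be caught at compile time instead of at runtime. A sketch, assuming PERM has type [[usize; 8]; N]; is_valid_perm is a hypothetical helper, not something in std:

```rust
// Hypothetical PERM for illustration only: two rows, each a permutation of 0..8.
const PERM: [[usize; 8]; 2] = [
    [7, 6, 5, 4, 3, 2, 1, 0],
    [1, 0, 3, 2, 5, 4, 7, 6],
];

/// Returns true if every row of `perm` contains each index 0..8 exactly once.
/// Written with `while` loops because `for` loops aren't allowed in a const fn,
/// so that it can be evaluated at compile time.
const fn is_valid_perm<const N: usize>(perm: &[[usize; 8]; N]) -> bool {
    let mut r = 0;
    while r < N {
        let mut seen = [false; 8];
        let mut j = 0;
        while j < 8 {
            let p = perm[r][j];
            // An index >= 8 would panic at runtime; a repeated index means
            // one element is duplicated and another is lost.
            if p >= 8 || seen[p] {
                return false;
            }
            seen[p] = true;
            j += 1;
        }
        r += 1;
    }
    true
}

// Fails the build (not the running program) if PERM is malformed.
const _: () = assert!(is_valid_perm(&PERM));

fn main() {
    // A made-up row with a repeated 8, like the one described in the thread.
    let bad = [[0, 1, 2, 3, 4, 5, 8, 8]];
    assert!(!is_valid_perm(&bad));
    assert!(is_valid_perm(&PERM));
}
```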


Thank you! I copied the wrong array into Compiler Explorer.

I hadn't considered using an array reference before. That is a great help.

If you're on nightly, consider trying the const generic version: https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.as_chunks_mut

That way you'll get &mut [_; 8] instead of slices.
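On nightly, marbles.as_chunks_mut::<8>() returns a pair (&mut [[u32; 8]], &mut [u32]) of full chunks and remainder. A stable approximation, assuming a hypothetical PERM, converts each chunks_exact_mut slice into a &mut [u32; 8] with try_into; permute_block and permute_arrays are illustrative names, and the codegen of this version hasn't been verified here:

```rust
use std::convert::TryInto;

// Hypothetical PERM for illustration only: two rows, each a permutation of 0..8.
const PERM: [[usize; 8]; 2] = [
    [7, 6, 5, 4, 3, 2, 1, 0],
    [1, 0, 3, 2, 5, 4, 7, 6],
];

/// Applies `perm` to one fixed-size block. The length is part of the type,
/// so the block and the permutation row are both known to have 8 elements.
fn permute_block(block: &mut [u32; 8], perm: &[usize; 8]) {
    let src = *block; // [u32; 8] is Copy, so this snapshots the block
    for (d, &p) in block.iter_mut().zip(perm.iter()) {
        *d = src[p];
    }
}

pub fn permute_arrays(marbles: &mut [u32]) {
    // zip stops at the shorter side: blocks beyond PERM.len() and any
    // trailing partial block are left untouched.
    for (chunk, perm) in marbles.chunks_exact_mut(8).zip(PERM.iter()) {
        // chunks_exact_mut only yields length-8 slices, so this cannot fail.
        let block: &mut [u32; 8] = chunk.try_into().unwrap();
        permute_block(block, perm);
    }
}

fn main() {
    let mut v: Vec<u32> = (0..16).collect();
    permute_arrays(&mut v);
    println!("{:?}", v); // [7, 6, 5, 4, 3, 2, 1, 0, 9, 8, 11, 10, 13, 12, 15, 14]
}
```

Because src is a copy of the block, the separate dst scratch buffer from the original code isn't needed.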

