Simple per-element arithmetic on a Vec: auto-vectorized on x86, but not on ARM? Why does this happen, and how can I solve it?

My algorithm has the following small section:

    for i in 0..vec_a.len() {
        unsafe {
            let pix_a = vec_a.get_unchecked_mut(i);
            let pix_b = vec_b.get_unchecked(i);
            *pix_a = ... some math operations around them ...
        }
    }

It auto-vectorizes well on x86. However, it does not on ARM. Since it runs on mobile phones, I care most about ARM (and not that much about x86). So what should I do?

Moreover, I had thought that such auto-vectorization was mostly independent of the architecture. After all, they are all SIMD, aren't they? So I am curious why it auto-vectorizes on one architecture but not on another.

Use x86: Compiler Explorer
Use arm: Compiler Explorer

Looking at the LLVM-IR, it doesn't seem to be vectorized on x86 either: https://godbolt.org/z/KPf9dqqWc -- no vector types to be seen.

@scottmcm Hmm, I see things like %xmm1 registers. Aren't those SIMD registers?

This is likely a missed optimization by LLVM if there is no difference in the LLVM IR.

@RustyYato Hmm so what should I do? Thanks!

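The xmm registers are used for both scalar and vector operations. The suffix tells them apart: packed (vector) SSE instructions end in PS or PD, e.g. MULPS, while scalar ones end in SS or SD. A leading v only marks the VEX/AVX encoding, so VMULSS is still scalar. The multiplies here are using MULSS, which is scalar.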

Ah, you are right!
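
As for what to try next: one commonly suggested restructuring is to iterate over both slices with zip instead of indexing. This is only a sketch, not a guaranteed fix. The `blend` function below is a hypothetical stand-in for the elided math, and `f32` is an assumption based on the MULSS in the output. Whether LLVM actually vectorizes it still depends on the real arithmetic and the target features.

```rust
/// Hypothetical placeholder for the per-pixel math, which the question elides.
#[inline(always)]
fn blend(a: f32, b: f32) -> f32 {
    a * 0.5 + b * 0.5
}

/// Updates `dst` in place from `src`, element by element.
/// `zip` stops at the shorter slice, so there are no bounds checks
/// in the loop body and no `unsafe` is needed.
pub fn blend_in_place(dst: &mut [f32], src: &[f32]) {
    for (a, b) in dst.iter_mut().zip(src.iter()) {
        *a = blend(*a, *b);
    }
}

fn main() {
    let mut vec_a = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0];
    let vec_b = vec![3.0_f32, 2.0, 1.0, 0.0, -1.0];
    blend_in_place(&mut vec_a, &vec_b);
    println!("{:?}", vec_a);
}
```

To check the result, look at the optimized assembly for packed instructions: MULPS or VMULPS on x86, or something like `fmul v0.4s, v1.4s, v2.4s` on AArch64. If you only see MULSS, or `fmul s0, s1, s2` on ARM, the loop is still scalar.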
