    for i in 0..vec_a.len() {
        unsafe {
            let pix_a = vec_a.get_unchecked_mut(i);
            let pix_b = vec_b.get_unchecked(i);
            *pix_a = ... some math operations around them ...
        }
    }
It auto-vectorizes well on x86, but not on ARM. Since the code runs on mobile phones, ARM is the target I care about most (x86 matters much less to me). What should I do?
Moreover, I had assumed that auto-vectorization is mostly independent of the architecture. After all, they all have SIMD, don't they? So I am curious why the loop auto-vectorizes on one architecture but not on another.
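The xmm registers are used for both scalar and vector operations, so seeing them in the output does not by itself prove vectorization. Packed (SIMD) instructions end in PS or PD, e.g. MULPS; a leading v only marks the VEX (AVX) encoding and appears on scalar instructions too. The multiplies here use MULSS, which is scalar.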
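As for the loop in the question, one thing that may be worth trying is a safe iterator version. `zip` stops at the shorter slice, so there are no bounds checks for the optimizer to work around and no `unsafe` is needed. This is only a sketch: the real per-pixel math is elided in the question, so `blend` below is a hypothetical stand-in, and `f32` pixels are an assumption (suggested by the MULSS in the output). Whether this actually vectorizes on ARM still depends on the target and its enabled features.

```rust
// Hypothetical stand-in for the elided per-pixel math; replace with the real formula.
#[inline]
fn blend(a: f32, b: f32) -> f32 {
    a * 0.5 + b * 0.5
}

// Combines `vec_b` into `vec_a` element by element.
// `zip` stops at the shorter of the two slices, so no index can go out of bounds.
pub fn combine(vec_a: &mut [f32], vec_b: &[f32]) {
    for (pix_a, pix_b) in vec_a.iter_mut().zip(vec_b.iter()) {
        *pix_a = blend(*pix_a, *pix_b);
    }
}
```

Calling `combine(&mut a, &b)` updates `a` in place. To see whether the loop was vectorized for a given target, the generated assembly can be inspected with `rustc --emit asm` (with optimizations on) and checked for packed rather than scalar instructions.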