I am trying to optimize my program by rewriting the linear search part with SIMD. However, the SIMD implementation turns out to be 2x slower than a naive while loop.
That really puzzles me, because I tested the same thing in C++ before, and there the same SIMD version was 4x faster...
This clearly doesn't make sense, but I can't figure out the reason.
BTW: I checked the assembly generated by rustc, and it looks fine to me.
One difference is that, in your Rust version, you allocate and initialize the vector within the timed section, which may be dwarfing the time it takes to search the vector. Try doing initialization before timing begins, like the C++ version.
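To illustrate the point about what gets timed, here is a minimal sketch that builds the vector first and starts the clock only around the search. The names `linear_search` and `time_search` are made up for this example and are not taken from the original benchmark; `black_box` is used so the optimizer is less likely to remove the search.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Naive linear search: index of the first element equal to `target`.
/// (Hypothetical stand-in for the search being benchmarked.)
fn linear_search(haystack: &[i32], target: i32) -> Option<usize> {
    let mut i = 0;
    while i < haystack.len() {
        if haystack[i] == target {
            return Some(i);
        }
        i += 1;
    }
    None
}

/// Builds the input outside the timed region, then times only the search.
fn time_search(len: usize, target: i32) -> (Option<usize>, Duration) {
    // Setup: allocation and initialization happen before the clock starts.
    let data: Vec<i32> = (0..len as i32).collect();

    // Timed region: nothing but the search itself.
    let start = Instant::now();
    let found = linear_search(black_box(data.as_slice()), black_box(target));
    let elapsed = start.elapsed();

    (found, elapsed)
}
```

Calling something like `time_search(1_000_000, 999_999)` then reports a duration that excludes the cost of filling the vector. A single run is still noisy, so repeating the measurement is advisable.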
How are you building the Rust code? AVX2 isn't enabled by default since not all CPUs support it. One way to enable it is with RUSTFLAGS='-C target-cpu=native' cargo build --release
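A quick way to check whether the flag actually took effect is to ask the binary itself. The sketch below compares what the build enabled (`cfg!`) with what the CPU reports at run time (`is_x86_feature_detected!`). The function names are invented for illustration.

```rust
/// True if this binary was compiled with AVX2 statically enabled,
/// e.g. via `-C target-cpu=native` on a CPU that has AVX2.
fn built_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running this binary reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On non-x86 targets there is no AVX2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

/// Human-readable summary of the two checks above.
fn avx2_status() -> &'static str {
    match (built_with_avx2(), cpu_has_avx2()) {
        (true, _) => "AVX2 enabled at compile time",
        (false, true) => "CPU has AVX2, but this build did not enable it",
        (false, false) => "AVX2 not available",
    }
}
```

If `avx2_status()` reports the middle case, the build is probably missing the target-cpu flag.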
@Response777 If you want to detect this problem more easily in the future, you may want to follow the suggestion of the core::arch module documentation and conditionally compile your SIMD functions only for targets where the required features are enabled. This way, you'll get a clean compiler error when you forget to enable AVX2 in your build instead of this kind of weird behavior.
That's actually why you want the cfgs -- that was the compiler telling you that the target-cpu was wrong. (Though admittedly the compiler does not say this very clearly.)
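For reference, the cfg approach discussed above could look roughly like the sketch below. It is not the original poster's code: `find_scalar`, `find_avx2`, and `find` are hypothetical names, and the scalar fallback is an addition so that the example builds everywhere. Deleting the fallback `find` would turn a build without AVX2 into a compile error (the function simply would not exist), which is the behavior described above.

```rust
/// Scalar reference implementation, available on every target.
fn find_scalar(haystack: &[i32], target: i32) -> Option<usize> {
    haystack.iter().position(|&x| x == target)
}

/// AVX2 version: only compiled when the build statically enables AVX2.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn find_avx2(haystack: &[i32], target: i32) -> Option<usize> {
    use std::arch::x86_64::*;

    let chunks = haystack.chunks_exact(8);
    let tail = chunks.remainder();
    let tail_start = haystack.len() - tail.len();

    // SAFETY: AVX2 is guaranteed by the cfg above, and every load reads
    // exactly 8 in-bounds i32 values from a `chunks_exact(8)` chunk.
    unsafe {
        let needle = _mm256_set1_epi32(target);
        for (i, chunk) in chunks.enumerate() {
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            let eq = _mm256_cmpeq_epi32(v, needle);
            // One mask bit per 32-bit lane; bit k set means lane k matched.
            let mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));
            if mask != 0 {
                return Some(i * 8 + mask.trailing_zeros() as usize);
            }
        }
    }

    // Fewer than 8 elements are left; finish with the scalar search.
    find_scalar(tail, target).map(|i| tail_start + i)
}

/// Entry point when AVX2 is enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn find(haystack: &[i32], target: i32) -> Option<usize> {
    find_avx2(haystack, target)
}

/// Fallback (an assumption of this sketch): remove it to get a hard
/// compile error whenever AVX2 is missing from the build.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn find(haystack: &[i32], target: i32) -> Option<usize> {
    find_scalar(haystack, target)
}
```

Either way, the choice between the two paths is made at compile time, so a build without the right target-cpu setting never silently runs AVX2 intrinsics in a function that was not compiled for them.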