So I spent some time today integrating SIMD support (which unfortunately required switching to the nightly toolchain), only to find that, surprisingly, it does not provide a significant performance boost. The speedup is somewhere around 10%, and cross-checking with the C++ version yields approximately the same result (a 10-15% performance decrease with SIMD turned off). I suspect that catching up with C++ would also require parallelization, which the original achieves with OpenMP. Does anyone know what’s considered hip in this regard in contemporary Rust?
Rayon is a popular choice. It mostly follows the same design principles as Intel TBB, if you are familiar with that.
Thanks for the advice, parallel iterators seem to be exactly what’s needed.
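For anyone landing here later, here is a rough sketch of the kind of data-parallel loop being discussed. It uses only the standard library (`std::thread::scope`, stable since Rust 1.63) so it compiles without extra crates. The `work` function and the `parallel_sum` helper are hypothetical stand-ins, not code from the project in question. With Rayon, the same reduction would presumably collapse to a single parallel-iterator chain, as noted in the comment.

```rust
use std::thread;

/// Hypothetical per-element workload standing in for the real kernel.
fn work(x: f64) -> f64 {
    x * x
}

/// Sum `work(x)` over `data`, splitting the slice across up to
/// `n_threads` scoped threads (one contiguous chunk per thread).
///
/// With Rayon this would roughly be:
///     use rayon::prelude::*;
///     data.par_iter().map(|&x| work(x)).sum::<f64>()
fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = (data.len() + n_threads - 1) / n_threads;

    thread::scope(|s| {
        // Spawn one thread per chunk; scoped threads may borrow `data`.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&x| work(x)).sum::<f64>()))
            .collect();
        // Join every thread and combine the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}

fn main() {
    let data: Vec<f64> = (0..1_000).map(|i| i as f64).collect();
    let total = parallel_sum(&data, 4);
    println!("{total}");
}
```

Unlike this fixed chunking, Rayon splits work dynamically with work stealing, which tends to matter when per-element cost is uneven. Note also that a parallel floating-point reduction can differ slightly from the serial one because the summation order changes.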