SSIMD 0.1.0: Simulated SIMD for the stable channel


#1

I have created a crate named ssimd (Simulated SIMD) in an effort to make SIMD work on the stable channel. It tries to provide an API similar to that of the simd crate, which is developed on the nightly channel. The supported SIMD operations are: arithmetic operations (+, -, *, /), bit operations (&, |, ^, !, <<, >>), comparison operations (>, <, <=, >=, min, max) and conversions among data types. Platform intrinsic instructions are not supported. To make SIMD work on the stable channel, a well-known method is used: auto-vectorization. Normally we think of successful vectorization as a matter of luck that depends on LLVM. However, with the BB vectorizer turned on, as described in detail in the crate’s documentation guide, I have noticed that vectorization succeeds in most cases. Those who work on high performance computing applications are welcome to give this a try.
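To illustrate the general idea of a simulated SIMD type, here is a minimal sketch. It is not taken from the ssimd source: the type name f32x4, its array layout, and the methods splat and min are assumptions made for this example only. The point is that each operation is a short fixed-length loop over a small array, which is the kind of code an auto-vectorizer can often turn into a single SIMD instruction.

```rust
use std::ops::{Add, Mul};

// Hypothetical 4-lane float vector, for illustration only.
// This is NOT the actual ssimd (or simd) type definition.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct f32x4(pub [f32; 4]);

impl f32x4 {
    // Fill every lane with the same value.
    pub fn splat(v: f32) -> Self {
        f32x4([v; 4])
    }

    // Lane-wise minimum of two vectors.
    pub fn min(self, other: Self) -> Self {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = if self.0[i] < other.0[i] { self.0[i] } else { other.0[i] };
        }
        f32x4(out)
    }
}

impl Add for f32x4 {
    type Output = Self;

    // Lane-wise addition written as a fixed-length loop.
    #[inline]
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i];
        }
        f32x4(out)
    }
}

impl Mul for f32x4 {
    type Output = Self;

    // Lane-wise multiplication written as a fixed-length loop.
    #[inline]
    fn mul(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] * rhs.0[i];
        }
        f32x4(out)
    }
}
```

With a type like this, user code can write a + b * c on whole vectors. Whether SIMD instructions are actually emitted still depends on the compiler, the optimization level and the target.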


#2

I’m not familiar with this kind of thing, but wouldn’t “those who work with high performance computing applications” want stronger guarantees that the code is being vectorized?


#3

Hi
That’s the work that the simd crate is supposed to do. However, since it is not stable yet, the community still needs to wait a bit longer.
My purpose is to create an API that is almost the same as that of the simd crate, so that for now HPC applications can use ssimd::* on the stable release. When simd becomes stable, they only need to replace ssimd::* with simd::* and should not need any further changes to their code.
As for how vectorization can be guaranteed, I have tested 7 of the 8 examples from the simd crate (one of them cannot be ported to the stable channel because it uses platform-intrinsic instructions). All 7 were successfully vectorized.

From my experience with auto-vectorization, we often need to check the assembly code to see whether SIMD instructions have been generated. If not, we need to tune the code (simplify loops, functions, etc.) and try again. When the BB vectorizer is enabled, I see no need for such tuning, and vectorization is often achieved on the first try.
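As a rough sketch of the kind of tuning mentioned above, here is a loop simplified into a shape that auto-vectorizers usually handle well. The function name add_slices is made up for this example and is not part of ssimd or simd. Re-slicing all three slices to a common length up front often lets the compiler drop the per-element bounds checks, which is one common obstacle to vectorization.

```rust
// Hypothetical helper, for illustration only: element-wise sum of two slices.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    // Use the shortest length so every index below is in bounds.
    let n = a.len().min(b.len()).min(out.len());

    // Re-slice once; the compiler can then see that all three
    // slices have exactly n elements inside the loop.
    let (a, b, out) = (&a[..n], &b[..n], &mut out[..n]);

    // A simple counted loop with no branches or function calls in the body.
    for i in 0..n {
        out[i] = a[i] + b[i];
    }
}
```

To check the result, one option is to emit assembly for a release build (for example with rustc's --emit asm option) and look for packed SIMD instructions in the function body. The outcome depends on the target and the compiler version.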


#4

I think that instead of the simd crate, it’s better to emulate the stdsimd API:


#5

stdsimd seems to be more up to date. I may consider updating my crate to emulate stdsimd instead of simd; that should not be a problem because the two crates share many common interfaces.