The problem I have with this implementation is that it doesn't play well with auto-vectorization, and calling it takes up around half the time of my benchmark. The compiler simply unrolls the loop into 32 calls to this function per iteration as long as at least 32 elements remain, and then uses a loop with a single call per iteration for the remaining 31 or fewer elements. Therefore, I'd like to vectorize the exp function, but for that I need to find its source code. Where can I find it?
I've heard that such intrinsics utilize LLVM intrinsics.
If I understand correctly, they are not written purely in Rust; LLVM is responsible for generating the machine code for them.
In addition, LLVM works on LLVM IR before it generates native instructions.
So such floating-point operations may be compiled not to straightforward native instructions, but to calls to functions in libm or a similar math library.
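To make this concrete, here is a minimal sketch of the kind of loop being discussed. The function name `exp_in_place` is made up for illustration. If you compile something like this with `rustc -O --emit=llvm-ir` or `--emit=asm`, I would expect the IR to contain the `llvm.exp.f64` intrinsic and, on most targets, the assembly to contain a plain call to the `exp` symbol of the system math library, though the exact output depends on target and compiler version.

```rust
/// Applies exp to every element in place (hypothetical helper for illustration).
pub fn exp_in_place(xs: &mut [f64]) {
    for x in xs.iter_mut() {
        // f64::exp forwards to a compiler intrinsic. On most targets LLVM
        // lowers it to a call into the platform math library rather than
        // to inline instructions, which is why this loop is not vectorized.
        *x = x.exp();
    }
}
```

Since the work is done by whatever math library the program is linked against, there is no single Rust source file for exp to look at.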
Thank you for the information. This turns out to be more complicated than I thought. That means different operating systems on the same architecture may yield different results for the same argument. So if I want the same result on the same architecture regardless of the OS, I'd have to provide both the vectorized and the scalar version of the exp function myself.
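For reference, a rough sketch of what such a self-provided scalar version could look like. This is not the algorithm used by std or by any particular libm; `portable_exp` and `portable_exp_in_place` are hypothetical names, the range limits are simplified, and the accuracy is only a few ulps. It uses just +, -, *, /, casts and bit operations, which are individually well defined by IEEE 754, so it does not depend on the platform's math library.

```rust
/// Hypothetical portable exp sketch: range reduction plus a Taylor polynomial.
pub fn portable_exp(x: f64) -> f64 {
    // ln(2) split into a high part with trailing zero bits and a small
    // correction, so that k * LN2_HI is computed without rounding error.
    const LN2_HI: f64 = 6.93147180369123816490e-01;
    const LN2_LO: f64 = 1.90821492927058770002e-10;
    const INV_LN2: f64 = 1.44269504088896338700e+00;

    if x.is_nan() {
        return x;
    }
    // Simplification (assumption of this sketch): clamp instead of handling
    // subnormal results and the last sliver of finite results below overflow.
    if x > 709.0 {
        return f64::INFINITY;
    }
    if x < -708.0 {
        return 0.0;
    }

    // k = round(x / ln 2), using a truncating cast instead of f64::round.
    let t = x * INV_LN2;
    let k = (if t >= 0.0 { t + 0.5 } else { t - 0.5 }) as i64;
    let kf = k as f64;

    // r = x - k * ln 2, so that |r| is at most about 0.35.
    let r = (x - kf * LN2_HI) - kf * LN2_LO;

    // Degree-13 Taylor polynomial for e^r, evaluated with Horner's scheme.
    let mut p = 1.0;
    let mut n = 13;
    while n > 0 {
        p = 1.0 + p * r / (n as f64);
        n -= 1;
    }

    // Build 2^k directly from its bit pattern and scale: e^x = 2^k * e^r.
    let two_pow_k = f64::from_bits(((k + 1023) as u64) << 52);
    p * two_pow_k
}

/// Applies the same scalar routine to a whole slice.
pub fn portable_exp_in_place(xs: &mut [f64]) {
    for x in xs.iter_mut() {
        *x = portable_exp(*x);
    }
}
```

Whether the compiler actually vectorizes the slice loop is not guaranteed, since the early returns and the inner loop may get in the way. The point is only that a single routine built from basic operations should give the same bits whether it ends up running as scalar or SIMD code.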