The problem I have with this implementation is that it doesn't play well with auto-vectorization, and calling it takes up around half the time of my benchmark. The compiler simply unrolls the loop into 32 calls to this function per iteration as long as at least 32 elements remain, and then uses a loop with a single call per iteration for the remaining 31 or fewer elements. Therefore, I'd like to vectorize the exp function, but for that I need to find its source code. Where can I find it?
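A minimal sketch of the kind of loop described above. The function name `exp_in_place` and the choice of f64 are assumptions for illustration only; the actual benchmark code is not shown in this thread.

```rust
/// Hypothetical stand-in for the hot loop: replaces every element with exp(element).
/// Each iteration ends up as an opaque call to the scalar exp routine, so the
/// optimizer can unroll the loop but has no vector version of exp to substitute.
pub fn exp_in_place(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v = v.exp();
    }
}
```

Calling it on a slice, for example `exp_in_place(&mut data)`, should reproduce the pattern of repeated scalar calls when the optimized assembly is inspected.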
I've heard that such Rust intrinsics are implemented on top of LLVM intrinsics.
If I understand correctly, they are not written purely in Rust; LLVM is responsible for generating the machine code for them.
Additionally, LLVM works on LLVM IR before it generates native instructions.
So such floating-point operations may be compiled not to straightforward native instructions, but to calls to functions in libm or a similar math library.
Which library that is depends on the platform.
On Linux (with the default build configuration), glibc would be used.
(I don't know what happens when I use musl as libc.)
On FreeBSD and macOS, other implementations would be used.
This comment (Japanese) says that msun seems to be used for FreeBSD, and Apple seems to have developed their own code for macOS.
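One way to check what happens on a given target is to compile a tiny probe function and look at the output. This is only a sketch: the name `exp_probe` is hypothetical, and the exact output depends on the toolchain and target.

```rust
/// Hypothetical probe: a single call to f64::exp, kept out of line so that it
/// is easy to find in the emitted LLVM IR or assembly.
#[inline(never)]
pub fn exp_probe(x: f64) -> f64 {
    // In the emitted IR this typically shows up as a call to the
    // `llvm.exp.f64` intrinsic; the backend then usually lowers it to a call
    // to the `exp` symbol of whichever math library the target links against.
    x.exp()
}
```

Saved as a library file, this can be compiled with something like `rustc -O --crate-type=lib --emit=llvm-ir,asm` to get both the IR and the assembly for inspection.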
Thank you for the information. This turns out to be more complicated than I thought. It means that different operating systems on the same architecture may yield different results for the same argument. Consequently, if I want the same result on a given architecture regardless of the OS, I'd have to provide both the vectorized and the scalar version of the exp function myself.
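A scalar version that does not depend on the platform's libm could be sketched roughly as follows. This is an illustration under several assumptions, not a tuned implementation: the name `exp_portable` is hypothetical, the input is clamped to [-700, 700] (so overflow to infinity and underflow to zero are not modelled), a plain Taylor polynomial stands in for a proper minimax approximation, and the default round-to-nearest floating-point mode is assumed. It uses only additions, multiplications and bit operations, which are specified by IEEE 754, so the result should not vary with the OS.

```rust
/// Hypothetical libm-free exp for f64: range reduction plus a polynomial.
pub fn exp_portable(x: f64) -> f64 {
    // ln 2 split into a high and a low part (the values used by fdlibm-style
    // implementations), so that k * LN2_HI is exact for the small k used here.
    const LN2_HI: f64 = 6.93147180369123816490e-01;
    const LN2_LO: f64 = 1.90821492927058770002e-10;
    // Adding and subtracting 1.5 * 2^52 rounds to the nearest integer.
    const MAGIC: f64 = 6_755_399_441_055_744.0;
    // Taylor coefficients 1/n! for n = 0..=13.
    const C: [f64; 14] = [
        1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0, 1.0 / 120.0, 1.0 / 720.0,
        1.0 / 5040.0, 1.0 / 40320.0, 1.0 / 362880.0, 1.0 / 3628800.0,
        1.0 / 39916800.0, 1.0 / 479001600.0, 1.0 / 6227020800.0,
    ];

    // Assumption: clamp so that 2^k below stays a normal, finite number.
    let x = x.clamp(-700.0, 700.0);

    // Write x = k * ln 2 + r with integer k and |r| <= ln(2) / 2.
    let shifted = x * std::f64::consts::LOG2_E + MAGIC;
    let k = shifted - MAGIC;
    let r = (x - k * LN2_HI) - k * LN2_LO;

    // Evaluate exp(r) with Horner's scheme.
    let mut p = C[13];
    for &c in C[..13].iter().rev() {
        p = p * r + c;
    }

    // `shifted` and MAGIC share an exponent, so their bit patterns differ by
    // exactly k; build 2^k by writing k + 1023 into the exponent field.
    let ki = shifted.to_bits() as i64 - MAGIC.to_bits() as i64;
    let scale = f64::from_bits(((ki + 1023) as u64) << 52);

    p * scale
}
```

Because the body has no data-dependent branches or library calls, it at least gives the optimizer a chance to vectorize a loop over it, though I have not measured whether it actually does, or how the accuracy compares to glibc across the full range.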
I did a bit of research and, after considering my options, decided to use packed_simd for now. It actually offers an exp function, which I expect will work out nicely for me. I'll have to check the performance impact of the function, but I'm fairly optimistic.
packed_simd causes a segmentation fault during the benchmark, which is not what I expected. I now actually think this is an issue with nightly. I've commented out the packed_simd code.