I implemented something using AVX2 instructions.
I was surprised to see that it compiled without specifying any target. The performance, however, was much worse than expected.
When compiling for the native CPU (`-C target-cpu=native`), my program ran at the expected speed.
Do LLVM SIMD intrinsics have some kind of fallback ("fixture") for CPUs without AVX2?
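One way to see the difference between the two builds is to compare what was enabled at compile time with what the running CPU supports. The sketch below is a hypothetical helper (`avx2_status` is not from the thread). It uses `cfg!(target_feature = "avx2")`, which is true only when the whole crate was compiled with AVX2 enabled, for example via `-C target-cpu=native` on an AVX2 machine. It also uses `is_x86_feature_detected!("avx2")`, which checks the CPU at runtime.

```rust
/// Hypothetical helper: returns (AVX2 enabled at compile time, AVX2 detected at runtime).
pub fn avx2_status() -> (bool, bool) {
    // Evaluated at compile time: true only if the crate was built with AVX2 enabled.
    let compile_time = cfg!(target_feature = "avx2");

    // Runtime CPUID-based detection; only available on x86/x86_64.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let runtime = is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let runtime = false;

    (compile_time, runtime)
}
```

A default build on an AVX2 machine would typically report `(false, true)`. The intrinsics still compile and run there, but the surrounding code is not built with AVX2 enabled.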
I’ve encountered it too. It happens because the intrinsics are declared with `#[target_feature(enable = "...")]` (which is why they are marked `unsafe`), and that attribute unconditionally enables the target feature for them. While I too find this behaviour surprising and somewhat undesirable, it’s needed for intrinsics to work with runtime detection: a program compiled without the target feature can switch to a more optimized implementation based on runtime checks. The performance drop appears because LLVM can’t inline across functions that enable different sets of target features.
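A minimal sketch of the runtime-detection pattern described above follows. It assumes x86_64, and the function names and the summing task are hypothetical, chosen only for illustration. The entire hot loop is placed inside a function that carries `#[target_feature(enable = "avx2")]`, so the caller and the intrinsics have the same feature set and LLVM can inline the intrinsics. Only the single dispatch call crosses the feature boundary. A scalar fallback handles CPUs without AVX2 and non-x86_64 targets.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable scalar fallback (wrapping add, matching the SIMD lanes).
fn sum_scalar(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// The whole loop is compiled with AVX2 enabled, so the intrinsics inline here.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[i32]) -> i32 {
    unsafe {
        let mut acc = _mm256_setzero_si256();
        let chunks = data.chunks_exact(8);
        let rest = chunks.remainder();
        for chunk in chunks {
            // Unaligned load of 8 x i32.
            let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
            acc = _mm256_add_epi32(acc, v);
        }
        // Horizontal reduction of the 8 lanes, then the leftover elements.
        let mut lanes = [0i32; 8];
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
        sum_scalar(&lanes).wrapping_add(sum_scalar(rest))
    }
}

/// Safe entry point: picks the AVX2 path only if the running CPU supports it.
pub fn sum(data: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe because AVX2 support was just verified at runtime.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

Usage: `sum(&data)` works in a default build without `-C target-cpu=native`. The trade-off is that each AVX2 region must be a whole function annotated with `#[target_feature]`. Calling the intrinsics one by one from an un-annotated function would still leave a non-inlined call per intrinsic.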
Ok, so if I understand correctly, there is no fallback ("fixture"): the slow binary cannot run on platforms that do not support AVX2 instructions. It is slow only because the intrinsics are not inlined.