Disable unsupported SIMD code generation

Problem

I've written some SIMD intrinsics code targeting x86-64: SSE4.1, AVX, and AVX2. I began reading some of the assembly output and came across a troubling result. When I made the mistake of enabling only AVX rather than AVX2, the compiler still accepted the AVX2 intrinsics but generated unexpectedly slow code sequences for them.

This is very problematic since code like this may go unnoticed unless the actual assembly is inspected with every version. Is there any way to disable or protect against this code generation?

Example

Note that no_avx2 below still compiles, but to a much slower version of yes_avx2. One expects _mm256_and_si256 to compile to little more than a single vandps instruction.

Code:

use std::arch::x86_64::*;

#[target_feature(enable = "avx2")]
pub unsafe fn yes_avx2(a: __m256i, b: __m256i) -> __m256i {
    _mm256_and_si256(a, b)
}

#[target_feature(enable = "avx")]
pub unsafe fn no_avx2(a: __m256i, b: __m256i) -> __m256i {
    _mm256_and_si256(a, b)
}

Assembly (compiler explorer with -C opt-level=3):

core::core_arch::x86::avx2::_mm256_and_si256::hda082db2f6c5855a:
        vmovaps (%rdx), %ymm0
        vandps  (%rsi), %ymm0, %ymm0
        vmovaps %ymm0, (%rdi)
        vzeroupper
        retq

example::no_avx2::h5e8ad84ba7048fcb:
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        andq    $-32, %rsp
        subq    $96, %rsp
        movq    %rdi, %rbx
        vmovaps (%rsi), %ymm0
        vmovaps %ymm0, (%rsp)
        vmovaps (%rdx), %ymm0
        vmovaps %ymm0, 32(%rsp)
        movq    %rsp, %rsi
        leaq    32(%rsp), %rdx
        vzeroupper
        callq   core::core_arch::x86::avx2::_mm256_and_si256::hda082db2f6c5855a
        movq    %rbx, %rax
        leaq    -8(%rbp), %rsp
        popq    %rbx
        popq    %rbp
        retq

example::yes_avx2::ha14dc1d4af094302:
        movq    %rdi, %rax
        vmovaps (%rdx), %ymm0
        vandps  (%rsi), %ymm0, %ymm0
        vmovaps %ymm0, (%rdi)
        vzeroupper
        retq

note, #[target_feature(enable = "xxx")] is NOT a conditional compilation flag. maybe you meant #[cfg(target_feature = "avx2")]?
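to make the difference concrete, here is a minimal sketch of the conditional compilation side. the function names (avx2_enabled_at_compile_time, which_build) are made up for illustration. both forms only see features enabled for the whole build (e.g. via -C target-feature=+avx2), never a per-function #[target_feature(enable = "...")].

```rust
/// True only if the whole crate was built with AVX2 enabled,
/// e.g. with `-C target-feature=+avx2` or a suitable `-C target-cpu`.
/// (Hypothetical helper name, for illustration only.)
pub fn avx2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// This version exists only in builds where AVX2 is enabled globally.
#[cfg(target_feature = "avx2")]
pub fn which_build() -> &'static str {
    "compiled with avx2"
}

/// This version exists in every other build.
#[cfg(not(target_feature = "avx2"))]
pub fn which_build() -> &'static str {
    "compiled without avx2"
}
```

exactly one which_build() is compiled into any given build, and which one depends only on the compiler flags.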

the #[target_feature] attribute tells the backend that it may use the named instruction set extensions when generating machine code for this function.

in rust, functions with this attribute must be unsafe, and it is UB to call one from a context that does not have the target feature.

in other words, either the calling function must also enable the same target feature, or the caller must do a runtime check before calling it.
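a minimal sketch of the runtime check option, assuming a scalar fallback is acceptable. the names and_bytes, and_bytes_avx2 and and_bytes_scalar are made up for illustration; is_x86_feature_detected! is the standard library macro for the check.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 path: ANDs two 32-byte blocks with one 256-bit operation.
/// Only sound to call when the CPU is known to support AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn and_bytes_avx2(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    // Unaligned loads, because byte arrays carry no 32-byte alignment guarantee.
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    let mut out = [0u8; 32];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_and_si256(va, vb));
    out
}

/// Portable fallback that needs no target feature at all.
pub fn and_bytes_scalar(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] & b[i];
    }
    out
}

/// Safe entry point: picks the AVX2 path only after a runtime check.
pub fn and_bytes(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the check above confirmed that this CPU supports AVX2.
            return unsafe { and_bytes_avx2(a, b) };
        }
    }
    and_bytes_scalar(a, b)
}
```

callers only ever use the safe and_bytes(); the unsafe call sits behind the feature check, so the intrinsic is inlined into a function that really has avx2 enabled.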

_mm256_and_si256() is an avx2 intrinsic that would ALWAYS be compiled down to vandps. it is NOT a "portable" simd routine that gets conditionally compiled to vandps on avx2 and falls back to software emulation otherwise.

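in your example, the two functions differ only in that yes_avx2() inlined _mm256_and_si256(), while no_avx2() didn't and instead called the out-of-line copy in libcore. the backend cannot inline a function into a caller that has fewer target features enabled, which is why you see the extra stack traffic and the call.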

so your no_avx2() contains UB unless you enable avx2 globally with the -C target-feature=+avx2 compiler flag, in which case it would generate exactly the same code as yes_avx2().

unfortunately, portable SIMD has not landed on stable yet. see: