Why are x86 intrinsics not always inlined

marsfan · November 13, 2025, 6:25am

I was playing around with computing some CRC checksums using the x86 CRC32 instruction, and I noticed something that surprised me. The compiler does not seem to inline calls to the _mm_crc32_u8 intrinsic.

I did a little bit more playing around, and it seems to do this for a few of the intrinsics. See the (somewhat contrived) example I've made here

This strikes me as odd, as these are things that tend to be in hot loops, so it seems like inlining would be ideal.
What’s the reason for this behavior?

kpreid · November 13, 2025, 6:36am

I don’t have an answer for your question, but your link seems to be mangled (it has double-URL-escaped text) so the code won't compile without edits.

marsfan · November 13, 2025, 6:39am

Thanks for the heads up. I switched to using the short form of the link. Should work now.

nerditation · November 13, 2025, 6:57am

the documentation of the intrinsic says it is only ~~available~~ safe to call with target feature sse4.2, which is not enabled for the default x86_64 target.

in short, you have two options:

1. annotate the function that calls the intrinsic with a #[target_feature(enable = "sse4.2")] attribute, e.g.:
```
#[target_feature(enable = "sse4.2")]
fn compute_crc(data: &[u8]) -> u32 {
   //...
}
```
2. use the target-feature codegen option for rustc, e.g.:
```
$ rustc -C target-feature=+sse4.2 example.rs
```
godbolt link: https://godbolt.org/z/Wd18EeK7f

each intrinsic has a required target feature. check the documentation.
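
to make option 1 concrete, here is a minimal sketch of a CRC-32C routine built on _mm_crc32_u8. the names crc32c_hw and crc32c_sse42 are made up for illustration, and the all-ones initial value plus final inversion follow the usual CRC-32C convention rather than anything stated in this thread. the loop lives inside the annotated function, so the intrinsic and its caller have the same target features enabled.

```rust
/// CRC-32C of `data` via the SSE4.2 `crc32` instruction, or `None` when
/// the running CPU does not report SSE4.2 (hypothetical helper name).
pub fn crc32c_hw(data: &[u8]) -> Option<u32> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("sse4.2") {
            // SAFETY: the runtime check above confirmed SSE4.2 support.
            return Some(unsafe { crc32c_sse42(data) });
        }
    }
    let _ = data; // only reached without SSE4.2 or on other architectures
    None
}

/// The hot loop sits inside the `#[target_feature]` function, so the
/// intrinsic is called from code compiled with the same feature set.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn crc32c_sse42(data: &[u8]) -> u32 {
    use std::arch::x86_64::_mm_crc32_u8;

    let mut crc = !0u32; // conventional CRC-32C initial value
    for &byte in data {
        crc = _mm_crc32_u8(crc, byte);
    }
    !crc // conventional final inversion
}
```

a caller that gets `None` back would need its own software path; this sketch deliberately leaves that out.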

if your development machine has the same cpu as the deployment environment, a simple fix is to use the target-cpu=native codegen option:

$ rustc -C target-cpu=native ...

kornel · November 13, 2025, 11:00am

When compiling with Cargo, set

RUSTFLAGS="-Ctarget-cpu=x86-64-v3"

tczajka · November 13, 2025, 11:04am

But the code compiles, so apparently it is available. Why? Looks like the documentation is lying.

nerditation · November 13, 2025, 11:28am

that's how the #[target_feature(enable = "xxx")] attribute works. it is a per-function attribute which affects which features are enabled for the backend during codegen of that function. however, calling such a function requires unsafe if the caller does not have the same target features enabled, and it results in UB at runtime if the cpu doesn't support the required feature.

note, it is different from a conditional compilation guard, which looks like #[cfg(target_feature = "yyy")]. I explained the difference in this previous post:

tczajka · November 13, 2025, 1:09pm

That's not the issue though. In our case there is the unsafe marker and the CPU does support the instruction. The problem is that the call isn't inlined and that the documentation seems wrong:

Available on (x86 or x86-64) and target feature sse4.2 and x86-64 only.

It says the function is not "available" which suggests the compiler will prevent you from calling it, and yet it doesn't. Perhaps the documentation should be changed from "available on [...] only" to something like "the function may cause UB unless [...] and will not get inlined unless [...]".

nerditation · November 13, 2025, 1:43pm

your cpu does support that, but you didn't tell rustc what your cpu is.

out of the box, rustc is configured conservatively [1] with the default target cpu for maximum compatibility.

if users want their program to be compiled for a more specific cpu than the default one, it is up to them to set up the correct compiler flags (or to annotate the source code with proper attributes).

the fact the call is not inlined is a consequence of the caller not having the required target feature enabled. I may be wrong, so don't quote me on this, but I think this is probably a limitation imposed by the LLVM codegen backend, not by rustc.

I agree the wording of the documentation can be improved. or at least, it should include a link to the documentation about the #[target_feature] attribute to give users more context.

[1] personally, I would say it's over conservative, but it is what it is for now

marsfan · November 14, 2025, 5:08am

the fact the call is not inlined is a consequence of the caller not having the required target feature enabled. I may be wrong, so don't quote me on this, but I think this is probably a limitation imposed by the LLVM codegen backend, not by rustc.

Did a quick test. This is indeed what’s going on. If you go to the link I provided in the original post, and add -C target-cpu=znver5 (which supports all AVX extensions), then everything gets inlined.

It's odd that the compiler does not error out when code uses features the target doesn't have enabled, and instead just doesn't inline. What's the reason for that?

newpavlov · November 14, 2025, 5:18am

I would say historic reasons. Intrinsics support in the language is quite important for various areas and many wanted to see it ASAP, which has resulted in stabilization of a somewhat un-Rusty feature. I would love to see something like this, but even the relatively modest target features v1.1 proposal took 7+ years to implement and stabilize, so I don't have high hopes for significant improvements in this area in the near future.

Yep, LLVM will not inline a function into a caller that lacks the callee's target features.

nerditation · November 14, 2025, 6:59am

you can detect target features either at compile time or at runtime, and there are legitimate uses for both. let's take a hypothetical example.

suppose we have some algorithm foo, which can take advantage of special cpu instructions if available, but we also want our software to be working on hardware that lacks the special features, so we also have a slower version as a fallback.

there are two ways we can deal with such a use case:

select which version to call at compile time using conditional compilation:

#[cfg(target_feature = "xxx")]
fn foo() {
  // probably use intrinsics or inline assembly
}
#[cfg(not(target_feature = "xxx"))]
fn foo() {
  // no hardware acceleration, software emulated
}
fn main() {
  // note we don't need `unsafe`
  foo();
}

pros: smaller binary, no runtime overhead
cons: need to produce multiple variants of the binary, and it may crash at runtime if distributed to incompatible systems
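
as a concrete version of the compile-time approach, here is a small sketch using the CRC-32C case from this thread. crc32c_step and crc32c are illustrative names, and the software variant is a plain bit-at-a-time implementation with the reflected Castagnoli polynomial 0x82F63B78 (my choice of fallback, not something from the posts above). exactly one crc32c_step exists in any given build; with default flags on x86_64 that is the software one, since sse4.2 is not enabled by default.

```rust
/// Folds one byte into a running CRC-32C state.
/// This variant is compiled only when the whole crate is built with
/// SSE4.2 enabled, e.g. `-C target-feature=+sse4.2`.
#[cfg(all(target_arch = "x86_64", target_feature = "sse4.2"))]
fn crc32c_step(crc: u32, byte: u8) -> u32 {
    // SAFETY: the `cfg` above means every function in this build,
    // including this one, is compiled assuming SSE4.2 is present.
    unsafe { std::arch::x86_64::_mm_crc32_u8(crc, byte) }
}

/// Software variant, compiled in every other configuration.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse4.2")))]
fn crc32c_step(crc: u32, byte: u8) -> u32 {
    let mut crc = crc ^ u32::from(byte);
    for _ in 0..8 {
        // All-ones mask when the low bit is set, zero otherwise.
        let mask = (crc & 1).wrapping_neg();
        crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
    }
    crc
}

/// The caller is identical in both builds and needs no `unsafe`.
pub fn crc32c(data: &[u8]) -> u32 {
    !data.iter().fold(!0u32, |crc, &byte| crc32c_step(crc, byte))
}
```

because the choice is baked in at compile time, a binary built with the flag still must not be run on a cpu without SSE4.2.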

include both versions in the program and select the suitable one to call at runtime:

#[target_feature(enable = "xxx")]
fn foo_accelerated() {
  // probably use intrinsics or inline assembly
}
fn foo_fallback() {
  // no hardware acceleration, software emulated
}
fn main() {
  if std::arch::is_x86_feature_detected!("xxx") {
    // even if the function itself is not `unsafe fn`, still need `unsafe` to call,
    // because it's UB if the feature is actually unavailable
    // SAFETY: guarded with runtime feature detection based on `CPUID`
    unsafe { foo_accelerated() };
  } else {
    foo_fallback();
  }
}

pros: a single binary can be distributed to, and is compatible with, multiple different cpu models
cons: binary size is inevitably larger, and (in theory) there may be some runtime overhead
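
filling in the runtime skeleton with the CRC-32C case from this thread could look roughly like the sketch below. all function names are illustrative. the accelerated path processes 8 bytes per step with _mm_crc32_u64 and finishes the tail with _mm_crc32_u8, and the fallback is a simple bit-at-a-time loop that I picked for brevity, not for speed.

```rust
/// Portable fallback: bit-at-a-time CRC-32C (reflected poly 0x82F63B78).
fn crc32c_fallback(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Accelerated path. Both intrinsics are called from a function with
/// the same target feature enabled.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn crc32c_accelerated(data: &[u8]) -> u32 {
    use std::arch::x86_64::{_mm_crc32_u64, _mm_crc32_u8};

    let mut crc = u64::from(!0u32);
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // Little-endian load matches feeding the bytes in order.
        crc = _mm_crc32_u64(crc, u64::from_le_bytes(word));
    }
    let mut crc = crc as u32; // the result only occupies the low 32 bits
    for &byte in chunks.remainder() {
        crc = _mm_crc32_u8(crc, byte);
    }
    !crc
}

/// Safe entry point: picks an implementation on every call.
pub fn crc32c(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("sse4.2") {
            // SAFETY: guarded by the runtime feature check above.
            return unsafe { crc32c_accelerated(data) };
        }
    }
    crc32c_fallback(data)
}
```

the check happens once per buffer rather than once per byte, so the dispatch cost stays outside the hot loop.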

as for why a call to a #[target_feature()] annotated function is not inlined: it's a limitation of LLVM. I'm pretty sure there are technical reasons, but I don't know LLVM well enough to say more. from the perspective of rustc, despite the requirement of an unsafe block, it's no different from a regular function call; it just emits a regular call operation in LLVM IR.

if you want a compile-time error for unavailable intrinsics, the standard library would need to be conditionally compiled (i.e. #[cfg(target_feature = "xxx")] instead of #[target_feature(enable = "xxx")] on the intrinsics). but the problem is, rust would then have to ship gazillions of prebuilt standard libraries for different target feature sets, which would lead to a combinatorial explosion (imagine libcore-sse, libcore-sse2, libcore-avx, libcore-avx2, libcore-sse,avx, libcore-sse2,avx, and so on and so on); or we would have to give up pre-built libraries and just compile the standard library from source every time.
