Random segfaults using simd intrinsics

usbalbin · October 12, 2018, 11:44am

Hello!

So I have been playing around a bit with the simd intrinsics and seem to have found something weird.

Debug mode is fine but whenever I run in release mode on my windows machine, the program almost instantly segfaults.

I have boiled it down to

println!("Setzero started");
let z = _mm256_setzero_ps();
println!("setzero done: {:?}, cast started", f32x8_to_array(z));
let i = _mm256_castps_si256(z);                     //<-- crash here unless
println!("cast done:");                             //<--- this line is removed
println!("   {:?}", i32x8_to_array(i));

if i just remove the line println!("cast done:"); then everything works fine.
This works perfectly on my linux machine and on playground

$ rustc --version
rustc 1.29.1 (b801ae664 2018-09-20)

$ rustc --version
rustc 1.31.0-nightly (77af31408 2018-10-11)

Windows machine with an Intel i5 3427U CPU which is supposed to support AVX

Is this most likely something wrong with my setup or could it be a compiler bug or is it just me doing something terribly wrong promming-wise?

Edit: oh sorry, forgot to mention the "crash" is a segfault

hellow · October 12, 2018, 11:53am

There was an issue I remember and it helped to add

#[cfg(target_arch = "x86_64")]

to the top of the function. Give it a try

usbalbin · October 12, 2018, 11:56am

Thank you but it does not seem to make any difference.

I put it on top on f() like

#[cfg(target_arch = "x86_64")]
fn f(){

still crashes unless println!("cast done:"); is removed

BurntSushi · October 12, 2018, 12:01pm

You're going to want to give this section of the docs a pretty thorough read: std::arch - Rust

In particular, I don't see any use of #[target_feature(enable = "avx2")], nor have you told us whether you're compiling with target features enabled.

hellow · October 12, 2018, 12:03pm

you i32x8_to_array function is wrong btw (not sure if that is the problem?!)

unsafe fn i32x8_to_array(x: __m256i) -> [f32; 8]

should be

unsafe fn i32x8_to_array(x: __m256i) -> [i32; 8]

(i32 instead of f32)

usbalbin · October 12, 2018, 12:19pm

Oh thank you very much! I had completely forgot about adding RUSTFLAGS='-C target-feature=+avx' . Guess I should add #[cfg(all(target_arch = "x86_64", target_feature = "avx"))] to the function signature then.

usbalbin · October 12, 2018, 12:24pm

Oh nice catch! The dangers of copy paste .

usbalbin · October 12, 2018, 12:35pm

Just out of curiosity, why does intrisics compile at all without sufficient features enabled? Why would it not make sense to have the intrinsics them selves behind #[cfg] s?

BurntSushi · October 12, 2018, 12:51pm

Because you generally want to use runtime feature detection instead of compile time feature detection. Take a look at the docs: std::arch - Rust

Runtime feature detection means you can compile and ship portable binaries that will take advantage of CPU specific features without needing to compile specifically for that CPU.

usbalbin · October 12, 2018, 1:08pm

Correct me if I am wrong but if you, like in the example, put

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]

over all your functions with avx2 stuff. Does not that mean that you would be able to put _mm256_add_epi64 behind a #[target_feature(enable = "avx2")] and you would get slightly more compiletime guidance because avx2 intrinsics would only be available in those contexts?

I am sorry if I am sounding stubborn and rude. Just wish to understand the thoughts behind the design desicions so I can make good ones myself in the future

newpavlov · October 12, 2018, 1:08pm

I believe that considering desire for runtime detection it should be solved in a slightly different fashion, unfortunately this approach will require some overhaul of the building process, so I don't think that it will be implemented any time soon.

BurntSushi · October 12, 2018, 1:25pm

Sorry, I don't think I understand your question. The goal is to be able to compile CPU specific instructions into binaries that may run on CPUs that don't support those instructions. Programs must use CPU feature detection at runtime to dispatch between them.

What we have right now is approximately the minimum set of features needed to achieve that goal.

#[target_feature(enable = "avx2")] is not invoking conditional compilation. It is a directive applied to a function that tells the compiler to emit code with the given target feature enabled. It is then up to the caller to ensure that the target feature is enabled at runtime before calling that function. In particular, _mm256_add_epi64 is already behind #[target_feature(enable = "avx2")].

usbalbin · October 12, 2018, 2:06pm

Then how am I able to compile my test with intrinsics at first without any target_feature enabled? If I understand correctly, items with #[cfg(target_feature = "...")] are only compiled if that feature is indeed enabled. Wouldn't that make the corresponding intrinsics unavailable when compiled without their features? Thus I would have expected some sort of error

error[E0425]: cannot find function `_mm256_castps_si256` in this scope
  --> foo\bar.rs:y:x
   |
y  |     _mm256_castps_si256();
   |     ^^^^^^^^^^^^^^^^^^^ not found in this scope

when compiling my initial example without RUSTFLAGS='-C target-feature=+avx'

I hope this clarifies my question .

parched · October 12, 2018, 2:20pm

Hmm, I don't think this example should crash. I agree you ideally want avx enabled in the caller for optimal use of the intrinsic, but it shouldn't crash provided your cpu supports it. The docs dont seem to say the caller needs avx enabled.

BurntSushi · October 12, 2018, 2:27pm

#[cfg(target_feature = "avx2")] uses conditional compilation and compiles the tagged code only when you've told the compiler to compile the code with the avx2 target feature enabled (or with a specific CPU that is known to support it).

#[target_feature(enable = "avx2")] is not conditional compilation, but is instead a directive for telling the compiler to compile a function with a specific target feature enabled, regardless of any compile time settings.

The intrinsics are tagged with the latter, not the former, so they are always available to call, regardless of your specific CPU and regardless of compile time settings.

usbalbin · October 12, 2018, 3:30pm

Oh ok thanks, that makes sense. But then as @parched pointed out, shouldn't my first test work when run an AVX enabled CPU? Even though it might be very bad practice not to do the runtime check?

usbalbin · October 12, 2018, 3:48pm

I finally fixed the type error

hellow:

you i32x8_to_array function is wrong btw (not sure if that is the problem?!)
unsafe fn i32x8_to_array(x: __m256i) -> [f32; 8] 
should be
unsafe fn i32x8_to_array(x: __m256i) -> [i32; 8]
( i 32 instead of f32)

and it actually seems to have fixed to segfault. Although I am not sure why beacause f32 and i32 happens to have the same bit pattern for zero if I am not mistaken.

BurntSushi · October 12, 2018, 3:52pm

No, it's not clear to me that that is true. Without any compile time target features enabled and without any use of #[target_feature], your f function is not compiled for AVX2. However, it has AVX vectors in its stack frame and is calling other functions that require an AVX ABI. I'm not sure you have any guarantees here; it smells like UB to me.

usbalbin · October 12, 2018, 4:06pm

Oh ok. Sorry for taking so long to get it. So even having an AVX vector and passing it arround in my non #[target_feature] is UB. However the insides of the intrinsics are fine bacause they have #[target_feature] on them.

Thank you all so much for taking the time answer all my annoying questions and to help me with this!

Note: even with the f32 -> i32 fix mentioned earlier and no more segfaults, the function sometimes prints garbage values. So it is clearly UB.

usbalbin · October 12, 2018, 4:33pm

@newpavlov I read your proposal, I can not say that have enough knowledge to understand all of it. But if I understand it right it essentially makes it harder to do this kind of bad things? And easier to do the right thing? Seems nice!

Topic		Replies	Views
How to get good code-gen with core::arch intrinsics? help	34	1753	August 28, 2022
Why are x86 intrinsics not always inlined help	12	462	February 12, 2026
Disable unsupported SIMD code generation help	2	237	October 4, 2025
Library requiring simd help	5	558	September 3, 2019
Rust segfaults while using SIMD help	3	298	May 13, 2024

Random segfaults using simd intrinsics

Related topics