Generating f64x4 from [f64;4] crashes the program

ishan · March 10, 2022, 9:07am

Hey there!
I am using packed_simd_2 0.3.7 and the latest version of rand.

I have this program,

use packed_simd::f64x4;
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    // This works fine!
    println!("{:?}", f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()));

    // This crashes the program
    println!("{:?}", gen_random(&mut rng));
}

#[derive(Debug)]
pub struct Vec3(f64x4);

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> Vec3 {
    Vec3(f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()))
}

Running this program crashes with,

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `0`', /home/ishan/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd_2-0.3.7/src/v256.rs:66:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I had this code in my raytracer and it was working fine until maybe a year ago. I tried to run it again today and now it crashes with this error. Changing f64x4::from_slice_aligned to f64x4::from_slice_unaligned also "fixes" the problem.

There is this at the specified line in the stack trace,

impl_f!([f64; 4]: f64x4, m64x4 | f64 | test_v256 | x0, x1, x2, x3 |
        From: i8x4, u8x4, i16x4, u16x4, i32x4, u32x4, f32x4 |
        /// A 256-bit vector with 4 `f64` lanes.
);

This is the from_slice_aligned function.

        #[inline]
        pub fn from_slice_aligned(slice: &[f64]) -> Self {
            unsafe {
                if !(slice.len() >= 4) {
                    ::core::panicking::panic("assertion failed: slice.len() >= 4")
                };
                let target_ptr = slice.get_unchecked(0) as *const f64;
                {
                    match (&target_ptr.align_offset(crate::mem::align_of::<Self>()), &0) {
                        (left_val, right_val) => {
                            if !(*left_val == *right_val) {
                                let kind = ::core::panicking::AssertKind::Eq;
                                ::core::panicking::assert_failed(
                                    kind,
                                    &*left_val,
                                    &*right_val,
                                    ::core::option::Option::None,
                                );
                            }
                        }
                    }
                };
                Self::from_slice_aligned_unchecked(slice)
            }
        }

So this is where it checks that the slice is aligned properly, and it is likely the assert that's failing here. I am not sure why it was working fine a year ago and has only now started acting up. Does anyone know how to fix this problem?

H2CO3 · March 10, 2022, 9:47am

The alignment of the vector type is likely larger than the alignment of a regular slice. The very point of a vector type is that it will be operated on in one go, at the hardware level. This usually means that it has to be aligned at a larger multiple of the size in order to be loaded/stored efficiently.

I can't verify this hypothesis because packed_simd isn't in the Playground, but you should print align_of::<f64>() as well as align_of::<f64x4>().

alice · March 10, 2022, 9:51am

It sounds like the actual fix is to just use from_slice_unaligned.

ishan · March 10, 2022, 9:54am

from_slice_unaligned is slightly slower.

Without SIMD, it renders the scene in ~60s
With SIMD and from_slice_unaligned, it renders the scene in ~45s
With SIMD and from_slice_aligned, it renders the scene in ~42s

ishan · March 10, 2022, 9:55am

I have also been able to replicate it with just this,

use packed_simd::f64x4;
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    // This works fine!
    println!("{:?}", f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()));

    // This crashes the program
    println!("{:?}", gen_random(&mut rng));
}

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> f64x4 {
    f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>())
}

H2CO3 · March 10, 2022, 9:57am

Well, if the slice is not sufficiently aligned, then you have to copy the data. There's no way around that. That's the very point of providing both methods. Use aligned when you can, and unaligned when you must.
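
A minimal sketch of the "aligned when you can, unaligned when you must" rule, using only the standard library. F64x4Like and load4 are hypothetical names and not part of packed_simd; the struct merely assumes a 32-byte alignment so that the check mirrors the one from_slice_aligned performs.

```rust
/// Hypothetical stand-in for a 4-lane f64 vector with 32-byte alignment.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C, align(32))]
struct F64x4Like([f64; 4]);

/// Loads the first four values of `slice`, choosing the load by address.
/// Returns the value and whether the aligned path was taken.
fn load4(slice: &[f64]) -> (F64x4Like, bool) {
    assert!(slice.len() >= 4);
    let ptr = slice.as_ptr();
    let aligned = (ptr as usize) % std::mem::align_of::<F64x4Like>() == 0;
    let value = if aligned {
        // SAFETY: `ptr` is 32-byte aligned and points to at least 32 readable bytes.
        unsafe { ptr.cast::<F64x4Like>().read() }
    } else {
        // SAFETY: `ptr` points to at least 32 readable bytes;
        // `read_unaligned` has no alignment requirement.
        unsafe { ptr.cast::<F64x4Like>().read_unaligned() }
    };
    (value, aligned)
}

fn main() {
    let data = [0.0_f64, 1.0, 2.0, 3.0, 4.0];
    // The two starting points are 8 bytes apart, so at most one is 32-byte aligned.
    for start in 0..2 {
        let (value, aligned) = load4(&data[start..]);
        println!("{:?} (aligned load: {})", value, aligned);
    }
}
```

Which branch runs depends on where the array happens to sit in memory, so the printed flags can differ between builds.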

ishan · March 10, 2022, 9:58am

[src/main.rs:5] std::mem::align_of::<f64>() = 8
[src/main.rs:6] std::mem::align_of::<f64x4>() = 32
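
For readers without packed_simd at hand, the same comparison can be sketched with the standard library alone. F64x4Like below is a hypothetical stand-in, not a packed_simd type; it simply assumes the 32-byte alignment reported above. dbg! prints in the same format as the two lines above.

```rust
use std::mem::align_of;

// Hypothetical stand-in for f64x4: four f64 lanes, forced to 32-byte alignment.
#[allow(dead_code)]
#[repr(C, align(32))]
struct F64x4Like([f64; 4]);

fn main() {
    // An array has the same alignment as its element type,
    // so a [f64; 4] is not promised to start on a 32-byte boundary.
    dbg!(align_of::<f64>());
    dbg!(align_of::<[f64; 4]>());
    dbg!(align_of::<F64x4Like>());
}
```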

alice · March 10, 2022, 9:58am

You could try something like this:

#[repr(align(32))]
struct AlignedArray {
    array: [f64; 4]
}

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> f64x4 {
    let array = AlignedArray {
        array: rng.gen::<[f64; 4]>(),
    };
    f64x4::from_slice_aligned(&array.array)
}

Hyeonu · March 10, 2022, 9:59am

To use the aligned method, you need to manually align the variable's address. Since that has its own cost, only a benchmark can tell whether it's an improvement.

#[repr(align(32))]
struct Aligned([f64; 4]);

let v = f64x4::from_slice_aligned(&Aligned(rand::random()).0);

H2CO3 · March 10, 2022, 9:59am

There you go, then that's exactly what is happening.

ishan · March 10, 2022, 10:05am

Thank you @alice and @Hyeonu
The suggested fix does solve the problem.

Although, I still don't fully understand why it suddenly broke like this. The last time I worked on this project was in April 2021, and back then it was working just fine.

I'll also check how long my program takes to run with this fix.

alice · March 10, 2022, 10:07am

The compiler will choose some address on the stack for each of your variables. This choice is more-or-less random, but since an f64 has an alignment of 8, the address is guaranteed to be divisible by 8.

Unfortunately, your code works only if that address is divisible by 32. It will be sometimes depending on how exactly the compiler decided to lay out the code, but there's certainly no guarantee that it will be every time, nor is there any guarantee that this is consistent from compile to compile.
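
The point about addresses can be checked directly. This is a small sketch using only the standard library; Aligned32 and misalignment are hypothetical names chosen for illustration.

```rust
/// Wrapper that forces its contents onto a 32-byte boundary.
#[repr(C, align(32))]
struct Aligned32([f64; 4]);

/// Number of bytes by which `ptr` sits past the previous multiple of `align`.
fn misalignment<T>(ptr: *const T, align: usize) -> usize {
    (ptr as usize) % align
}

fn main() {
    let plain = [1.0_f64, 2.0, 3.0, 4.0];
    let aligned = Aligned32([1.0, 2.0, 3.0, 4.0]);

    // For the plain array only the f64 alignment is guaranteed, so the
    // remainder modulo 32 may or may not be zero and can vary between builds.
    println!("plain array:   {} bytes off", misalignment(plain.as_ptr(), 32));
    // The wrapper is always placed on a 32-byte boundary.
    println!("aligned array: {} bytes off", misalignment(aligned.0.as_ptr(), 32));
}
```

A zero for the plain array in one build says nothing about the next build, which matches the behaviour described in this thread.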

scottmcm · March 10, 2022, 10:09am

Note that you should probably just move away from packed_simd. The modern thing is f64x4 in core::simd.

ishan · March 10, 2022, 10:21am

I will try to switch it to this. Thank you for the suggestion!

ishan · March 10, 2022, 10:22am

After doing some unscientific benchmarks of my program (5 runs each):

from_slice_unaligned takes 45-46s
The suggested fix here takes 44s

So it's just slightly faster.

alice · March 10, 2022, 10:25am

If you want to actually compare them, then you should be using a proper benchmarking library which will tell you the variance too so that you can compare whether the difference is statistically significant. It might not be in your case.
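
As a rough illustration of why the variance matters, here is a hand-rolled sketch that times a closure several times and reports the mean and sample standard deviation. It is not a substitute for a benchmarking library; time_runs and mean_and_stddev are hypothetical helpers, and the workload in main is only a placeholder.

```rust
use std::time::Instant;

/// Runs `f` `runs` times and returns each wall-clock duration in seconds.
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Returns (mean, sample standard deviation) of `samples`.
fn mean_and_stddev(samples: &[f64]) -> (f64, f64) {
    assert!(samples.len() >= 2, "need at least two samples");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, variance.sqrt())
}

fn main() {
    // Placeholder workload; substitute the code being measured.
    let samples = time_runs(5, || {
        let sum: f64 = (0..1_000_000).map(|i| (i as f64).sqrt()).sum();
        std::hint::black_box(sum);
    });
    let (mean, stddev) = mean_and_stddev(&samples);
    println!("mean = {:.6}s, stddev = {:.6}s over {} runs", mean, stddev, samples.len());
}
```

If the gap between two variants is no larger than the standard deviation of either, a handful of runs cannot tell them apart.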

system · June 8, 2022, 10:26am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
