Generating f64x4 from [f64;4] crashes the program

Hey there!
I am using packed_simd_2 0.3.7 and the latest version of rand.

I have this program,

use packed_simd::f64x4;
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    // This works fine!
    println!("{:?}", f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()));

    // This crashes the program
    println!("{:?}", gen_random(&mut rng));
}

#[derive(Debug)]
pub struct Vec3(f64x4);

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> Vec3 {
    Vec3(f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()))
}

Running this program crashes with,

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `0`', /home/ishan/.cargo/registry/src/github.com-1ecc6299db9ec823/packed_simd_2-0.3.7/src/v256.rs:66:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I had this code in my raytracer and it was working fine until maybe a year ago. I tried to run it again today and now it just crashes with this error. Changing from f64x4::from_slice_aligned to f64x4::from_slice_unaligned also "fixes" the problem.

This is the code at the line named in the panic message:

impl_f!([f64; 4]: f64x4, m64x4 | f64 | test_v256 | x0, x1, x2, x3 |
        From: i8x4, u8x4, i16x4, u16x4, i32x4, u32x4, f32x4 |
        /// A 256-bit vector with 4 `f64` lanes.
);

This is the from_slice_aligned function.

        #[inline]
        pub fn from_slice_aligned(slice: &[f64]) -> Self {
            unsafe {
                if !(slice.len() >= 4) {
                    ::core::panicking::panic("assertion failed: slice.len() >= 4")
                };
                let target_ptr = slice.get_unchecked(0) as *const f64;
                {
                    match (&target_ptr.align_offset(crate::mem::align_of::<Self>()), &0) {
                        (left_val, right_val) => {
                            if !(*left_val == *right_val) {
                                let kind = ::core::panicking::AssertKind::Eq;
                                ::core::panicking::assert_failed(
                                    kind,
                                    &*left_val,
                                    &*right_val,
                                    ::core::option::Option::None,
                                );
                            }
                        }
                    }
                };
                Self::from_slice_aligned_unchecked(slice)
            }
        }

So this is where it checks that things are aligned properly, and it is likely the assert that's failing here. I am not sure why it was working fine a year ago and has only now started acting up. Does anyone know how to fix this problem?

The alignment of the vector type is likely larger than the alignment of a regular slice. The very point of a vector type is that it will be operated on in one go, at the hardware level. This usually means that it has to be aligned to a larger boundary than a single element (often the size of the whole vector) in order to be loaded/stored efficiently.

I can't verify this hypothesis because packed_simd isn't in the Playground, but you should print align_of::<f64>() as well as align_of::<f64x4>().
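Since packed_simd is not available in the Playground, the same check can be sketched with a stand-in type. F64x4Like below is hypothetical: it only mimics the 32-byte alignment that a 256-bit vector typically has and is not the packed_simd type. The printed alignment of f64 depends on the target.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for a 256-bit SIMD vector: same payload as
/// `[f64; 4]`, but with a 32-byte alignment requested explicitly.
#[repr(C, align(32))]
#[derive(Clone, Copy, Debug)]
pub struct F64x4Like(pub [f64; 4]);

fn main() {
    // An array has the same alignment as its element type...
    println!("align_of::<f64>()       = {}", align_of::<f64>());
    println!("align_of::<[f64; 4]>()  = {}", align_of::<[f64; 4]>());
    // ...while the vector-like type demands a stricter boundary.
    println!("align_of::<F64x4Like>() = {}", align_of::<F64x4Like>());
    println!("size_of::<F64x4Like>()  = {}", size_of::<F64x4Like>());
}
```

With the real crate, the equivalent check would print align_of::<f64x4>() instead of the stand-in.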

It sounds like the actual fix is to just use from_slice_unaligned.


from_slice_unaligned is slightly slower.

With no SIMD, it renders the scene in ~60s
With SIMD and from_slice_unaligned, it renders the scene in ~45s
With SIMD and from_slice_aligned, it renders the scene in ~42s

I have also been able to replicate it with just this,

use packed_simd::f64x4;
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    // This works fine!
    println!("{:?}", f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>()));

    // This crashes the program
    println!("{:?}", gen_random(&mut rng));
}

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> f64x4 {
    f64x4::from_slice_aligned(&rng.gen::<[f64; 4]>())
}

Well, if the slice is not sufficiently aligned, then you have to copy the data. There's no way around that. That's the very point of providing both methods. Use aligned when you can, and unaligned when you must.

[src/main.rs:5] std::mem::align_of::<f64>() = 8
[src/main.rs:6] std::mem::align_of::<f64x4>() = 32

You could try something like this:

#[repr(align(32))]
struct AlignedArray {
    array: [f64; 4]
}

fn gen_random<R: Rng + ?Sized>(rng: &mut R) -> f64x4 {
    let array = AlignedArray {
        array: rng.gen::<[f64; 4]>(),
    };
    f64x4::from_slice_aligned(&array.array)
}

To use the aligned method, you need to manually align the variable's address. Since that has a cost of its own, only a benchmark can tell whether it's an improvement.

#[repr(align(32))]
struct Aligned([f64; 4]);

let v = f64x4::from_slice_aligned(&Aligned(rand::random()).0);
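The effect of the wrapper can be checked without the SIMD crate at all. This is a minimal sketch using only std: is_aligned_to is a hypothetical helper, and the wrapper uses repr(C) in addition to align(32) so that the array is guaranteed to sit at offset 0 of the struct.

```rust
/// Wrapper that forces its contents onto a 32-byte boundary.
/// `repr(C)` guarantees the array is the first thing in the struct.
#[repr(C, align(32))]
pub struct Aligned(pub [f64; 4]);

/// Hypothetical helper: true if the first element of `slice`
/// sits on an `align`-byte boundary.
pub fn is_aligned_to(slice: &[f64], align: usize) -> bool {
    (slice.as_ptr() as usize) % align == 0
}

fn main() {
    let plain = [1.0_f64, 2.0, 3.0, 4.0];
    let wrapped = Aligned([1.0, 2.0, 3.0, 4.0]);

    // Always true: the alignment is part of the wrapper's type.
    println!("wrapped aligned to 32: {}", is_aligned_to(&wrapped.0, 32));
    // May be true or false: only the alignment of f64 is guaranteed here.
    println!("plain aligned to 32:   {}", is_aligned_to(&plain, 32));
}
```

The second line is the interesting one: its result can differ between builds, which is the behaviour reported in the original post.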

There you go, then that's exactly what is happening.

Thank you @alice and @Hyeonu
The suggested fix does solve the problem.

Although, I still don't fully understand why it suddenly broke like this. The last time I worked on this project was in April 2021, and back then it was working just fine.

I'll also check how long my program takes to run with this fix.

The compiler will choose some address on the stack for each of your variables. This choice is more-or-less random, but since an f64 has an alignment of 8, the address is guaranteed to be divisible by 8.

Unfortunately, your code works only if that address is divisible by 32. Sometimes it will be, depending on how exactly the compiler decided to lay things out, but there is certainly no guarantee that it will be every time, nor that the result stays the same from compile to compile. A different compiler version can therefore change the outcome.
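The panic message fits this picture. The assertion compares the result of align_offset with 0, and align_offset counts in elements, so "left: 2" means the array started two f64s (16 bytes) short of the next 32-byte boundary. The sketch below reproduces that arithmetic by hand; elements_to_boundary and AlignedBuf are hypothetical names, and the helper assumes the misalignment is a multiple of the element size, which holds for the windows used here.

```rust
use std::mem::size_of;

/// 32-byte-aligned backing storage, so element 0 is known to sit on a boundary.
#[repr(C, align(32))]
pub struct AlignedBuf(pub [f64; 8]);

/// Hypothetical helper: number of `f64` elements to skip from `ptr` to reach
/// the next `align`-byte boundary, computed with plain address arithmetic.
pub fn elements_to_boundary(ptr: *const f64, align: usize) -> usize {
    let misalignment = (ptr as usize) % align;
    ((align - misalignment) % align) / size_of::<f64>()
}

fn main() {
    let buf = AlignedBuf([0.0; 8]);
    // Slide a 4-element window across the buffer, one element at a time.
    for start in 0..5 {
        let window = &buf.0[start..start + 4];
        println!(
            "window starting at element {start}: {} element(s) to the next 32-byte boundary",
            elements_to_boundary(window.as_ptr(), 32)
        );
    }
}
```

Only the windows starting at elements 0 and 4 would pass the aligned check; every other starting position is 8-byte aligned but not 32-byte aligned.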


Note that you should probably just move away from packed_simd. The modern replacement is f64x4 in core::simd.


I will try to switch it to this. Thank you for the suggestion!

After doing some unscientific benchmarks of my program (5 runs each):

  1. from_slice_unaligned takes 45-46s
  2. The suggested fix here takes 44s

So it's just slightly faster.

If you want to actually compare them, then you should be using a proper benchmarking library which will tell you the variance too so that you can compare whether the difference is statistically significant. It might not be in your case.
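As a rough illustration of why the spread matters, here is a std-only sketch that times a workload several times and reports the mean and sample standard deviation. It is not a substitute for a benchmarking library; time_runs and mean_and_stddev are hypothetical helpers, and the closure is a placeholder for the real render call.

```rust
use std::time::Instant;

/// Mean and sample standard deviation of a set of timings (in seconds).
/// Returns `None` when fewer than two samples are given.
pub fn mean_and_stddev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, variance.sqrt()))
}

/// Runs `work` `runs` times and returns each wall-clock duration in seconds.
pub fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

fn main() {
    // Placeholder workload; substitute the real render call here.
    let timings = time_runs(5, || {
        let s: f64 = (0..1_000_000).map(|i| (i as f64).sqrt()).sum();
        std::hint::black_box(s);
    });
    if let Some((mean, sd)) = mean_and_stddev(&timings) {
        println!("mean = {mean:.6}s, stddev = {sd:.6}s over {} runs", timings.len());
    }
}
```

If the two means differ by less than about one standard deviation, as 44s versus 45-46s might, a handful of runs is not enough to call one variant faster.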
