Which one is better, three Vec<f64> or one Vec<[f64; 3]>?

Hi, I am writing some math functions that involve handling the x-y-z coordinates of a large number of points. I wonder whether it is better to use three Vec<f64> or one Vec<[f64; 3]> to store and iterate over this coordinate data, considering performance first, then storage efficiency, and finally ease of writing the code. Using Vec<[f64; 3]> seems to give the compiler more information about the length 3, but I don't know if it is wise to nest many arrays in a Vec. Or is there a better solution? Thanks!

It depends™

You might be interested in AoS and SoA - Wikipedia


OK, I read this and found that three Vec<f64> correspond to a "structure of arrays" (SoA), and one Vec<[f64; 3]> to an "array of structures" (AoS). It seems that a structure of arrays is more efficient in terms of memory throughput, while Vec<[f64; 3]> may be more intuitive for the person writing the code. Besides, it may also be beneficial to use [Vec<f64>; 3] where the length of each Vec times 3 fits the native SIMD instructions, but that may be too complex, and the compiler may already auto-vectorize such loops.

Below is my example. Vec<[f64; 3]> is easier for me to write. With three Vecs I need to write some convenience macros to iterate over the three arrays at the same time.

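/// Builds the nested tuple pattern that matches the items produced by `mzip!`,
/// e.g. `elem!(x, y, z)` expands to `(x, (y, z))`.
#[macro_export]
macro_rules! elem {
    ($x:tt) => {
        ($x)
    };
    ($x:tt, $y:tt) => {
        ($x, $y)
    };
    ($x:tt, $y:tt, $($rest:tt),+) => {
        ($x, elem!($y, $($rest),+))
    };
}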

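/// Zips any number of iterators by nesting `std::iter::zip` to the right,
/// e.g. `mzip!(a, b, c)` expands to `zip(a, zip(b, c))`.
#[macro_export]
macro_rules! mzip {
    ($x:tt) => {
        ($x)
    };
    ($x:expr, $y:expr) => {
        std::iter::zip($x, $y)
    };
    ($x:expr, $y:expr, $($rest:expr),+) => {
        std::iter::zip($x, mzip!($y, $($rest),+))
    };
}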

/// Offsets of a shifted coordinate system along each axis.
#[derive(Default)]
pub struct CoordSysShift {
    pub shift_x: f64,
    pub shift_y: f64,
    pub shift_z: f64,
}

impl CoordSysShift {
    /// AoS version: each element of `xyz_arr` holds one point as `[x, y, z]`.
    #[inline]
    pub fn shift_array_of_structures(&self, xyz_arr: &mut [[f64; 3]]) {
        for [x, y, z] in xyz_arr.iter_mut() {
            *x -= self.shift_x;
            *y -= self.shift_y;
            *z -= self.shift_z;
        }
    }

    /// SoA version: one slice per axis. Zipping stops at the shortest slice.
    #[inline]
    pub fn shift_structure_of_arrays(
        &self,
        x_arr: &mut [f64],
        y_arr: &mut [f64],
        z_arr: &mut [f64],
    ) {
        for elem!(x, y, z) in mzip!(x_arr.iter_mut(), y_arr.iter_mut(), z_arr.iter_mut()) {
            *x -= self.shift_x;
            *y -= self.shift_y;
            *z -= self.shift_z;
        }
    }
}
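
For what it's worth, the three-slice loop can probably be written without any macros, since chained Iterator::zip calls yield nested tuples that can be destructured directly in the for pattern. A minimal sketch, where shift_soa is a hypothetical free function (not part of the code above) and the shift is passed as a plain array for brevity:

```rust
/// Subtracts a per-axis shift from three coordinate slices (SoA layout).
/// `shift_soa` is an illustrative name, not taken from the original code.
pub fn shift_soa(shift: [f64; 3], xs: &mut [f64], ys: &mut [f64], zs: &mut [f64]) {
    // `zip` stops at the shortest input, so check the lengths up front
    // instead of silently skipping trailing elements.
    assert!(xs.len() == ys.len() && ys.len() == zs.len());

    // Chained zips produce items of the form ((x, y), z).
    for ((x, y), z) in xs.iter_mut().zip(ys.iter_mut()).zip(zs.iter_mut()) {
        *x -= shift[0];
        *y -= shift[1];
        *z -= shift[2];
    }
}
```

Note that the tuples nest to the left here, ((x, y), z), whereas the macros above nest to the right, (x, (y, z)).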

(I haven't read the linked Wikipedia article, but) that conclusion is too simplistic. Vec<[f64; 3]> (and AoS in general) can have better performance; it depends on the access pattern. If you always access the x, y and z of a coordinate together, AoS can be faster: caching is better (though this should not matter much for only three arrays, as modern CPUs can predict cache access patterns across multiple streams), and there is also less register pressure and simpler loop control.
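
To make "access pattern" concrete, here is a rough sketch of two kinds of queries on both layouts. The type PointsSoa and all function names are made up for illustration; which version is actually faster on a given machine is something only a measurement can tell.

```rust
/// SoA layout: one contiguous buffer per axis (hypothetical helper type).
pub struct PointsSoa {
    pub x: Vec<f64>,
    pub y: Vec<f64>,
    pub z: Vec<f64>,
}

/// Touches only x. With SoA this reads a single dense buffer.
pub fn max_x_soa(p: &PointsSoa) -> f64 {
    p.x.iter().copied().fold(f64::NEG_INFINITY, f64::max)
}

/// Same query on AoS: y and z are pulled through the cache but never used.
pub fn max_x_aos(p: &[[f64; 3]]) -> f64 {
    p.iter().map(|c| c[0]).fold(f64::NEG_INFINITY, f64::max)
}

/// Touches x, y and z of every point. With AoS this is one memory stream.
pub fn sum_norms_aos(p: &[[f64; 3]]) -> f64 {
    p.iter()
        .map(|&[x, y, z]| (x * x + y * y + z * z).sqrt())
        .sum()
}

/// Same query on SoA: three streams advanced in lockstep.
pub fn sum_norms_soa(p: &PointsSoa) -> f64 {
    p.x.iter()
        .zip(&p.y)
        .zip(&p.z)
        .map(|((x, y), z)| (x * x + y * y + z * z).sqrt())
        .sum()
}
```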


Sure it should look like this:
Vec<[f64; 3]>
Keeping x,y,z together seems the fastest to me and the most logical. Accessing 3 separate vecs will be 3 times slower I guess. I cannot think of a scenario where you need 3 vecs for x,y,z.

This would depend not just on access patterns, but also on your platform.

AArch64 and RVV (RISC-V Vectors) have special instructions (such as NEON's LD3/ST3 and RVV's segment loads and stores) which let you load a sequence of RGBRGBRGB values into vector registers as RRR, GGG, BBB for processing and store the results back as RGBRGBRGB, while on x86-64 you need additional shuffling in tight loops.

One scenario where you would want three separate vecs: when you need to process three color planes separately.

There are algorithms which process them separately and also algorithms which process all three together, thus the best answer really is “it depends”.
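
In portable Rust, the de-interleave and re-interleave steps mentioned above might look like the following scalar sketch. The function names are invented for illustration, f64 is used instead of color bytes to match the original question, and whether the compiler turns these loops into the special load/store instructions is not guaranteed.

```rust
/// Splits interleaved triples (AoS) into three separate planes (SoA).
pub fn deinterleave(aos: &[[f64; 3]]) -> [Vec<f64>; 3] {
    let mut planes = [
        Vec::with_capacity(aos.len()),
        Vec::with_capacity(aos.len()),
        Vec::with_capacity(aos.len()),
    ];
    for &[a, b, c] in aos {
        planes[0].push(a);
        planes[1].push(b);
        planes[2].push(c);
    }
    planes
}

/// Merges three planes (SoA) back into interleaved triples (AoS).
pub fn interleave(planes: &[Vec<f64>; 3]) -> Vec<[f64; 3]> {
    let [a, b, c] = planes;
    // All planes must describe the same number of points.
    assert!(a.len() == b.len() && b.len() == c.len());
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((&a, &b), &c)| [a, b, c])
        .collect()
}
```

Converting once, processing plane by plane, and converting back is one way to mix both layouts, at the cost of an extra pass over the data.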

At some point you would just need to benchmark.
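
A very crude way to start measuring, using only the standard library, could look like the sketch below. Everything here (time_best_of, sum_aos, sum_soa, compare, and the summing workload itself) is a made-up placeholder for your real code, and a dedicated harness such as the criterion crate would likely give more trustworthy numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `reps` times and returns the fastest wall-clock time observed.
pub fn time_best_of<F: FnMut()>(reps: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}

/// Placeholder workload on the AoS layout.
pub fn sum_aos(p: &[[f64; 3]]) -> f64 {
    p.iter().map(|c| c[0] + c[1] + c[2]).sum()
}

/// The same placeholder workload on the SoA layout.
pub fn sum_soa(x: &[f64], y: &[f64], z: &[f64]) -> f64 {
    x.iter().zip(y).zip(z).map(|((x, y), z)| x + y + z).sum()
}

/// Times both layouts on `n` synthetic points; returns (AoS time, SoA time).
pub fn compare(n: usize) -> (Duration, Duration) {
    let aos: Vec<[f64; 3]> = (0..n).map(|i| [i as f64, 1.0, 2.0]).collect();
    let x: Vec<f64> = aos.iter().map(|c| c[0]).collect();
    let y: Vec<f64> = aos.iter().map(|c| c[1]).collect();
    let z: Vec<f64> = aos.iter().map(|c| c[2]).collect();

    // black_box keeps the optimizer from discarding the unused results.
    let t_aos = time_best_of(10, || {
        black_box(sum_aos(black_box(aos.as_slice())));
    });
    let t_soa = time_best_of(10, || {
        black_box(sum_soa(
            black_box(x.as_slice()),
            black_box(y.as_slice()),
            black_box(z.as_slice()),
        ));
    });
    (t_aos, t_soa)
}
```

Build with optimizations (cargo run --release) and use a point count close to your real data, since the result can change once the data no longer fits in cache.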


True, it depends on the access and storage pattern.

Depending on the processing and how large the "large number" is, you might want to look into more complex geometry-aware structures such as a k-d tree, which can keep points that are nearby in space close together in memory. This is kind of a bottomless pit!
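
As a taste of that idea, here is a minimal sketch that reorders a slice of points in place into an implicit k-d tree layout by recursive median partitioning. The name kd_order is hypothetical, and a real k-d tree would also need query routines (nearest neighbour, range search) that are not shown here.

```rust
/// Reorders `points` in place so the slice forms an implicit k-d tree:
/// the median along the current axis ends up in the middle, points with
/// smaller-or-equal coordinates to its left and larger-or-equal ones to its
/// right, recursively, cycling through the x, y and z axes.
pub fn kd_order(points: &mut [[f64; 3]], depth: usize) {
    if points.len() <= 1 {
        return;
    }
    let axis = depth % 3;
    let mid = points.len() / 2;

    // Partial sort: puts the median at `mid` without fully sorting the slice.
    points.select_nth_unstable_by(mid, |a, b| a[axis].total_cmp(&b[axis]));

    let (left, right) = points.split_at_mut(mid);
    kd_order(left, depth + 1);
    kd_order(&mut right[1..], depth + 1); // skip the median itself
}
```

Calling kd_order(&mut pts, 0) once after loading the data means that each subtree occupies a contiguous sub-slice, so points in the same region of space end up stored next to each other.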

