Optimization of Algorithm to Fill Empty Space

I spent some more time on it and found the mistake. When we first used rayon, we applied the threads to the computation of the elements inside an array within each time step. Because of the communication between threads, this ended up taking a lot of time.

Instead, letting each thread calculate everything for one particle gives a good performance boost. In "pseudo-code", since I have changed the algorithm a bit:

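use rayon::prelude::*;

// Advances the first n - ncut particles by one step; each rayon task
// handles everything for a single particle.
pub fn packstep_s(data: &mut point::Particles, ncut: usize) {
    let n = data.pvec.len();

    // Working copies of the particles that get updated.
    let mut pi_vec = data.pvec[..(n - ncut)].to_vec();
    let mut ui_vec = data.uvec[..(n - ncut)].to_vec();

    pi_vec
        .par_iter_mut()
        .zip(&mut ui_vec)
        .enumerate()
        .for_each(|(i, (p_ptr, u_ptr))| {
            // Only reads the shared, unmodified data and writes to its own slot.
            let up_points = packstep_single(&data.pvec, &data.uvec, i);
            *p_ptr = up_points.ptmp;
            *u_ptr = up_points.utmp;
        });

    // Only alters relevant indices
    data.pvec[..(n - ncut)].clone_from_slice(&pi_vec);
    data.uvec[..(n - ncut)].clone_from_slice(&ui_vec);
}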

Running this takes 231 ms, which is roughly 4 times faster than the initial 1000 ms. This is without any of the neighbour search algorithms applied.
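For anyone who wants to try the per-particle pattern without the rest of my code, here is a minimal sketch using only the standard library. Note that it swaps rayon for `std::thread::scope` so it has no dependencies, and that `update_single`, `pack_step`, the `f64` positions and the "move towards the mean" rule are all made up for illustration; they are not the real `packstep_single` or `Particles`.

```rust
use std::thread;

/// Hypothetical stand-in for `packstep_single`: computes the new position of
/// particle `i` from a read-only snapshot of all positions.
fn update_single(positions: &[f64], i: usize) -> f64 {
    // Toy rule (an assumption): move halfway towards the mean position.
    let mean = positions.iter().sum::<f64>() / positions.len() as f64;
    positions[i] + 0.5 * (mean - positions[i])
}

/// Updates the first `len - ncut` particles. Every thread owns a disjoint
/// chunk of particles and does all the work for each particle in that chunk.
fn pack_step(positions: &mut [f64], ncut: usize, n_threads: usize) {
    let n_active = positions.len().saturating_sub(ncut);
    if n_active == 0 {
        return;
    }

    // All threads read from this unmodified copy, so no locking is needed.
    let snapshot = positions.to_vec();
    let snapshot = &snapshot;

    // Ceiling division, so that at most `n_threads` chunks are created.
    let n_threads = n_threads.max(1);
    let chunk_len = (n_active + n_threads - 1) / n_threads;

    thread::scope(|s| {
        for (c, chunk) in positions[..n_active].chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                for (k, p) in chunk.iter_mut().enumerate() {
                    // Global particle index = chunk offset + index in chunk.
                    *p = update_single(snapshot, c * chunk_len + k);
                }
            });
        }
    });
}
```

Calling `pack_step(&mut positions, ncut, 4)` updates everything except the last `ncut` entries. The point of the design is that threads only share read-only data and each one writes to its own slice, so they never have to talk to each other during a step.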

Kind regards
