Yes. =)
Below is the parallel version I had tried 2 days ago:
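
    use rayon::prelude::*;
    use std::time::Instant;

    pub fn copy(c: &mut [f64], a: &[f64], n: usize) -> f64 {
        // Split both arrays into chunks of at most `n` elements; rayon
        // copies the chunk pairs on its thread pool.
        let c_iter = c.par_chunks_mut(n);
        let a_iter = a.par_chunks(n);
        let s = Instant::now();
        // Parallel version
        c_iter.zip(a_iter).for_each(|(c_slice, a_slice)| {
            c_slice.copy_from_slice(a_slice);
        });
        s.elapsed().as_secs_f64()
    }
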
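For comparison, here is a minimal sketch of the same chunked copy that uses only the standard library (`std::thread::scope`) instead of rayon. It gives each thread exactly one contiguous chunk. The function name `copy_scoped` and the `threads` parameter are hypothetical and not part of the code above. Unlike rayon's persistent pool, this version spawns fresh threads, so the measured time includes thread start-up.

```rust
use std::thread;
use std::time::Instant;

/// Hypothetical helper: copies `a` into `c` with one contiguous chunk per
/// scoped thread and returns the elapsed wall-clock time in seconds.
pub fn copy_scoped(c: &mut [f64], a: &[f64], threads: usize) -> f64 {
    assert_eq!(c.len(), a.len(), "arrays must have equal length");
    let threads = threads.max(1);
    // Round up so every element lands in some chunk; chunk size must be >= 1.
    let chunk = ((a.len() + threads - 1) / threads).max(1);
    let s = Instant::now();
    thread::scope(|scope| {
        for (c_slice, a_slice) in c.chunks_mut(chunk).zip(a.chunks(chunk)) {
            // Each spawned thread owns a disjoint &mut chunk of `c`.
            scope.spawn(move || c_slice.copy_from_slice(a_slice));
        }
    }); // all threads are joined here, before the timer is read
    s.elapsed().as_secs_f64()
}
```

Calling `copy_scoped(&mut c, &a, 8)` copies `a` into `c` using 8 threads. Setting the chunk size to roughly `len / threads` matches rayon's behaviour when `n` is chosen that way.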
This parallel kernel performs worse than the kernel in my first post:
----------------------------------------------------------------------------------------------------------
Function | Rate (MB/s) | Rate (MFlop/s) | Avg time (s) | Min time (s) | Max time (s) |
----------------------------------------------------------------------------------------------------------
Copy:    | 174110.97   | -              | 0.0116       | 0.0110       | 0.0122       |
----------------------------------------------------------------------------------------------------------
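The Rate (MB/s) column can be derived from the timings. STREAM's Copy kernel counts two 8-byte doubles per element (one read from `a`, one write to `c`) and defines 1 MB as 10^6 bytes. The sketch below shows that conversion. The helper name `copy_rate_mb_s` is hypothetical, and the element count behind the table above is not stated here.

```rust
/// Hypothetical helper: converts an element count and a time in seconds to
/// MB/s for a Copy kernel (2 arrays x 8 bytes per element, 1 MB = 1e6 bytes).
pub fn copy_rate_mb_s(n_elems: usize, secs: f64) -> f64 {
    // One f64 read from `a` plus one f64 written to `c` per element.
    let bytes = 2.0 * (std::mem::size_of::<f64>() * n_elems) as f64;
    bytes / secs / 1.0e6
}
```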