Yes. =)
Below is the parallel version I tried two days ago:
pub fn copy(c: &mut [f64], a: &[f64], n: usize) -> f64 {
    let c_iter = c.par_chunks_mut(n);
    let a_iter = a.par_chunks(n);
    let s = Instant::now();
    // Parallel version
    |(c_slice, a_slice)| {
Performance for this kernel is lower than that of the kernel in the first post.
Function | Rate(MB/s) | Rate(MFlop/s) | Avg time | Min time | Max time |
Copy: | 174110.97 | - | 0.0116 | 0.0110 | 0.0122 |
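The listing above breaks off at the closure header, so the rest of the kernel is not shown. For readers who want something to experiment with, here is a minimal sketch of a chunked parallel copy. It is not the rayon code from the post: it swaps in `std::thread::scope` from the standard library so it runs without extra crates, and the name `copy_chunked` and the choice to return elapsed seconds are assumptions made for illustration.

```rust
use std::thread;
use std::time::Instant;

/// Hypothetical stand-in for the truncated kernel: copies `a` into `c`
/// in chunks of `n` elements and returns the elapsed time in seconds.
/// Uses scoped std threads instead of rayon (an assumption, not the
/// original implementation).
pub fn copy_chunked(c: &mut [f64], a: &[f64], n: usize) -> f64 {
    assert_eq!(c.len(), a.len(), "source and destination must have equal length");
    assert!(n > 0, "chunk size must be non-zero");

    let s = Instant::now();
    thread::scope(|scope| {
        // Pair each destination chunk with the matching source chunk
        // and copy every pair on its own scoped thread.
        for (c_slice, a_slice) in c.chunks_mut(n).zip(a.chunks(n)) {
            scope.spawn(move || c_slice.copy_from_slice(a_slice));
        }
    });
    // All spawned threads have been joined once `scope` returns.
    s.elapsed().as_secs_f64()
}
```

Note that this spawns one thread per chunk, so `n` should be roughly the array length divided by the number of cores. A work-stealing pool such as rayon's avoids that constraint, which is one reason the timings of the two approaches are not directly comparable.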