Understanding performance loss while performing a simple copy kernel operation

Yes. =)
Below is the parallel version I tried two days ago:

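use rayon::prelude::*;
use std::time::Instant;

/// Copies `a` into `c` in parallel, in chunks of `n` elements, and returns
/// the elapsed time in seconds. `n` must be non-zero (rayon panics on a
/// chunk size of 0), and `c` and `a` are expected to have the same length.
pub fn copy(c: &mut [f64], a: &[f64], n: usize) -> f64 {
    let c_iter = c.par_chunks_mut(n);
    let a_iter = a.par_chunks(n);

    // Start timing after the iterators are built, so only the copy is measured.
    let s = Instant::now();

    // Parallel version: each task copies one chunk of `a` into the matching chunk of `c`.
    c_iter.zip(a_iter).for_each(|(c_slice, a_slice)| {
        c_slice.copy_from_slice(a_slice);
    });

    s.elapsed().as_secs_f64()
}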

This kernel performs worse than the kernel in the first post:

----------------------------------------------------------------------------------------------------------
Function        | Rate(MB/s)      | Rate(MFlop/s)   | Avg time       | Min time        | Max time        |
----------------------------------------------------------------------------------------------------------
Copy:           | 174110.97       | -               | 0.0116         | 0.0110          | 0.0122          |
----------------------------------------------------------------------------------------------------------
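
One way to check whether the chunk size `n` is part of the problem is to give each thread a single large contiguous chunk. The following is only a minimal sketch of that idea, using the standard library's `std::thread::scope` instead of rayon. The function name `copy_scoped` and its `threads` parameter are hypothetical and not part of the benchmark above, and this variant has not been measured here, so treat it as something to experiment with rather than a fix.

```rust
use std::thread;
use std::time::Instant;

/// Hypothetical helper: copies `a` into `c` using at most `threads` scoped
/// threads, one contiguous chunk per thread. Returns the elapsed time in seconds.
pub fn copy_scoped(c: &mut [f64], a: &[f64], threads: usize) -> f64 {
    assert_eq!(c.len(), a.len(), "slices must have equal length");
    assert!(threads > 0, "need at least one thread");

    // Ceiling division gives at most `threads` chunks; `max(1)` keeps the
    // chunk size non-zero when the input is empty (chunks(0) would panic).
    let chunk = ((a.len() + threads - 1) / threads).max(1);

    let s = Instant::now();

    // Scoped threads may borrow `c` and `a` directly; all of them are
    // joined before `thread::scope` returns.
    thread::scope(|scope| {
        for (c_slice, a_slice) in c.chunks_mut(chunk).zip(a.chunks(chunk)) {
            scope.spawn(move || c_slice.copy_from_slice(a_slice));
        }
    });

    s.elapsed().as_secs_f64()
}
```

It can be called like the original kernel, for example `copy_scoped(&mut c, &a, 8)`. Note that the measured time here includes thread creation, which rayon avoids by reusing its thread pool, so the two timings are not directly comparable for small arrays.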