Why is copying files in parallel slower than doing it sequentially?

    use std::path::PathBuf;

    fn cpy_task(chnk: Vec<String>, path_to: String) {
        let start = std::time::Instant::now();
        for current_file in chnk.iter() {
            let current_file_name = PathBuf::from(current_file.clone());
            let current_file_name = current_file_name.file_name().unwrap().to_str().unwrap();
            let full_path_to = path_to.clone() + "/" + current_file_name;
            std::fs::copy(current_file, &full_path_to).unwrap();
        }
        let duration = start.elapsed();
        println!("Copying finished: {}", duration.as_secs());
    }

This is my parallel loop (assume that the number of threads is never more than the number of processors minus one):

    let start_parallel = std::time::Instant::now();
    let mut handles: Vec<std::thread::JoinHandle<()>> = Vec::new();
    for i in paths_from {
        let path_to_clone = path_to.clone();
        let handle: std::thread::JoinHandle<()> = std::thread::spawn(move || {
            cpy_task(vec![i], path_to_clone);
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let duration_parallel = start_parallel.elapsed();
    println!("Copying in parallel finished: {}", duration_parallel.as_secs());

The above is slower than, or executes at a similar speed to, the sequential counterpart:

    cpy_task(paths_from, path_to);

Would anyone have an idea why it is behaving in such a counterintuitive way?


You are spawning a single thread for each file. If there are a lot of files, this will be very costly.

Yes, but I did point out that the number of threads is never more than the number of processors minus one.
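For what it's worth, one way to make that bound explicit is to split the file list into one chunk per worker and hand each chunk to a single thread, instead of spawning a thread per file. A minimal sketch, with a trimmed-down stand-in for cpy_task and made-up file names:

```rust
use std::thread;

// Hypothetical stand-in for the real cpy_task from the question.
fn cpy_task(chnk: Vec<String>, path_to: String) {
    for f in &chnk {
        println!("copying {} -> {}", f, path_to);
    }
}

fn main() {
    // Placeholder inputs; in the real program these come from the caller.
    let paths_from: Vec<String> = (0..10).map(|i| format!("file{}", i)).collect();
    let path_to = String::from("/tmp/dest");

    // One worker per CPU minus one, as stated in the question.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(2)
        .saturating_sub(1)
        .max(1);
    let chunk_size = (paths_from.len() + workers - 1) / workers;

    // Each thread gets a whole chunk, so the thread count is bounded
    // by `workers` no matter how many files there are.
    let mut handles = Vec::new();
    for chunk in paths_from.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let path_to = path_to.clone();
        handles.push(thread::spawn(move || cpy_task(chunk, path_to)));
    }
    for h in handles {
        h.join().unwrap();
    }
}
```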

How big are the files? If they are small, the thread-spawning overhead dominates. If they are big, you will likely saturate the disk bandwidth even with a single thread.


If the files are tiny then you won't see much better performance. You can try timing individual cpy_task() calls, or introduce a 5s sleep to simulate slow I/O.
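The sleep experiment could look like this; fake_cpy_task is a hypothetical stand-in for the real cpy_task that sleeps instead of touching the disk, so there is no shared resource for the threads to fight over:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for cpy_task: sleep instead of doing real I/O.
fn fake_cpy_task() {
    thread::sleep(Duration::from_secs(1));
}

fn main() {
    let start = Instant::now();
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(fake_cpy_task)).collect();
    for h in handles {
        h.join().unwrap();
    }
    // Four 1s "copies" on four threads finish in about 1s, not 4s,
    // because sleeping threads don't contend for a disk.
    println!("elapsed: {:?}", start.elapsed());
}
```

If the parallel version speeds up here but not with real files, that points at the disk, not the threading.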

Files are big: >= 2.5GB

On Linux std::fs::copy uses copy_file_range to copy the whole file in chunks of up to 1GB in a single syscall.
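For illustration, a small self-contained sketch (the temp-file names are made up for the example); on Linux the fs::copy call below is the one that ends up issuing copy_file_range, so the data never round-trips through user space:

```rust
use std::fs;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // A 1 MiB source file standing in for the real >= 2.5GB files.
    let src = std::env::temp_dir().join("copy_demo_src.bin");
    let dst = std::env::temp_dir().join("copy_demo_dst.bin");
    fs::File::create(&src)?.write_all(&vec![0u8; 1 << 20])?;

    // On Linux this single call uses copy_file_range internally.
    let bytes = fs::copy(&src, &dst)?;
    println!("copied {} bytes", bytes);
    Ok(())
}
```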

OK, but how does this answer my question?

copy_file_range is so fast on big files that you are limited by the disk speed even when using a single thread. This means that any extra threads won't speed things up. In fact, they will likely make it slower, because multiple threads are fighting for the same resource (disk access). If you have a hard disk instead of an SSD, you also have the problem that the contention will cause the disk to seek a lot between the files it is copying, which is very slow on hard disks.


Yes, I suspected that this could be the cause.

Using multiple threads can be beneficial when accessing separate physical devices, e.g.:

  • thread 1: copy from d1 to d1
  • thread 2: copy from d2 to d2

or even

  • thread 1: copy from d1 to d1
  • thread 2: copy from d1 to d2
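A rough sketch of that per-device split, with two temp directories standing in for the hypothetical devices d1 and d2, and a trimmed-down cpy_task reproduced from the question:

```rust
use std::fs;
use std::io::Write;
use std::path::PathBuf;
use std::thread;

// Trimmed-down cpy_task from the question, repeated here to keep
// the example self-contained.
fn cpy_task(chnk: Vec<String>, path_to: String) {
    for current_file in chnk.iter() {
        let name = PathBuf::from(current_file.clone());
        let name = name.file_name().unwrap().to_str().unwrap().to_string();
        let full_path_to = path_to.clone() + "/" + &name;
        fs::copy(current_file, &full_path_to).unwrap();
    }
}

fn main() -> std::io::Result<()> {
    // Two directories standing in for separate physical devices.
    let d1 = std::env::temp_dir().join("demo_d1");
    let d2 = std::env::temp_dir().join("demo_d2");

    // One source file and one destination directory per "device".
    let src1 = d1.join("a.bin");
    let src2 = d2.join("b.bin");
    let dst1 = d1.join("out").to_string_lossy().into_owned();
    let dst2 = d2.join("out").to_string_lossy().into_owned();
    fs::create_dir_all(&dst1)?;
    fs::create_dir_all(&dst2)?;
    fs::File::create(&src1)?.write_all(b"data on d1")?;
    fs::File::create(&src2)?.write_all(b"data on d2")?;

    // thread 1: copy from d1 to d1; thread 2: copy from d2 to d2.
    let t1 = thread::spawn({
        let s = src1.to_string_lossy().into_owned();
        move || cpy_task(vec![s], dst1)
    });
    let t2 = thread::spawn({
        let s = src2.to_string_lossy().into_owned();
        move || cpy_task(vec![s], dst2)
    });
    t1.join().unwrap();
    t2.join().unwrap();
    Ok(())
}
```

With real separate devices each thread gets its own bandwidth, so the two copies genuinely overlap instead of contending.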