tokio::io::copy slower than std::io::copy

I ran a file-copy speed test comparing these two, and tokio::io::copy seems to be much slower than the synchronous std::io::copy.
I noticed this while implementing my own tokio AsyncWrite.

Here are the results on my machine for an 894MB file:

filesize = 894MB
tokio write duration = 2.661018252s, speed MB/s 336.2394975276158
std write duration = 418.723487ms, speed MB/s 2136.826492286769

Here is the code I used. Am I doing something wrong? I can't believe this is correct.

```rust
use std::env::args;
use std::fs::File;
use std::future::Future;
use std::io::Write;
use std::io::{BufReader, BufWriter};
use std::path::Path;
use std::time::Instant;
use std::{fs, io};
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut args = args();
    let _ = args.next(); // skip the program name
    let path_in = args.next().expect("path_in is missing");
    // The copy goes to /tmp under the same file name.
    let path_out = format!(
        "/tmp/{}",
        Path::new(&path_in).file_name().unwrap().to_str().unwrap()
    );
    let out = Path::new(&path_out).to_path_buf();
    if out.exists() {
        fs::remove_file(&out)?;
    }

    let input = tokio::fs::File::open(&path_in).await?;
    let size = input.metadata().await?.len();
    println!("filesize = {}MB", size / 1024 / 1024);
    let mut input = tokio::io::BufReader::new(input);

    // Async copy through Tokio's file types.
    speed_async(
        async {
            // Buffer the writes to the output file (a BufReader on the write
            // side would pass writes straight through unbuffered).
            let mut out = tokio::io::BufWriter::new(tokio::fs::File::create(&out).await?);
            tokio::io::copy(&mut input, &mut out).await?;
            out.flush().await?;
            Ok(())
        },
        "tokio write",
        size,
    )
    .await?;

    // Blocking copy with the standard library.
    speed_async(
        async {
            let mut input = BufReader::new(File::open(path_in)?);
            let mut out = BufWriter::new(File::create(&out)?);
            io::copy(&mut input, &mut out)?;
            out.flush()?;
            Ok(())
        },
        "std write",
        size,
    )
    .await?;

    Ok(())
}

/// Await `f` and print how long it took and the resulting throughput in MB/s.
async fn speed_async<F>(f: F, label: &str, size: u64) -> anyhow::Result<()>
where
    F: Future<Output = anyhow::Result<()>>,
{
    let start = Instant::now();
    f.await?;
    let duration = start.elapsed();
    println!(
        "{label} duration = {:?}, speed MB/s {}",
        duration,
        (size as f64 / duration.as_secs_f64()) / 1024.0 / 1024.0
    );
    Ok(())
}
```

The thing that's slow is Tokio's file IO, not the copy function. From the Tokio tutorial:

When not to use Tokio

Reading a lot of files. Although it seems like Tokio would be useful for projects that simply need to read a lot of files, Tokio provides no advantage here compared to an ordinary threadpool. This is because operating systems generally do not provide asynchronous file APIs.

https://tokio.rs/tokio/tutorial

You want Tokio for network IO, but it doesn't help you for file IO.
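The "ordinary threadpool" alternative the tutorial mentions can be sketched with the standard library alone: each copy below runs on its own scoped OS thread with blocking std::io::copy, with no async runtime involved. This is a minimal illustration under that assumption, not a recommended design; the helper `copy_all` and the scratch file names are hypothetical. Inside a Tokio application, the comparable move would be to run the blocking copy via tokio::task::spawn_blocking.

```rust
use std::fs::File;
use std::io;
use std::path::PathBuf;
use std::thread;

/// Hypothetical helper: copy each (src, dst) pair on its own OS thread using
/// blocking std::io::copy. Returns the number of bytes copied per pair.
fn copy_all(pairs: &[(PathBuf, PathBuf)]) -> io::Result<Vec<u64>> {
    thread::scope(|s| {
        // One scoped thread per pair; the scope joins them all before returning.
        let handles: Vec<_> = pairs
            .iter()
            .map(|(src, dst)| {
                s.spawn(move || -> io::Result<u64> {
                    let mut input = File::open(src)?;
                    let mut output = File::create(dst)?;
                    io::copy(&mut input, &mut output)
                })
            })
            .collect();
        // Join in order and propagate the first I/O error, if any.
        handles
            .into_iter()
            .map(|h| h.join().expect("copy thread panicked"))
            .collect()
    })
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let mut pairs = Vec::new();
    for i in 0..4 {
        let src = dir.join(format!("copy_all_demo_src_{i}.bin"));
        let dst = dir.join(format!("copy_all_demo_dst_{i}.bin"));
        std::fs::write(&src, vec![i as u8; 1 << 20])?; // 1 MiB of sample data each
        pairs.push((src, dst));
    }
    let copied = copy_all(&pairs)?;
    println!("copied bytes per file: {copied:?}");
    Ok(())
}
```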

This was cross-posted to reddit:

https://www.reddit.com/r/rust/comments/1cpphyx/tokioiocopy_slower_than_std_iocopy/

Note that the standard library also uses specialization internally to make copy faster in general, including for BufReader/BufWriter. On Linux and Android it can even use syscalls like copy_file_range and splice to avoid loading the file data into userspace at all.
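A minimal sketch of the standard-library side: calling std::io::copy directly on two std::fs::File values. Whether the kernel-side copy_file_range (or sendfile/splice) path is actually taken is an internal detail of std that depends on platform, kernel, and file types, so treat the comment below as "may", not "will"; running the binary under `strace -e trace=copy_file_range` on Linux is one way to check. The function name and file names are illustrative.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Copy `src` to `dst` with std::io::copy on plain `File`s. On Linux, std may
/// service this with copy_file_range (falling back to a read/write loop if the
/// kernel refuses), so the data need not pass through a userspace buffer.
fn copy_file(src: &Path, dst: &Path) -> io::Result<u64> {
    let mut input = File::open(src)?;
    let mut output = File::create(dst)?;
    io::copy(&mut input, &mut output)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let src = dir.join("std_copy_demo_src.bin");
    let dst = dir.join("std_copy_demo_dst.bin");
    std::fs::write(&src, vec![7u8; 4 << 20])?; // 4 MiB of sample data
    let n = copy_file(&src, &dst)?;
    println!("copied {n} bytes");
    Ok(())
}
```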

Well, that is changing (io_uring), and tokio doesn't seem to be keeping up with developments in the OS space here.

tokio's AsyncRead/AsyncWrite traits are incompatible with how io_uring works, though. You can try tokio-uring instead.

I tested writing 840 MB of content to a file with tokio-uring, and it gives a speed boost:

tokio write duration = 710.079182ms, speed MB/s 1260.0558679162832

Interesting that it is still significantly slower than the standard library. I would have expected them to be on par.

There's nothing stopping Tokio from using io_uring for tokio::fs. We just need someone to actually take the time to implement it.

With io_uring, the traits force you to copy the data one more time than you have to with epoll. But we already need that copy for files due to spawn_blocking, so there's no issue there.

The correct way to copy files on Linux is to use the copy_file_range syscall, which is what std::io::copy does.

In contrast, tokio performs 8KB-sized read and write syscalls, actively wasting time moving the entire file into userspace and back.

This is a performance bug in tokio and should be reported, I can't be arsed to do it though.

Employing io_uring to do these reads and writes faster is the wrong way to approach the problem.
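For contrast with copy_file_range, a minimal sketch of the userspace strategy described above: repeatedly read up to 8 KiB into a buffer and write it back out, so every byte crosses into userspace and back. It is written with std's blocking Read/Write for brevity and is not Tokio's actual code; tokio::io::copy follows the same read-into-buffer, write-out pattern through AsyncRead/AsyncWrite, and tokio::fs::File adds thread hand-offs on top. The function name and the read counter are illustrative.

```rust
use std::io::{self, Read, Write};

/// Copy `reader` into `writer` through a fixed-size userspace buffer.
/// Returns (bytes copied, number of read calls made, including the EOF read).
fn naive_copy<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    buf_size: usize,
) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    let mut reads = 0u64;
    loop {
        reads += 1;
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // For a real File, each read and write_all here is at least one syscall.
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok((total, reads))
}

fn main() -> io::Result<()> {
    // 1 MiB of in-memory data stands in for a file here.
    let mut src = io::Cursor::new(vec![0u8; 1 << 20]);
    let mut dst: Vec<u8> = Vec::new();
    let (bytes, reads) = naive_copy(&mut src, &mut dst, 8 * 1024)?;
    println!("copied {bytes} bytes using {reads} read calls");
    Ok(())
}
```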

There's nothing Tokio can do. We need specialization to do that kind of thing, but it is an unstable language feature. The standard library can do it because it is special and can use unstable things.

Can you elaborate? I'm confused what the problem is, but I'm not a rust person.

The problem in the original question, or in one of the comments?
The former is the speed difference between those two.

I'm asking what the problem is with tokio using copy_file_range.

The tokio::io::copy function takes as arguments an AsyncRead and an AsyncWrite, so its implementation must be something that works for anything that implements those traits. There are many things that implement them, and most of them would not work with copy_file_range. There's a language feature called specialization that lets you override what a function does in specific cases, so we could use that to override the behavior when you pass it two files. However, specialization is an unstable feature so Tokio cannot use it (but the standard library can).

I have spent some time looking into workarounds for not having access to specialization (see this thread), but it's very difficult to do so.
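One stable-Rust workaround sometimes used in place of specialization is a runtime type check via std::any::Any: if both generic arguments are exactly std::fs::File, take a file-specific branch, otherwise fall back to the generic one. A minimal sketch, assuming blocking std traits to stay self-contained; the function `copy_with_fast_path` and the `Route` enum are hypothetical. Its limits illustrate why this is hard in general: it requires `'static` types and only recognizes the exact concrete types listed, so a BufReader<File>, a Tokio file, or any wrapper silently misses the fast path.

```rust
use std::any::Any;
use std::fs::File;
use std::io::{self, Read, Write};

/// Which branch the hypothetical helper took.
#[derive(Debug, PartialEq)]
enum Route {
    FileFast,
    Generic,
}

fn copy_with_fast_path<R, W>(reader: &mut R, writer: &mut W) -> io::Result<(u64, Route)>
where
    R: Read + Any,
    W: Write + Any,
{
    // Runtime type check standing in for specialization: only the exact
    // concrete type `File` matches; wrappers like BufReader<File> do not.
    let r_any: &mut dyn Any = &mut *reader;
    let w_any: &mut dyn Any = &mut *writer;
    if let (Some(r), Some(w)) = (r_any.downcast_mut::<File>(), w_any.downcast_mut::<File>()) {
        // A file-only fast path would go here; std::io::copy on two Files
        // already tries copy_file_range on Linux.
        let n = io::copy(r, w)?;
        return Ok((n, Route::FileFast));
    }
    // Generic fallback for any Read/Write pair.
    let n = io::copy(reader, writer)?;
    Ok((n, Route::Generic))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let src_path = dir.join("fast_path_demo_src.txt");
    let dst_path = dir.join("fast_path_demo_dst.txt");
    std::fs::write(&src_path, b"some file contents")?;

    // Two Files: the downcast succeeds and the file-specific branch runs.
    let mut src = File::open(&src_path)?;
    let mut dst = File::create(&dst_path)?;
    println!("{:?}", copy_with_fast_path(&mut src, &mut dst)?);

    // In-memory reader and writer: falls back to the generic branch.
    let mut cursor = io::Cursor::new(b"in memory".to_vec());
    let mut sink: Vec<u8> = Vec::new();
    println!("{:?}", copy_with_fast_path(&mut cursor, &mut sink)?);
    Ok(())
}
```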

Huh, sounds like a bummer.

Is tokio::io::copy known to be used for things other than files, though? If there is a way to dig down to the two fds, you could blindly pass them to copy_file_range. Then either the copy gets made or the kernel tells you it can't do it. That's not optimal for non-files, since it sneaks in an extra syscall, but it's probably a worthwhile tradeoff until things can be fixed properly.

Again, I know about squat about Rust; maybe these things are not even necessarily fds, and it may not be possible to determine whether they are (at least by legal means).

At an absolute minimum, the 8K buffer size should be bumped to 32K or higher.
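To put the buffer-size suggestion in numbers, a small calculation of how many chunks an 894 MB file like the one in the original post (measured in MiB, since the post divides by 1024 twice) takes at different buffer sizes. It assumes, as a simplification, that every read returns a full buffer and that each chunk costs one read plus one write syscall; the `chunks` helper is illustrative.

```rust
/// Number of full-buffer chunks needed to move `size` bytes, i.e. the
/// minimum number of data-returning read calls (the final EOF read excluded).
fn chunks(size: u64, buf: u64) -> u64 {
    (size + buf - 1) / buf
}

fn main() {
    let size: u64 = 894 * 1024 * 1024; // file size from the original post
    for kib in [8u64, 32, 256, 1024] {
        let n = chunks(size, kib * 1024);
        // Each chunk is one read plus (at least) one write.
        println!("{kib:>5} KiB buffer: {n:>7} chunks, ~{} syscalls", 2 * n);
    }
}
```

Going from 8 KiB to 32 KiB cuts the chunk count by a factor of four (114,432 to 28,608 for this file size), though for Tokio's file types each chunk also carries other per-operation costs, so the buffer size is only one factor.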

Yes, tokio::io::copy is used with things that are not files - and even things that are not just file descriptors. For example, I've seen people use it for proxying over streams that use things like tls and compression, which means that the data being sent in userspace is not what goes over the wire.

We can bump buffer sizes, but in the particular case of files there are actually multiple buffers involved. Each file has an internal buffer for holding data as it gets transferred between the Tokio thread and the background threadpool that does the actual blocking IO, and then tokio::io::copy has another buffer for holding data as it gets copied from one to another.
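A rough std-only model of the two-buffer structure described above, assuming a simplified design rather than Tokio's actual implementation: a background "blocking" thread reads into its own buffer and ships each filled chunk over a channel, while the consuming thread copies each chunk into a second buffer before writing it out. Every chunk therefore costs an extra memcpy plus a cross-thread hand-off, each hand-off being a synchronization point between threads. The names `read_in_background`, `copy_via_handoff`, and `CHUNK` are hypothetical.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc;
use std::thread;

const CHUNK: usize = 8 * 1024; // hypothetical chunk size, mirroring an 8 KiB copy buffer

/// Model of a file whose reads happen on a background thread: the worker
/// fills its own buffer and sends it across a bounded channel.
fn read_in_background<R>(mut reader: R) -> mpsc::Receiver<io::Result<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    let (tx, rx) = mpsc::sync_channel(1); // bounded: at most one filled chunk queued
    thread::spawn(move || loop {
        let mut buf = vec![0u8; CHUNK]; // buffer owned by the "blocking pool" side
        match reader.read(&mut buf) {
            Ok(0) => break, // EOF: dropping tx closes the channel
            Ok(n) => {
                buf.truncate(n);
                if tx.send(Ok(buf)).is_err() {
                    break; // consumer went away
                }
            }
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });
    rx
}

/// Returns (bytes copied, number of cross-thread hand-offs).
fn copy_via_handoff<R, W>(reader: R, writer: &mut W) -> io::Result<(u64, u64)>
where
    R: Read + Send + 'static,
    W: Write,
{
    let rx = read_in_background(reader);
    let mut copy_buf = vec![0u8; CHUNK]; // second buffer, like the copy function's own
    let (mut total, mut handoffs) = (0u64, 0u64);
    for chunk in rx {
        let chunk = chunk?;
        handoffs += 1;
        copy_buf[..chunk.len()].copy_from_slice(&chunk); // extra copy between buffers
        writer.write_all(&copy_buf[..chunk.len()])?;
        total += chunk.len() as u64;
    }
    writer.flush()?;
    Ok((total, handoffs))
}

fn main() -> io::Result<()> {
    let data = vec![9u8; 1 << 20]; // 1 MiB in memory standing in for a file
    let mut out: Vec<u8> = Vec::new();
    let (bytes, handoffs) = copy_via_handoff(io::Cursor::new(data), &mut out)?;
    println!("{bytes} bytes in {handoffs} cross-thread hand-offs");
    Ok(())
}
```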

Sounds pretty horrid.

I'm getting off your back.

Sounds like the answer to OP for the time being is to not use tokio for file copying.

Yeah, the lack of support for files in epoll and kqueue makes things suck pretty badly. The lack of copy_file_range is only a minor issue compared to the other reasons file IO sucks in async code.

The total difference in performance was bugging me, so I wrote a toy C program to just do the 8KB calls. It finishes in next to no time (0.02s user 0.90s system 99% cpu 0.923 total). On the other hand the program from OP with std::io::copy commented out takes 1.41s user 2.49s system 137% cpu 2.828 total, which is quite concerning.

From poking around with strace and perf, I can see tokio using two threads to handle the operation, resulting in a huge number of futex calls and context switches. I presume tokio-uring, as mentioned in one of the comments above, manages to dodge this.