Why does tokio::io::copy consume a lot of memory?

Hi, I came across this oddity while comparing std::io::copy and tokio::io::copy. I simply wanted to make copies of a large file (~600MB) asynchronously. I thought that since this is an I/O job, I should use tokio tasks because they would be more efficient: the tasks can be multiplexed over as few threads as possible, compared to the std::io::copy approach, which may need more threads to do the copies synchronously and in parallel.

  1. When I run tokio::io::copy inside a Linux container, the process gets killed. I later ran htop and found that it was killed due to OOM. htop showed that it allocated about 8-10GB (about 10 tasks x file size), and that the process spawned 2-3x as many threads as there were tasks. Why is this happening? Doesn't tokio::io::copy have a fixed-size buffer that it reuses during the copy?

  2. When I run std::io::copy inside a Linux container, the process doesn't OOM and isn't killed. htop showed only a minimal CPU spike, and RAM allocation peaked at about 700MB. The copies seem to be made in parallel. This is what I would expect from tokio::io::copy. Could someone help explain why this is the case? Thanks in advance!
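To make the expectation in question 1 concrete, here is a minimal std-only sketch of a copy loop that reuses one fixed-size buffer. It is not tokio's actual implementation; the function name `copy_with_fixed_buffer` and the `buf_size` parameter are made up for illustration. The point is only that a loop like this needs memory proportional to the buffer, not to the file.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `reader` to `writer` through one reusable buffer.
/// Returns the number of bytes copied.
/// (Hypothetical helper for illustration, not part of std or tokio.)
fn copy_with_fixed_buffer<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    buf_size: usize,
) -> io::Result<u64> {
    assert!(buf_size > 0, "buffer must hold at least one byte");
    // Allocated once; every iteration reuses the same memory.
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Write exactly the bytes that were read in this iteration.
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}
```

Calling it with two `std::fs::File` handles and an 8 KiB buffer keeps the heap use of one copy at roughly 8 KiB regardless of the file size, which is why multi-GB usage for ten copies looks suspicious.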

Here's the code:

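use std::process::exit;
use tokio::signal::unix::{signal, SignalKind};

// Spawns 10 tokio tasks; each one copies the source file to its own destination.
async fn run_tokio_tasks() {
    for i in 0..10 {
        // The join handle is not awaited, so the task runs detached.
        let _handle = tokio::task::spawn(async move {
            let mut from_file = tokio::fs::File::open("large_file.tgz").await.unwrap();
            let mut file = tokio::fs::File::create(format!("large_file_{}.txt", i))
                .await
                .unwrap();
            let _ = tokio::io::copy(&mut from_file, &mut file).await;
        });
    }
}

// Same job with one OS thread per copy and the blocking std::io::copy.
fn run_std_threads() {
    for i in 0..10 {
        // The join handle is not joined, so the thread runs detached.
        let _handle = std::thread::spawn(move || {
            let mut from_file = std::fs::File::open("large_file.tgz").unwrap();
            let mut file = std::fs::File::create(format!("large_file_{}.txt", i)).unwrap();
            let _ = std::io::copy(&mut from_file, &mut file);
        });
    }
}

#[tokio::main]
async fn main() {
    let mut signal = signal(SignalKind::interrupt()).expect("Failed to register interrupt");
    // Swap the commented-out block with the tokio one below to test the std version.
    // std::thread::spawn(|| {
    //     run_std_threads()
    // });
    tokio::task::spawn(async move {
        run_tokio_tasks().await
    });

    // Keep the process alive until Ctrl-C so the copies can run in the background.
    if signal.recv().await.is_some() {
        println!("received cancel");
        exit(0);
    }
}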
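When measuring this kind of thing, it can help to keep the join handles and surface copy errors instead of discarding them, so the program knows when every copy has finished and how many bytes each one wrote. A minimal std-only sketch along those lines; the function name `copy_in_threads` and the `copy_{i}.bin` naming are hypothetical, not taken from the code above.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Copies `src` into `copies` files inside `dst_dir`, one thread per copy.
/// Returns the number of bytes written by each copy, in spawn order.
/// (Hypothetical helper for illustration.)
fn copy_in_threads(src: &Path, dst_dir: &Path, copies: usize) -> io::Result<Vec<u64>> {
    // Spawn all workers first so the copies run in parallel.
    let handles: Vec<thread::JoinHandle<io::Result<u64>>> = (0..copies)
        .map(|i| {
            let src: PathBuf = src.to_path_buf();
            let dst: PathBuf = dst_dir.join(format!("copy_{}.bin", i));
            thread::spawn(move || {
                let mut from = File::open(&src)?;
                let mut to = File::create(&dst)?;
                io::copy(&mut from, &mut to)
            })
        })
        .collect();

    // Join every worker; the first I/O error is returned to the caller.
    let mut sizes = Vec::with_capacity(copies);
    for handle in handles {
        let copied = handle.join().expect("copy thread panicked")?;
        sizes.push(copied);
    }
    Ok(sizes)
}
```

With this shape, peak memory can be checked right after the function returns, and a failed open or create shows up as an `Err` rather than passing silently.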

I don't know the answers to your memory usage questions. However, note that Tokio, or async more generally, does not automatically make everything faster. See this recent thread:

It's very likely due to this issue: Continuous memory leak with `console_subscriber` · Issue #184 · tokio-rs/console · GitHub. I don't see the memory spike anymore after commenting out console_subscriber::init();. The issue has been closed since Jan 22, 2024, so the next release of console-subscriber should include the fix. I'm currently on 0.2.0.
