Hi, I came across this weirdness while comparing std::io::copy vs. tokio::io::copy. I simply wanted to make copies of a large file (~600MB) asynchronously. I thought that since this is an I/O job, I should use tokio tasks, which should be more efficient because the tasks can be multiplexed over as few threads as possible, compared to std::io::copy, which needs a separate thread per copy to run the synchronous copies in parallel.
When I run the tokio::io::copy version inside a Linux container, the process gets killed. I later ran htop and found that it was killed due to OOM: htop showed about 8-10GB allocated (about 10 tasks x file size), and the process had 2-3x as many threads as the tasks I spawned. Why is this happening? Doesn't tokio::io::copy use a fixed-size buffer that it reuses throughout the copy?
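To make that expectation concrete, here is a minimal sketch of the kind of fixed-buffer copy loop I had in mind. It uses only std, and `copy_with_fixed_buffer` and its `buf_size` parameter are names I made up for illustration; I am not claiming this is how tokio::io::copy or std::io::copy is actually implemented.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `reader` to `writer` through one reusable buffer.
/// Hypothetical helper for illustration only; not tokio's or std's code.
fn copy_with_fixed_buffer<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    buf_size: usize,
) -> io::Result<u64> {
    // Assumption: a zero-sized buffer would make every read return 0 bytes.
    assert!(buf_size > 0, "buf_size must be non-zero");

    // Allocated once; the loop never grows it, whatever the input size.
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;

    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Write exactly the bytes that were just read, then reuse the buffer.
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }

    writer.flush()?;
    Ok(total)
}
```

With a loop like this, I would expect each copy to hold roughly `buf_size` bytes at a time (for example 8 KiB), so ten concurrent copies should stay far below the size of even one file.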
When I run the std::io::copy version inside a Linux container, the process is not OOM-killed. htop showed only a minimal CPU spike, and RAM allocation peaked at about 700MB. The copies seem to be made in parallel. This is what I would expect from tokio::io::copy. Could someone help explain why this is the case? Thanks in advance!
Here's the code:
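use std::process::exit;
use tokio::signal::unix::{signal, SignalKind};

// Spawns 10 tokio tasks; each copies the source file with tokio::io::copy.
async fn run_tokio_tasks() {
    for i in 0..10 {
        // The JoinHandle is dropped, so each task runs detached.
        tokio::task::spawn(async move {
            let mut from_file = tokio::fs::File::open("large_file.tgz").await.unwrap();
            let mut file = tokio::fs::File::create(format!("large_file_{}.txt", i))
                .await
                .unwrap();
            let _ = tokio::io::copy(&mut from_file, &mut file).await;
        });
    }
}

// Spawns 10 OS threads; each copies the source file with std::io::copy.
#[allow(dead_code)]
fn run_std_threads() {
    for i in 0..10 {
        // The JoinHandle is dropped, so each thread runs detached.
        std::thread::spawn(move || {
            let mut from_file = std::fs::File::open("large_file.tgz").unwrap();
            let mut file = std::fs::File::create(format!("large_file_{}.txt", i)).unwrap();
            let _ = std::io::copy(&mut from_file, &mut file);
        });
    }
}

#[tokio::main]
async fn main() {
    let mut signal = signal(SignalKind::interrupt()).expect("Failed to register interrupt");

    // Swap the two blocks below to run the std::thread variant instead.
    // std::thread::spawn(|| {
    //     run_std_threads()
    // });
    tokio::task::spawn(async move {
        run_tokio_tasks().await
    });

    // Keep the process alive until Ctrl-C (SIGINT) arrives.
    if signal.recv().await.is_some() {
        println!("received cancel");
        exit(0);
    }
}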