Writing to tokio File vs writing to std File via channel

I wanted to compare performance with regards to:

  • Using a tokio::fs::File and writing a bunch of data, split into 64K blocks, to it.
  • Sending the same amount of data (in 64K blocks) over an mpsc channel to a thread, which writes it to a std::fs::File.

Basically I wanted to measure the thread-pool overhead of tokio's file I/O vs. passing the data over a channel to a dedicated writer thread. Using an unbounded channel is an issue in its own right, but not really relevant to what I wanted to check.

use std::{io::Write, thread, time::Instant};

use bytes::Bytes;

use tokio::io::AsyncWriteExt;

const NBLOCKS: usize = 64;
const BLOCK_SIZE: usize = 65536;

#[tokio::main]
async fn main() {
  tokio_write().await;
  stream_to_thread().await;
  tokio_write().await;
  stream_to_thread().await;
}

async fn tokio_write() {
  let mut f = tokio::fs::File::create("outfile.dat").await.unwrap();

  let start_time = Instant::now();
  for _idx in 0..NBLOCKS {
    let buf = vec![0u8; BLOCK_SIZE];
    f.write_all(&buf).await.unwrap();
  }
  f.flush().await.unwrap();
  let dur = Instant::now() - start_time;

  println!(
    "tokio_write(), NBLOCKS={}, BLOCK_SIZE={}, runtime={:?}",
    NBLOCKS, BLOCK_SIZE, dur
  );
}

async fn stream_to_thread() {
  let f = tokio::fs::File::create("outfile.dat").await.unwrap();
  let mut f = f.into_std().await;

  let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<Bytes>();

  let start_time = Instant::now();

  // include kicking off writer thread in runtime
  let jh = thread::spawn(move || {
    while let Some(buf) = rx.blocking_recv() {
      f.write_all(&buf).unwrap();
    }
    f.flush().unwrap();
    drop(f);
  });

  for _idx in 0..NBLOCKS {
    let buf = vec![0u8; BLOCK_SIZE];
    let buf = Bytes::from(buf);
    tx.send(buf).unwrap();
  }
  drop(tx);

  // include joining the writer thread in runtime
  let jh = tokio::task::spawn_blocking(|| jh.join());
  jh.await.unwrap().unwrap();

  let dur = Instant::now() - start_time;

  println!(
    "stream_to_thread(), NBLOCKS={}, BLOCK_SIZE={}, runtime={:?}",
    NBLOCKS, BLOCK_SIZE, dur
  );
}

Dependencies in Cargo.toml:

[dependencies]
bytes = { version = "1.5.0" }
tokio = { version = "1.36.0", features = [
  "fs", "io-util", "macros", "rt-multi-thread", "sync"
] }

Posting in case anyone else is curious; I got the answer I was looking for, so I'm not going to do more with it.

tokio_write(), NBLOCKS=64, BLOCK_SIZE=65536, runtime=2.549754ms
stream_to_thread(), NBLOCKS=64, BLOCK_SIZE=65536, runtime=1.416202ms
tokio_write(), NBLOCKS=64, BLOCK_SIZE=65536, runtime=2.104866ms
stream_to_thread(), NBLOCKS=64, BLOCK_SIZE=65536, runtime=1.242478ms

Those results make sense to me. With the background thread, you can send the next chunk before the previous one has been written, so the writer never has to wait between chunks. Also, your chunks are larger.

I will say though that for everyday use, I'm going to use the regular async I/O functions -- they're fast enough. The reason I wrote this test now is that we're porting a real-time-ish C++ system to Rust and we need to shave off a few milliseconds (from the original C++ code) along a very specific code path.