Why does tokio's single-threaded runtime beat the multi-threaded one?

In the following code, the single-threaded configuration always outperforms the multi-threaded one. Any suggestions?

use std::time::Instant;
use tokio::runtime::Builder;

fn main() {
  for i in 1..=10 {
    asyncpool::<true>(i * 10000);
  }
  for i in 1..=10 {
    asyncpool::<false>(i * 10000);
  }
}

fn asyncpool<const S: bool>(n: usize) {
  // S = true: single-threaded runtime; S = false: multi-threaded with 4 workers.
  let rt = if S {
    Builder::new_current_thread().build().unwrap()
  } else {
    Builder::new_multi_thread().worker_threads(4).build().unwrap()
  };
  rt.block_on(async {
    let start = Instant::now();
    // Spawn n empty tasks and await each one (map is lazy, so each spawn
    // happens just before its await).
    let handles = (0..n).map(|_| tokio::spawn(async {}));
    for h in handles {
      h.await.unwrap();
    }
    println!(
      "time={:5} thread={} workload={}",
      start.elapsed().as_millis(),
      if S { "Single" } else { "Multi " },
      n
    );
  });
}
time=    8 thread=Single workload=10000
time=   16 thread=Single workload=20000
time=   23 thread=Single workload=30000
time=   24 thread=Single workload=40000
time=   37 thread=Single workload=50000
time=   33 thread=Single workload=60000
time=   59 thread=Single workload=70000
time=   43 thread=Single workload=80000
time=   44 thread=Single workload=90000
time=   48 thread=Single workload=100000
time=  438 thread=Multi  workload=10000
time=  684 thread=Multi  workload=20000
time= 1258 thread=Multi  workload=30000
time= 1643 thread=Multi  workload=40000
time= 1875 thread=Multi  workload=50000
time= 2163 thread=Multi  workload=60000
time= 3789 thread=Multi  workload=70000
time= 3894 thread=Multi  workload=80000
time= 3702 thread=Multi  workload=90000
time= 4510 thread=Multi  workload=100000

I would guess this is simply an accurate reflection of reality: if you have very little work to do per task, the overhead of scheduling those tasks between different threads outweighs the benefits of using multiple threads to do work.

If you had any CPU-bound work per task, then you'd get actual benefit from having multiple threads and that would outweigh the cost. If you had any IO-bound work per task, the waiting required for IO would make any costs of multiple threads negligible in comparison. However, if you have neither CPU-bound nor IO-bound tasks, a single threaded executor is going to be more efficient than a multi-threaded one.

If this matches your real workload, a single threaded executor is a better fit.

This is similar to how using rayon on very very small tasks significantly slows down code. The fault is not with tokio or rayon. The benefits of being multithreaded are simply only reaped when there is work that really benefits from being divided between threads, and there's always overhead.
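
As a rough illustration (not from the original post; the per-task busy-work loop here is a hypothetical stand-in for real CPU-bound work), this is the kind of change to the benchmark that gives the extra worker threads something to parallelize:

use std::time::Instant;
use tokio::runtime::Builder;

// Hypothetical variant of the benchmark above: each task does a small
// amount of CPU-bound work, so the multi-threaded runtime has something
// to spread across its workers.
fn cpu_bound_pool<const S: bool>(n: usize) {
  let rt = if S {
    Builder::new_current_thread().build().unwrap()
  } else {
    Builder::new_multi_thread().worker_threads(4).build().unwrap()
  };
  rt.block_on(async {
    let start = Instant::now();
    let handles: Vec<_> = (0..n)
      .map(|i| {
        tokio::spawn(async move {
          // Tiny CPU-bound workload; black_box keeps it from being optimized away.
          std::hint::black_box((0..10_000u64).fold(i as u64, |acc, x| acc.wrapping_add(x)))
        })
      })
      .collect();
    for h in handles {
      h.await.unwrap();
    }
    println!(
      "time={:5} thread={} workload={}",
      start.elapsed().as_millis(),
      if S { "Single" } else { "Multi " },
      n
    );
  });
}

With enough per-task work like this, the 4-worker runtime has a chance to pull ahead of the single-threaded one instead of trailing it.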


Thank you. When I added something real, the situation changed:

tokio::spawn(async {
  tokio::fs::File::open("file").await;
})

Here is the data, but the multi-threaded runtime doesn't catch up as much as I expected. Ideally, for the same workload, the ratio of multi-threaded time to single-threaded time should be 1/4. What should I do to approach that ratio: choose a larger file, or open more files in each task?

time= 2602 thread=Single workload=10000
time= 4524 thread=Single workload=20000
time= 7434 thread=Single workload=30000
time=10209 thread=Single workload=40000
time=10240 thread=Single workload=50000
time=15983 thread=Single workload=60000
time=14512 thread=Single workload=70000
time=17890 thread=Single workload=80000
time=22275 thread=Single workload=90000
time=22542 thread=Single workload=100000
time= 2423 thread=Multi  workload=10000
time= 4408 thread=Multi  workload=20000
time= 6594 thread=Multi  workload=30000
time=13328 thread=Multi  workload=40000
time=10757 thread=Multi  workload=50000
time=13580 thread=Multi  workload=60000
time=14277 thread=Multi  workload=70000
time=21632 thread=Multi  workload=80000
time=25833 thread=Multi  workload=90000
time=25518 thread=Multi  workload=100000

tokio::fs really is just std::fs run on a background thread pool. It doesn't scale well to millions of concurrent operations, while tokio::net and tokio::sync happily do.
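
Conceptually (a simplified sketch, not tokio's actual implementation), something like tokio::fs::File::open boils down to shipping the blocking std::fs call off to the runtime's blocking thread pool:

use std::path::{Path, PathBuf};

// Simplified sketch: the blocking std::fs call runs on tokio's blocking
// thread pool, and the async caller just awaits its completion.
async fn open_like_tokio_fs(path: impl AsRef<Path>) -> std::io::Result<std::fs::File> {
  let path: PathBuf = path.as_ref().to_owned();
  tokio::task::spawn_blocking(move || std::fs::File::open(path))
    .await
    .expect("blocking task panicked")
}

That pool keeps the async workers from being blocked, but each open/read still occupies an OS thread for its duration, which is why it doesn't scale to millions of concurrent operations.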

The tokio-uring crate provides truly async fs operations on Linux. I believe it will eventually become part of the tokio crate itself, but it's pretty young at the moment. I have no idea how things stand on Windows, though I know it's possible with IOCP (or RIO?). Maybe it just needs some people to work on it.
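
For reference, a minimal tokio-uring sketch (the crate is young and its API may have changed; the file name and buffer size here are arbitrary):

fn main() -> std::io::Result<()> {
  tokio_uring::start(async {
    // Open and read via io_uring: the kernel owns the buffer while the
    // operation is in flight, so read_at takes and returns it by value.
    let file = tokio_uring::fs::File::open("file").await?;
    let buf = vec![0u8; 4096];
    let (res, _buf) = file.read_at(buf, 0).await;
    println!("read {} bytes", res?);
    Ok(())
  })
}

Until something like that is integrated, the thread-pool approach sketched above is what tokio::fs gives you.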


Agreed; but I'd want to say that even with truly asynchronous file IO, I think there's still the bottleneck of the filesystem and ultimately your HDD/SSD.

If you're reading files from a filesystem, I think the overhead of multithreading should be negligible, but that just means it's going to be equivalent to single-threaded. Or maybe a bit better, as you've observed, if the OS can cache the contents of the file and copy them from memory rather than reading from the disk.

The fact that tokio is asynchronous already means you're not waiting on each IO operation to complete before you start the next one, even with only one executor thread. You don't need multiple threads to have an asynchronous runtime do multiple IO operations at the same time; that's what it's designed to do even with one thread.
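
To make that concrete, here's a small sketch (using sleep as a stand-in for an IO wait, which is an assumption on my part) of a single-threaded runtime driving a thousand waits concurrently:

use std::time::{Duration, Instant};
use tokio::runtime::Builder;

fn main() {
  // One executor thread; enable_time is needed for tokio::time::sleep.
  let rt = Builder::new_current_thread().enable_time().build().unwrap();
  rt.block_on(async {
    let start = Instant::now();
    let handles: Vec<_> = (0..1_000)
      .map(|_| tokio::spawn(tokio::time::sleep(Duration::from_millis(100))))
      .collect();
    for h in handles {
      h.await.unwrap();
    }
    // All 1000 waits overlap, so this prints roughly 100 ms, not 100 seconds.
    println!("elapsed = {:?}", start.elapsed());
  });
}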

The only time you're going to have scaling like you describe, 1/4 the time with 4 threads, is if you're doing CPU-bound work which tokio can't put into the background. And even then, that's not what tokio is designed for - it's designed for IO. So, I think the point of using 4 threads is mostly to eliminate a bottleneck if you happen to be doing a small amount of CPU-bound work. In the ideal situation, you're never going to see a 1/4 time speedup. You already do all the IO operations simultaneously when using 1 thread, using 4 threads is just so that you still have some left in case one gets temporarily stuck doing CPU work.


I would like to summarize with an example. Assume that:

  1. the IO device can complete 1000 IOs per second
  2. tokio can poll 2000 IOs per second

In this configuration there is no need for multiple threads (because poll capacity > IO capacity). If we change the IO device's capacity to 3000 IOs per second, then poll capacity < IO capacity, so adding one more poll thread will help, and adding another will not. This might be a rule for choosing between single-threaded and multi-threaded.
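
As a back-of-the-envelope sketch of that rule (my own formula, not anything tokio guarantees), the number of poll threads you'd want is roughly the device's IO rate divided by one thread's poll rate, rounded up:

// Rough rule of thumb from the numbers above, not a tokio guarantee.
fn poll_threads_needed(device_iops: u64, poll_iops_per_thread: u64) -> u64 {
  device_iops.div_ceil(poll_iops_per_thread)
}

fn main() {
  assert_eq!(poll_threads_needed(1000, 2000), 1); // poll capacity already exceeds the device
  assert_eq!(poll_threads_needed(3000, 2000), 2); // a second thread helps, a third would not
}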

I think in most situations it's more like tokio can poll 2,000,000 IO ops per second. I do not know the exact number, but I know that for a typical desktop or laptop it is many, many orders of magnitude greater than the device's IO capabilities. If all you are doing is interacting with true async IO, I can almost guarantee you will never need multiple threads. This may be different if you are on, say, a low-power CPU with a router's high-speed network ports, maybe? But even then I am not sure multiple threads would help.

Multiple threads help only if you also do some light CPU-bound work, like calculating hashes, doing small calculations, or allocating large objects, amidst the IO-bound work. Then multiple threads will let you continue to do IO while one or more threads handle the CPU-bound overhead. If you only had one thread, you could still respond to much more IO than your device can handle, surely, but any CPU-bound overhead would block all IO ops while it was happening. Multiple threads avoid this.

That's also likely where the speedup from multiple threads comes from when many tasks read the same file: the file is cached by the OS, so the workload becomes a matter of copying memory rather than reading from an IO device, and copying memory is CPU-bound rather than IO-bound.
