Tokio: using core threads for cpu-heavy computation

I have a computation that most of the time is CPU heavy. There are dependencies between tasks, where some tasks sometimes wait for partial results from other tasks.

It seems async is perfect for this. If the runtime uses as many threads as there are cpu cores, it should normally max out all the cores (except if every task is waiting on a bottleneck). I don't want or need to run more threads than the number of cores.

However, the tokio docs suggest using spawn_blocking, which launches more threads, or using a separate thread pool. This seems unnecessary for my use case.

Is there any reason not to just use the core tokio threads for this?

I guess the issue might be that the cpu-heavy tasks can delay I/O, but my I/O is rare and I don't really care.

I think if a "normal" Tokio thread blocks, then some I/O operations may hang.

Basically, tokio can't predict that you're about to spend the next two minutes spinning in the current thread, so it has no way to know it needs to push your other work to another thread, or that it needs to start another one. This might not be a problem in practice for your current program, on your current computer, but it can easily lead to deadlocks if you're unlucky. For example, try running with the behavior you would get on a two-core CPU, like you might have on a cloud platform:

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]

By default, tokio starts (core count - 1) threads for non-blocking work, so you can quite easily block yourself in this situation any time there is a dependency (await).

You can resolve this by making liberal use of tokio::task::yield_now inside the main loop of your CPU-bound work to "unstick" the runtime. It's quite possible that this performs better than using spawn_blocking, but probably not by much.

There's lots more detail if you're interested, but it's getting really into the weeds.

Can you elaborate? Why would I deadlock myself on await? If a task is awaiting on something, wouldn't tokio always use the thread to run some other task that is not awaiting?

Based on @simonbuchan's worker_threads = 1, I made the following example:

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    tokio::task::spawn(async move {
    // Compare with:
    //tokio::task::spawn_blocking(move || {
        println!("Going to sleep!");
        std::thread::sleep(std::time::Duration::from_secs(5));
        println!("Waking up!");
    });
    println!("Some small task ...");
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    println!("... may take a long time now.");
}

Yes I understand that, but I'm interested in overall throughput. So it's OK if a simple task takes a long time if something else is worked on instead.

I know that the scheduling order might sometimes be sub-optimal, but I don't see how it can lead to a deadlock.

I don't think that launching additional worker threads really solves the problem of optimal ordering (the optimal order depends on future dependencies -- maybe it's better to finish the longer computation first...).

There is a theorem bounding total computation time on N processors regardless of the order in which you schedule tasks: Brent's law.

As long as tokio isn't wasting threads unnecessarily, it should satisfy that theorem. In other words, if at some point there are N cores (I would presumably set tokio threads = N), and M tasks that are not awaiting on anything, I would hope that min(N, M) threads will be busy doing work.
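To make the bound concrete, here is a small sketch that computes the total work T_1, the span T_inf (the longest dependency chain), and the resulting upper bound T_1 / N + T_inf for a toy task graph. It assumes a greedy scheduler that never leaves a worker idle while a task is ready, and it ignores scheduling overhead. The `Task` type and the example graph are invented for illustration.

```rust
/// A task with a cost and the indices of the tasks it depends on.
/// Tasks must be listed in topological order (dependencies first).
struct Task {
    cost: u64,
    deps: Vec<usize>,
}

/// Total work T_1: the sum of all task costs.
fn work(tasks: &[Task]) -> u64 {
    tasks.iter().map(|t| t.cost).sum()
}

/// Span T_inf: the cost of the longest chain of dependent tasks.
fn span(tasks: &[Task]) -> u64 {
    let mut finish = vec![0u64; tasks.len()];
    for (i, t) in tasks.iter().enumerate() {
        // A task can start once its slowest dependency has finished.
        let ready = t.deps.iter().map(|&d| finish[d]).max().unwrap_or(0);
        finish[i] = ready + t.cost;
    }
    finish.into_iter().max().unwrap_or(0)
}

/// Upper bound on the running time of a greedy schedule on `n` workers.
fn greedy_upper_bound(tasks: &[Task], n: u64) -> f64 {
    work(tasks) as f64 / n as f64 + span(tasks) as f64
}

/// Hypothetical diamond-shaped graph: 0 -> {1, 2} -> 3.
fn example_dag() -> Vec<Task> {
    vec![
        Task { cost: 4, deps: vec![] },
        Task { cost: 3, deps: vec![0] },
        Task { cost: 5, deps: vec![0] },
        Task { cost: 2, deps: vec![1, 2] },
    ]
}

fn main() {
    let dag = example_dag();
    println!("work = {}, span = {}", work(&dag), span(&dag));
    println!("bound on 2 workers = {}", greedy_upper_bound(&dag, 2));
}
```

The bound only holds if the scheduler really is greedy in this sense, which is exactly the min(N, M) property asked about here.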

I don't see why this couldn't be true. Is this actually true for the tokio scheduler?

You certainly can run CPU heavy things in a Tokio runtime, but it generally only makes sense if you only use the runtime for CPU heavy things. Putting any network IO on the same runtime would be a pretty bad idea, since your IO would frequently end up pausing for a long time. At this point I usually ask why you wouldn't just use rayon instead, considering that rayon also provides one thread per CPU core, but it has become clear that there is demand for using Tokio in this way, even if I don't really understand why.

Regarding spawn_blocking, I actually point out that it is unsuited for CPU heavy stuff in my article on blocking. It is designed for blocking IO.

Right. OK that makes sense and matches my understanding.

I was considering a setup where I have two tokio runtimes, one for the IO (maybe even with only 1 thread), the other for the computations with N threads. This is even mentioned in the tokio docs. But given that my IO really will only take some tiny amount of time (< 1%), it seems like an unnecessary complication.

Would rayon allow me to await for something from a task B in the middle of the computation of a task A? That's the whole point of using async for this for me. I want A to be able to start early before B has all the data I need, but at some point in the middle it will need some input from B.

Using two runtimes with IO on just one is certainly possible.

As for the communication between tasks, not if it has to happen via channels. It is true that async lets you do certain things more easily.

Another pattern I have sometimes seen is to orchestrate the blocking work from within the runtime: a single-threaded runtime, similar to your IO one, moves data around and spawns a task for each piece of work as necessary.

Do you mean creating your own runtime instead of using tokio?

I don't really need async IO at all, so maybe I don't need tokio. It could just as well be blocking IO on a dedicated thread. I just want to be able to use async/await in the middle of a computation to wait for something from another computation.

No, I meant to do it within Tokio.

Your use-case sounds reasonable enough.

Ah OK I get it, instead of two Tokio runtimes, have a thread that does I/O and spawns computational tasks, and all the computational tasks are scheduled by Tokio and run by the threads it manages. That probably makes perfect sense for my case, thanks.

The deadlock is when the awaited task is completed, but tokio had all its workers starved, so it has no thread to put the continuation on. The simple case is when the CPU-bound task unblocks the awaiting task but never yields.

Using OS threading means the work is guaranteed to be fairly shared, so it's a lot safer.

By "workers are starved" do you mean that all the worker threads are busy doing computations? That would be progress being made at maximum speed, not a deadlock.

CPU-heavy computation on a Tokio runtime generally doesn't result in deadlocks, assuming that no operation continues running forever. At most, the operation is merely postponed until other work finishes.

The only thing I would be worried about is if I have 2 threads A and B, and 2 tasks X and Y that are unblocked and can make progress, but Tokio somehow has already decided to put both X and Y on thread A, with only X making progress, while thread B is idle.

But I am guessing Tokio is not going to do this, and will always put Y on thread B in that case rather than leaving a thread idle?

Work stealing will make sure that no thread is idle. That said, you may want to disable the lifo slot as discussed here.
