Tokio: using core threads for cpu-heavy computation

tczajka · October 29, 2022, 7:22pm

I have a computation that most of the time is CPU heavy. There are dependencies between tasks, where some tasks sometimes wait for partial results from other tasks.

It seems async is perfect for this. If the runtime uses as many threads as there are cpu cores, it should normally max out all the cores (except if every task is waiting on a bottleneck). I don't want or need to run more threads than the number of cores.

However tokio docs suggest using spawn_blocking that launches more threads, or using a separate thread pool. This seems unnecessary for my use case.

Is there any reason not to just use the core tokio threads for this?

I guess the issue might be that the cpu-heavy tasks can delay I/O, but my I/O is rare and I don't really care.

jbe · October 29, 2022, 7:38pm

I think if a "normal" Tokio thread blocks, then some I/O operations may hang.

simonbuchan · October 29, 2022, 7:43pm

Basically, tokio can't predict that you're about to spend the next two minutes spinning in the current thread, so it has no way to know it needs to push your other work to another thread, or that it needs to start another one. This might not be a problem in practice for your current program, on your current computer, but it can easily lead to deadlocks if you're unlucky. For example, try running with the behavior you would get on a two-core CPU, such as on a cloud platform:

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]

By default, tokio starts (core count - 1) threads for non blocking work, so you can quite easily block yourself in this situation any time there is a dependency (await).

You can resolve this by making liberal use of tokio::task::yield_now inside the main loop of your CPU-bound work to "unstick" the runtime. That may well perform better than using spawn_blocking, but probably not by much.

There's lots more detail if you're interested, but it's getting really into the weeds.

tczajka · October 29, 2022, 7:46pm

Can you elaborate? Why would I deadlock myself on await? If a task is awaiting on something, wouldn't tokio always use the thread to run some other task that is not awaiting?

jbe · October 29, 2022, 8:01pm

Based on @simonbuchan's worker_threads = 1, I made the following example:

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // This task blocks the only worker thread for 5 seconds.
    // Compare with: tokio::task::spawn_blocking(move || {
    tokio::task::spawn(async move {
        println!("Going to sleep!");
        std::thread::sleep(std::time::Duration::from_secs(5));
        println!("Waking up!");
    });
    println!("Some small task ...");
    // While the worker is blocked, this 100 ms timer may not resume on time.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    println!("... may take a long time now.");
}

tczajka · October 29, 2022, 8:10pm

Yes I understand that, but I'm interested in overall throughput. So it's OK if a simple task takes a long time if something else is worked on instead.

I know that the scheduling order might sometimes be sub-optimal, but I don't see how it can lead to a deadlock.

I don't think that launching additional worker threads really solves the problem of optimal ordering (the optimal order depends on future dependencies -- maybe it's better to finish the longer computation first...).

tczajka · October 29, 2022, 8:22pm

There is a theorem bounding total computation time on N processors regardless of the order in which you schedule tasks: Brent's law.

As long as tokio isn't wasting threads unnecessarily, it should satisfy that theorem. In other words, if at some point there are N cores (I would presumably set tokio threads = N), and M tasks that are not awaiting on anything, I would hope that min(N, M) threads will be busy doing work.
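For reference, the bound is usually stated as below. The symbols are the conventional ones and do not come from this thread: T_1 is the total work (running time on one processor), T_\infty is the length of the longest dependency chain, and T_N is the running time on N processors under a greedy scheduler, meaning one that never leaves a processor idle while a task is ready to run.

```latex
T_N \;\le\; \frac{T_1}{N} + T_\infty,
\qquad
T_N \;\ge\; \max\!\left(\frac{T_1}{N},\; T_\infty\right)
```

The greedy condition is the same as the min(N, M) property described above, and together the two inequalities put any greedy schedule within a factor of two of the optimal one.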

I don't see why this couldn't be true. Is this actually true for the tokio scheduler?

alice · October 29, 2022, 8:55pm

You certainly can run CPU heavy things in a Tokio runtime, but it generally only makes sense if you only use the runtime for CPU heavy things. Putting any network IO on the same runtime would be a pretty bad idea, since your IO would frequently end up pausing for a long time. At this point I usually ask why you wouldn't just use rayon instead, considering that rayon also provides one thread per CPU core, but it has become clear that there is demand for using Tokio in this way, even if I don't really understand why.

Regarding spawn_blocking, I actually point out that it is unsuited for CPU heavy stuff in my article on blocking. It is designed for blocking IO.

tczajka · October 29, 2022, 9:03pm

Right. OK that makes sense and matches my understanding.

I was considering a setup where I have two tokio runtimes, one for the IO (maybe even with only 1 thread), the other for the computations with N threads. This is even mentioned in tokio docs. But given that my IO really will only take a tiny amount of time (< 1%), it seems like an unnecessary complication.

Would rayon allow me to await for something from a task B in the middle of the computation of a task A? That's the whole point of using async for this for me. I want A to be able to start early before B has all the data I need, but at some point in the middle it will need some input from B.

alice · October 29, 2022, 9:05pm

Using two runtimes with IO on just one is certainly possible.

As for the communication between tasks, not if it has to happen via channels. It is true that async lets you do certain things more easily.

alice · October 29, 2022, 9:07pm

Another pattern I have seen sometimes is to orchestrate the blocking work in the runtime by spawning tasks off for each piece of work, having a single-threaded runtime similar to your IO one move around data and spawn tasks as necessary.

tczajka · October 29, 2022, 9:15pm

Do you mean creating your own runtime instead of using tokio?

I don't really need async IO at all, so maybe I don't need tokio. Could as well be blocking IO on a dedicated thread. I just want to be able to use async await in the middle of a computation to wait for something from another computation.

alice · October 29, 2022, 9:16pm

No, I meant to do it within Tokio.

Your use-case sounds reasonable enough.

tczajka · October 29, 2022, 9:26pm

Ah OK I get it, instead of two Tokio runtimes, have a thread that does I/O and spawns computational tasks, and all the computational tasks are scheduled by Tokio and run by the threads it manages. That probably makes perfect sense for my case, thanks.

simonbuchan · October 29, 2022, 9:39pm

The deadlock is when the awaited task is completed, but tokio had all its workers starved, so it has no thread to put the continuation on. The simple case is when the CPU-bound task unblocks the awaiting task but never yields.

Using OS threading means the work is guaranteed to be fairly shared, so it's a lot safer.

tczajka · October 29, 2022, 9:42pm

By "workers are starved" do you mean that all the worker threads are busy doing computations? That would be progress being made at maximum speed, not a deadlock.

alice · October 29, 2022, 9:43pm

CPU-heavy computation on a Tokio runtime generally doesn't result in deadlocks, assuming that no operation continues running forever. At most, the operation is merely postponed until other work finishes.

tczajka · October 29, 2022, 9:47pm

The only thing I would be worried about is if I have 2 threads A and B, and 2 tasks X and Y that are unblocked and can make progress, but Tokio somehow has already decided to put both X and Y on thread A, with only X making progress, while thread B is idle.

But I am guessing Tokio is not going to do this, and will always put Y on thread B in that case rather than leaving a thread idle?

alice · October 29, 2022, 9:49pm

Work stealing will make sure that no thread is idle. That said, you may want to disable the lifo slot as discussed here.
