I'm developing a system built on tonic (the tokio-based gRPC framework); it's a high-performance service to store Spark shuffle data. However, when writing data with highly concurrent gRPC requests, a single gRPC request can take a very long time, up to 30s+. This confuses me.
After investigating the tokio side and dumping the await-tree and the CPU profile, I found the slowness occurs on tokio::sync::Mutex. So I replaced it with std::sync::Mutex, and this looks fine.
But the root cause is not solved. I still find that an .await in tokio sometimes costs 1s (I logged the internal execution, which only costs 1ms), and I suspect a tokio scheduling problem.
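(For reference, the change is roughly of this shape; the names and the stored structure below are illustrative, not my actual code.)

```rust
use std::collections::HashMap;
use std::sync::Mutex; // was tokio::sync::Mutex

// Illustrative shared state for buffered shuffle blocks.
#[derive(Default)]
struct ShuffleStore {
    // The std Mutex works here because the critical section is short
    // and the guard is never held across an .await point.
    blocks: Mutex<HashMap<u64, Vec<u8>>>,
}

impl ShuffleStore {
    fn append(&self, partition: u64, data: &[u8]) {
        let mut blocks = self.blocks.lock().unwrap();
        blocks.entry(partition).or_default().extend_from_slice(data);
        // guard dropped here, before the caller hits any .await
    }
}
```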
I can't offer any specific help with this, but I can do the standard check: are you sure you're testing an optimised --release build? By default, cargo produces unoptimised debug builds.
Although the Tokio mutex is slower than the std mutex, the difference is not that big. If you're measuring whole seconds, then the problem is something else.
The tokio Mutex is actually just unscheduling the current task until another task releases the lock: the delay until the lock succeeds is due either to another task simply holding it that long (often because it locks over too long a section of code), or to you simply being out of worker threads to schedule the task onto so it can continue after the .lock().await.
You should in theory be able to test which of these it is by increasing the tokio worker pool thread count (often by using something like #[tokio::main(worker_threads = 20)] - it defaults to your CPU count) or by also instrumenting the time from lock to unlock. For the first, you may need to ensure that you don't let any tasks run too long between .awaits, so tokio can service all your tasks quickly.
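Something along these lines (just a sketch; the thread count and the Vec<u8> state are placeholders) would let you separate the time spent waiting for the lock from the time it is actually held:

```rust
use std::sync::Arc;
use std::time::Instant;
use tokio::sync::Mutex;

#[tokio::main(worker_threads = 20)] // default is the number of CPU cores
async fn main() {
    let state = Arc::new(Mutex::new(Vec::<u8>::new()));

    let waited = Instant::now();
    let mut guard = state.lock().await;
    println!("waited {:?} to acquire the lock", waited.elapsed());

    let held = Instant::now();
    guard.extend_from_slice(b"shuffle data"); // the critical section
    drop(guard);
    println!("held the lock for {:?}", held.elapsed());
}
```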
I'd start with the general theory that the tokio threads aren't running my task because they're doing something else. There are a bunch of tools I might reach for to see what that something else is, including:
- a wall clock profiler (e.g. a profiler configured to get a backtrace of every thread named tokio-* every 1/100th of a second).
- custom code that gets a single snapshot of all these threads right when I know the thing is broken, e.g. a timer that I start right before .lock().await and cancel right after (see the sketch after this list). I've written instrumentation like this but unfortunately it was in C++ and is buried in a previous employer's proprietary codebase. But it can be super valuable.
- a CPU profiler might do, if the problematic thing is CPU-bound, but that's not guaranteed.
- syscall-level stuff with eBPF or strace or the like, if the problem is something doing blocking syscalls from tokio threads.
- tracing, although it relies on the problematic thing having been annotated with e.g. #[tracing::instrument].
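Here's a minimal sketch of the second item in Rust (the 100ms threshold and the arm_watchdog name are just placeholders); if the watchdog fires, that's the moment to grab a thread or await-tree dump:

```rust
use std::time::Duration;
use tokio::sync::oneshot;

// If `cancel` isn't signalled within `limit`, log (or snapshot the
// runtime threads) so the stall is caught while it is happening.
fn arm_watchdog(limit: Duration) -> oneshot::Sender<()> {
    let (cancel, cancelled) = oneshot::channel::<()>();
    tokio::spawn(async move {
        tokio::select! {
            _ = cancelled => {} // lock acquired in time, nothing to do
            _ = tokio::time::sleep(limit) => {
                // Trigger whatever diagnostics you have here.
                eprintln!("lock acquisition exceeded {:?}", limit);
            }
        }
    });
    cancel
}

// Usage around the suspect lock:
// let watchdog = arm_watchdog(Duration::from_millis(100));
// let guard = shared.lock().await;
// let _ = watchdog.send(()); // cancel the watchdog
```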
When a task has a busy time that keeps increasing, but the number of polls is not increasing, then you are blocking the thread. The link explains what that means.
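If that's what you're seeing, the usual remedy is to push the blocking work onto the blocking pool; a rough sketch (write_to_disk stands in for whatever synchronous work is hogging the worker thread):

```rust
// Instead of doing blocking I/O directly inside the async task:
async fn store_block(data: Vec<u8>) -> std::io::Result<()> {
    // spawn_blocking moves the synchronous work to tokio's blocking pool,
    // so the worker thread stays free to poll other tasks.
    tokio::task::spawn_blocking(move || write_to_disk(&data))
        .await
        .expect("blocking task panicked")
}

// Stand-in for the actual blocking operation.
fn write_to_disk(data: &[u8]) -> std::io::Result<()> {
    std::fs::write("/tmp/shuffle.block", data)
}
```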
After using await-tree and tokio-console, I finally replaced tokio::sync::Mutex with the std mutex, and it looks really fast. So for my test case, the performance loss from tokio::sync::Mutex is huge under highly concurrent gRPC requests.