Weird contention in tokio::time::Clock::now

I've been trying out my app, which uses async streams, on a bigger machine with 24 cores. To my surprise, it runs slower than with a single thread. The more threads I add, the slower it gets, almost proportionally to the number of threads, as if only one thread were really working and the others were just burning CPU in vain.

After removing most of the things from the tight loops, I ended up with this weird flamegraph.
I have no idea where this Clock thing might be coming from. I do use Instant::now in one place, but I verified in a separate project that it should not be an issue, so I conclude there must be something else I missed.

Any hints what to look at? Why would tokio use this method and how can I check what in my program is calling that?

Tokio dev here. I haven't seen a flamegraph like that before. Do you use Tokio's timers anywhere?

Maybe the "pthread_mutex_lock@@GLIBC_2.2.5" is a hint - it suggests locking is a big expense (a mutex lock can be a relatively slow operation, as it needs to access shared memory rather than the local CPU cache).

Yes, I use tokio::time::Instant::now() in a few places, and just for a few comparisons - mostly to check whether the program should finish after a given amount of time. But in a separate program I tested calling now() and it is plenty fast and works fine even when called from different tokio tasks in parallel.
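For reference, the termination-check pattern described above looks roughly like this. This sketch uses std::time::Instant so it is self-contained; tokio::time::Instant exposes the same now()/elapsed() surface, and the names (run_for, limit) are illustrative, not from the original program:

```rust
use std::time::{Duration, Instant};

// Run a tight loop until the given time budget is exhausted,
// checking elapsed time on every iteration.
fn run_for(limit: Duration) -> u64 {
    let start = Instant::now();
    let mut iterations = 0u64;
    while start.elapsed() < limit {
        iterations += 1; // the real loop body would go here
    }
    iterations
}

fn main() {
    let n = run_for(Duration::from_millis(10));
    assert!(n > 0);
    println!("did {} iterations in 10 ms", n);
}
```

Note that this calls Instant::now() (via elapsed()) on every iteration of the tight loop, which is exactly the access pattern that can hammer a contended clock path.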

Basically I'm trying to remove as many things as possible from the original program until it starts scaling properly, but I haven't managed to do it yet. On the other hand, I tried to replicate just the core of it in a fresh project, but there it scales and works properly. I need to find what is different between those two versions.

Was wondering maybe this flamegraph would look familiar to someone.

Obviously the mutex is the most direct problem. But I'm not using any mutexes anywhere. Something must be calling it indirectly.

I'm more referring to other stuff in tokio::time such as the sleep method. Actually getting the current time isn't what this is about - it's somewhere inside the time driver.

Did you enable test-util? It complicates the time driver due to its time mocking capabilities.

Looks like now() needs a Mutex, and that's causing the contention. It might not cause contention in your other program; I guess it all depends on the scheduling etc. and whether a contention delay causes a traffic jam.

It's a workaround for the OS/hardware bugs on some platforms. On platforms known to not have such bugs the lock is not used.


Removed the last call to Instant::now() from the inner loop. Contention disappeared, scalability returned, pthread locking disappeared completely from the flamegraph. So at least now I know it is Instant::now() that's calling that Mutex.

Now looking for the reason why it happens in the original project, but not in the other one.
Both are compiled on the same machine, same Rust version, same config.

Replaced tokio = { version = "1.14", features = ["rt", "rt-multi-thread", "time"] } with ... features = ["full"] } and the Mutex is still being called; however, this time it uses a parking_lot mutex, which is faster.

Replaced tokio::time with std::time - no issues, no locks, all scales fine.

Did you enable test-util? It complicates the time driver due to its time mocking capabilities.

Only in dev-dependencies.

$ cargo tree | grep test-util
<empty>

Well, feel free to report this as a performance bug to Tokio. Preferably with a minimal example.

Well, the problem is I still have no idea what causes that - and I haven't reproduced it on a smaller example.

Look, this is what I'm getting in another program with which I was hoping to reproduce the issue, where I copied the core of the original program:

Looks right. And I double checked I imported tokio::time here as well.
Do you have any more details about the logic that enables that "worse" mutex path in tokio?

Well, I looked at the code, and it only hits the mutex if the list of pending timers is not empty.
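The fast-path pattern being described - skip the lock entirely when there is no pending work - can be sketched like this. This is a hypothetical illustration of the technique, not tokio's actual time-driver code; all names (TimerList, now_fast_path) are made up:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical sketch: a shared atomic flag guards the mutex,
// so callers only pay for the lock when timers are pending.
struct TimerList {
    has_pending: AtomicBool,
    pending: Mutex<Vec<u64>>, // stand-in for real timer entries
}

impl TimerList {
    // Returns true if the slow (locking) path was taken.
    fn now_fast_path(&self) -> bool {
        if !self.has_pending.load(Ordering::Acquire) {
            return false; // no timers: skip the mutex entirely
        }
        let _guard = self.pending.lock().unwrap(); // contended slow path
        true
    }
}

fn main() {
    let list = TimerList {
        has_pending: AtomicBool::new(false),
        pending: Mutex::new(Vec::new()),
    };
    assert!(!list.now_fast_path()); // empty list never touches the lock
    list.has_pending.store(true, Ordering::Release);
    list.pending.lock().unwrap().push(1);
    assert!(list.now_fast_path());
    println!("ok");
}
```

This matches the observed behavior: a program that never registers a timer stays on the lock-free path, while any pending timer (even one registered by a dependency) pushes every caller onto the mutex.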

So that might be something used by a dependency then, because my code doesn't use timers / delays / timeouts.

Could be.

This exact problem (or at least something that looked very similar!) happened to us when a dependency had a regular (non-dev) dependency on tokio with the test-util feature enabled. Feature flag unification caused that flag to be enabled for our release build as well. We tracked down the offending package using Features - The Cargo Book.

Certain architectures also have a mutex in std::time::Instant::now() to ensure monotonicity. If you don't see mutex contention with std::time, you're probably running on x86 or ARM, where a mutex is not needed to ensure monotonicity (this was only just fixed for ARM in the 1.57 release).
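The monotonicity guarantee mentioned above is easy to observe directly: on any platform, std::time::Instant::now() is documented never to go backwards (whether via raw clock reads or the fallback mutex). A quick sanity check:

```rust
use std::time::Instant;

fn main() {
    // Sample the clock many times and assert it never runs backwards.
    let mut prev = Instant::now();
    for _ in 0..1_000_000 {
        let now = Instant::now();
        assert!(now >= prev, "Instant went backwards");
        prev = now;
    }
    println!("monotonic over 1,000,000 samples");
}
```

This assertion holds on every platform; what differs is only how std enforces it (and therefore how expensive now() is under contention).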

Check whether cargo build -v shows --cfg 'feature="test-util"' in the tokio build. Under resolver = "1" (used for edition 2015 and 2018 builds), your dev-dependencies will leak features into your regular dependencies. This can be fixed by adding resolver = "2" to your [package] config, or by changing to edition 2021.
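The fix above as a Cargo.toml fragment; the package name, versions, and feature lists are placeholders, not from the original project:

```toml
[package]
name = "myapp"      # placeholder
version = "0.1.0"
edition = "2018"    # resolver = "2" is already the default on edition 2021
resolver = "2"      # opt in to the new feature resolver explicitly

[dependencies]
tokio = { version = "1.14", features = ["rt", "rt-multi-thread", "time"] }

[dev-dependencies]
# With resolver = "2", features enabled here (e.g. test-util) no longer
# unify into the regular build of the dependency above.
tokio = { version = "1.14", features = ["test-util", "macros"] }
```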

(Weirdly, cargo tree --edges=features doesn't show this leakage, so the only way I know of to check it is to look at the exact build commands that are run.)

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.