Tokio::time:: Clock ::now lock contention

I have been looking into an issue where our system comes to a crawl under load. Attaching the flamegraph under load, this appears to be similar to the one discussed in Weird contention in tokio::time:: Clock::now. It basically ends up in the cfg_test_util! implementation of the Clock, and 70% time is waiting for the lock.

Just to be sure:

  1. Forked the tokio github (tag=1.19.2) and added some debug to verify this is the case
  2. Tried a temp fix where the cfg_test_util! Clock::now() just returns Instant::from_std(std::time::Instant::now()) without grabbing the lock (since the pause functionality is not needed anyways in the prod version). This does indeed solve the performance issue.

I also tried setting resolver = 2, but this didn't make a difference. This is probably because we are already using cargo 1.60.0 (d1fd9fe2c 2022-03-01), and cargo 1.50.0 was the switch over from resolver = 1, IIUC. The cargo tree -e features -i tokio output also matches.

Is something not being set up correctly to prevent the test code leakage? Please let me know if any other info is needed.

Thanks for your help!

It sounds like the solution is to not enable the test-util feature of Tokio for your production build.

1 Like

Thanks. I rechecked, the test-util feature is not specified anywhere in the code base. May be some transitive dependency, not sure. Also, I am unable to repro it in a standalone example.

I grabbed the cargo tree -e features -i tokio --charset ascii output. All the parts where the test-util shows up listed here.

   1 tokio v1.19.2
   2 |-- tokio feature "bytes"
....
2447 |-- tokio feature "once_cell"
2448 |   |-- tokio feature "process" (*)
2449 |   |-- tokio feature "rt"
2450 |   |   |-- actix-rt v2.7.0 (*)
2451 |   |   |-- <interal crate> (*)
2452 |   |   `-- tokio-test v0.4.2 (*)
2453 |   |   |-- hyper feature "runtime" (*)
2454 |   |   |-- hyper feature "tcp" (*)
2455 |   |   |-- tokio feature "full" (*) 
2456 |   |   |-- tokio feature "rt-multi-thread" (*)
2457 |   |   |-- tokio feature "test-util" <-------
2458 |   |   |   `-- tokio-test v0.4.2 (*)
2459 |   |   `-- tower feature "buffer" (*)
2460 |   `-- tokio feature "signal" (*)
2461 |-- tokio feature "parking_lot"
.....
2473 |-- tokio feature "sync"
2474 |   |-- actix-rt v2.7.0 (*)
2475 |   |-- actix-server v2.1.1 (*)
2476 |   |-- hyper v0.14.18 (*)
2477 |   |-- <internal crate> (*)
2478 |   |-- tokio-stream v0.1.8 (*)
2479 |   |-- tokio-test v0.4.2 (*)
2480 |   |-- tokio-util v0.6.9 (*)
2481 |   |-- tokio-util v0.7.1 (*)
2482 |   |-- tower v0.4.13 (*)
2483 |   |-- tower-test v0.4.0 (*)
2484 |   `-- warp v0.3.2 (*)
2485 |   |-- tokio feature "full" (*)
2486 |   |-- tokio feature "test-util" (*) <-------
2487 |   |-- tower feature "buffer" (*)
2488 |   |-- tower feature "limit" (*)
2489 |   |-- tower feature "ready-cache" (*)
2490 |   `-- reqwest feature "blocking" (*)
2491 |-- tokio feature "test-util" (*) <------- (???)
2492 |-- tokio feature "time"
........
|-- tokio feature "test-util" (*)
|-- tokio feature "time"
|   |-- actix-rt v2.7.0 (*)
|   |-- axum v0.5.13 (*)
(*)
|   |-- leaky-bucket v0.11.1 (*)
|   |-- reqwest v0.11.11 (*)
|   |-- tarpc v0.27.2 (*)
|   |-- tokio-io-timeout v1.2.0 (*)
|   |-- tokio-test v0.4.2 (*)
|   `-- warp v0.3.2 (*)
|   [dev-dependencies]
|   `-- ...
|   |-- hyper feature "runtime" (*)
|   |-- hyper feature "tcp" (*)
|   |-- tokio feature "full" (*)
|   |-- tokio feature "test-util" (*) <----
|   |-- tower feature "limit" (*)
|   |-- tower feature "load" (*)
|   |-- tower feature "timeout" (*)
|   |-- tonic feature "channel" (*)
|   |-- tokio-stream feature "time" (*)
|   `-- tokio-util feature "time" (*)

....

Cross verified the feature tree above, all the test-util occurrences check out, except the one in line 2491. I can't quite tell where the test-util comes from. One explanation could be that it is leaked from the dev-dependencies even with resolver=2.

One option to find would be to fork tokio, delete the test-util feature, use the [patch] section to override tokio with the modified version, and then run cargo build and see what breaks.

2 Likes

You appear to have tower-test dependency which depends on tokio-test which depends on tokio's test-util feature. Please make sure that tower-test is specified in dev-dependencies only.

5 Likes

Problems like this keep happening - why aren't the tools diagnosing it properly?

OK, found the issue. Apart from not using test deps in prod, both the edition/resolver need to be set correctly as well. The resolver part is taken care by just having the rust version >= 1.50

Not the edition part tough. I can repro the issue by switching the edition value(verified by patching the tokio to my debug fork). With edition = 2018, I can see the test deps leaking, 2021 works as expected.

But unlike resolver, edition can't be set at workspace level unfortunately

Thanks all for the help and ideas

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.