How to find out where an async program hangs?

rippz · January 21, 2021, 9:34am

I have written an async program which queries data from a REST API periodically and writes this data to a postgres database. The program is written with async-std, surf for http access and sqlx for writing the results to the database. It runs in a loop to get new data from the server.

However, after a while (usually a few days) of continuous execution, the program halts (makes no more progress). When I inspect the network, I can see that it has no TCP connections left: neither to the REST server nor to the database.

My question is: how can I best find out where the program blocks? If I attach gdb and inspect the threads, I see nothing useful because of the task abstraction on top of the threads: all threads are idle, waiting for a task to become ready to run (none of them is blocked). I also have no idea how to use printf debugging effectively, because the program is highly parallel and it takes ages to reproduce the bug.

nologik · January 21, 2021, 10:31am

A common cause for this behavior is holding a lock across an .await point. Before .awaiting a subroutine, always make sure to std::mem::drop any locks you may be holding (whether from a RefCell, RwLock, Mutex, etc.). Async is best viewed as "cooperative multitasking", so even in a single-threaded LocalSet context in Tokio, borrowing a RefCell and then calling .await would prevent other tasks from progressing if another task tries to borrow that RefCell.

To debug this, check every location in your program where you call lock (Mutex), read or write (RwLock), or borrow or borrow_mut (RefCell), and make sure the guard is dropped before calling .await.
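A minimal sketch of the "drop the guard before .await" advice, assuming a std::sync::Mutex and a hypothetical `fetch_next` function standing in for real HTTP or database I/O. Putting the guard in an inner block makes its end explicit; the tiny `block_on` is only there so the sketch runs without async-std or Tokio and is not how a real runtime works.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical stand-in for an HTTP or database call.
async fn fetch_next(last: u64) -> u64 {
    last + 1
}

// Hypothetical task step: read shared state, do I/O, store the result.
async fn fetch_and_store(state: &Mutex<u64>) -> u64 {
    // Copy what we need out of the lock; the guard is dropped at the end of this block.
    let last = {
        let guard = state.lock().unwrap();
        *guard
    };
    // No lock is held while this await is pending.
    let next = fetch_next(last).await;
    // Re-acquire briefly; the temporary guard is dropped at the semicolon.
    *state.lock().unwrap() = next;
    next
}

// Minimal single-future executor so the sketch runs with std only.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Copying the value out (rather than calling std::mem::drop on the guard) is a stylistic choice here; both end the borrow before the await.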

rippz · January 21, 2021, 10:57am

Does this also apply to async-aware sync primitives? I am using RwLock and Mutex from the async_std::sync module.

I found no mention of the behaviour you described in those docs.

alice · January 21, 2021, 11:01am

No, the async-aware locks should work fine. Additionally, holding a std lock across an .await point is hard to do by accident, because the compiler will give errors about the future not being Send (at least when you spawn it onto a multi-threaded executor).
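The Send error can also be triggered on purpose to check a future. A sketch, assuming a hypothetical `require_send` helper (it imposes the same `Send` bound that a multi-threaded spawn does) and a hypothetical `some_io` future. The commented-out version would fail to compile if passed to `require_send`, because the std MutexGuard is still alive at the await; the block-scoped version compiles. On older compilers an explicit `drop(guard)` before the await was not always recognized for this check, so a block scope is the more robust form.

```rust
use std::future::Future;
use std::sync::Mutex;

// Hypothetical helper: compiles only if `fut` could be handed to a multi-threaded spawn.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Hypothetical stand-in for an .await on real I/O.
async fn some_io() {}

// Does NOT compile when passed to require_send: the MutexGuard (not Send)
// is still alive while `some_io().await` is pending.
//
// async fn held_across_await(m: &Mutex<Vec<u32>>) -> usize {
//     let mut guard = m.lock().unwrap();
//     guard.push(1);
//     some_io().await;
//     guard.len()
// }

// Compiles: the guard's scope ends before the await.
async fn scoped(m: &Mutex<Vec<u32>>) -> usize {
    {
        let mut guard = m.lock().unwrap();
        guard.push(1);
    }
    some_io().await;
    m.lock().unwrap().len()
}
```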

The topic that @nologik is talking about is covered here.

I would probably go for instrumenting the code with the tracing crate, which is the best async-aware logging crate available. That said, I don't know how well it works with async-std, or whether you have to use Tokio.
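As a dependency-free illustration of the same idea (this is not the tracing API, just a hand-rolled stand-in), one can label each await that might hang and keep a registry of labels that are still pending; whatever stays in the registry for hours is where the program is stuck. `InFlight`, `track` and `waiting_at_least` are hypothetical names. In the real program a background thread or task could periodically print `waiting_at_least(Duration::from_secs(600))`.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

type Ops = Arc<Mutex<HashMap<u64, (&'static str, Instant)>>>;

/// Hypothetical registry of operations that are currently being awaited.
#[derive(Clone, Default)]
pub struct InFlight {
    next_id: Arc<AtomicU64>,
    ops: Ops,
}

/// Removes its registry entry when dropped.
struct Entry {
    id: u64,
    ops: Ops,
}

impl Drop for Entry {
    fn drop(&mut self) {
        // Ignore a poisoned lock rather than panicking inside drop.
        if let Ok(mut ops) = self.ops.lock() {
            ops.remove(&self.id);
        }
    }
}

impl InFlight {
    /// Awaits `fut`, recording `label` as pending for as long as it runs.
    pub async fn track<F: Future>(&self, label: &'static str, fut: F) -> F::Output {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        // The std lock is released at the end of this statement, before any await.
        self.ops.lock().unwrap().insert(id, (label, Instant::now()));
        // Dropped on completion, cancellation (e.g. a timeout) or panic.
        let _entry = Entry { id, ops: Arc::clone(&self.ops) };
        fut.await
    }

    /// Labels that have been pending for at least `limit`, sorted.
    pub fn waiting_at_least(&self, limit: Duration) -> Vec<&'static str> {
        let ops = self.ops.lock().unwrap();
        let mut labels: Vec<&'static str> = ops
            .values()
            .filter(|(_, started)| started.elapsed() >= limit)
            .map(|(label, _)| *label)
            .collect();
        labels.sort_unstable();
        labels
    }
}
```

Usage would look like `reg.track("fetch_api", client_call).await` around each suspicious await, where `client_call` is whatever future the program already awaits there.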

alice · January 21, 2021, 11:07am

Additionally you may want to read Async: What is blocking?, which talks about one possible cause.
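One way a program ends up "blocked" is by running blocking code (std file or network I/O, std::thread::sleep, heavy CPU work) directly inside a task, which stalls that executor thread. Runtimes provide helpers for moving such work elsewhere (Tokio has tokio::task::spawn_blocking). The sketch below is a hypothetical, std-only `run_blocking` that only illustrates the mechanism: the closure runs on its own OS thread, and the returned future wakes the task when the result is ready instead of occupying an executor thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;

struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

pub struct BlockingTask<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

// Hypothetical helper: runs `f` on a fresh OS thread so async workers stay free.
// Caveat: if `f` panics, the future never completes; a real helper would propagate it.
pub fn run_blocking<T, F>(f: F) -> BlockingTask<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f();
        let waker = {
            let mut s = worker_side.lock().unwrap();
            s.result = Some(value);
            s.waker.take()
        };
        // Wake outside the lock so the woken task can poll without contention.
        if let Some(w) = waker {
            w.wake();
        }
    });
    BlockingTask { shared }
}

impl<T> Future for BlockingTask<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.shared.lock().unwrap();
        match s.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Checked and stored under the same lock, so a wake-up cannot be lost.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}
```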

rippz · January 23, 2021, 8:59am

Isn't this code which is found in my program holding a lock over an await point?

pub async fn token(&self) -> RwLockReadGuard<'_, String>;

let response = self
    .try_request(request, region, &*auth.token().await)
    .await;

Isn't the lock guard held while the request is being awaited? This is an async-std RwLock, and the code compiles fine.

alice · January 23, 2021, 10:33am

The problems with holding a lock guard across an .await exist only for the std locks.
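If `token()` were backed by a std::sync::RwLock instead, the usual fix is to copy the value out so the guard ends before the request is awaited. A sketch with hypothetical `Auth`, `send_request` and `fetch` names (not the poster's actual types): cloning a short String per request is typically cheap, and it also means a token refresh (a writer) never has to wait for a slow request to finish.

```rust
use std::sync::RwLock;

// Hypothetical stand-in for the poster's auth holder.
pub struct Auth {
    token: RwLock<String>,
}

impl Auth {
    pub fn new(token: &str) -> Self {
        Auth { token: RwLock::new(token.to_string()) }
    }

    // Returns an owned copy; the read guard is a temporary released before returning.
    pub fn token(&self) -> String {
        self.token.read().unwrap().clone()
    }

    pub fn refresh(&self, new_token: &str) {
        *self.token.write().unwrap() = new_token.to_string();
    }
}

// Hypothetical stand-in for `try_request`.
async fn send_request(path: &str, token: &str) -> String {
    format!("GET {path} with {token}")
}

pub async fn fetch(auth: &Auth, path: &str) -> String {
    // The String is owned, so no lock is held while the request is pending.
    let token = auth.token();
    send_request(path, &token).await
}
```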

