How to find out where an async program hangs?

I have written an async program that periodically queries data from a REST API and writes it to a Postgres database. The program is written with async-std, surf for HTTP access, and sqlx for writing the results to the database. It runs in a loop, fetching new data from the server.

However, after a while of continuous execution (usually a few days), the program halts (makes no more progress). When I inspect the network, I can see that it no longer has any TCP connections: neither to the REST server nor to the database.

My question is: how can I best find out where the program blocks? If I attach gdb and inspect the threads, I see nothing useful because of the task abstraction on top of the threads: all threads are just waiting for a task that is ready to execute (none of them is blocked). I also have no idea how to use printf debugging effectively, because the program is highly parallel and it takes ages to reproduce the bug.

A common cause of this behavior is holding a lock across an .await point. Before .awaiting a subroutine, always release (for example with std::mem::drop) any lock guard you are holding, whether from a RefCell, RwLock, Mutex, etc. Async is best viewed as "cooperative multitasking": even in a single-threaded LocalSet context in Tokio, if one task borrows a RefCell and then calls .await, other tasks run while that borrow is still active, and any of them that tries to borrow the same RefCell mutably cannot make progress (a conflicting RefCell borrow panics, and locking a std Mutex that the same thread already holds deadlocks or panics).
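The RefCell case can be reproduced without any runtime crate. The sketch below polls two hand-written tasks in a fixed order on a single thread; YieldOnce, task_a and task_b are hypothetical names used only for illustration, and try_borrow_mut is used so that the conflict shows up as a value instead of the panic that borrow_mut would raise.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing: the tasks below are polled by hand, in a fixed
// order, to reproduce one specific interleaving.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: returns Pending on the first poll, like an .await on
// I/O that is not ready yet, and Ready on the next one.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Hypothetical task showing the anti-pattern: the RefMut is still alive
// while the task is suspended at the .await.
async fn task_a(cell: &RefCell<Vec<u32>>) {
    let mut rows = cell.borrow_mut();
    YieldOnce(false).await;
    rows.push(1);
}

// Hypothetical second task wanting the same RefCell. borrow_mut() would
// panic here; try_borrow_mut() reports the conflict as a value instead.
async fn task_b(cell: &RefCell<Vec<u32>>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    let cell = RefCell::new(Vec::new());
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut a = Box::pin(task_a(&cell));
    assert!(a.as_mut().poll(&mut cx).is_pending()); // A suspended, still borrowing

    let mut b = Box::pin(task_b(&cell));
    assert_eq!(b.as_mut().poll(&mut cx), Poll::Ready(false)); // B cannot borrow

    assert!(a.as_mut().poll(&mut cx).is_ready()); // A resumes and releases the borrow
    println!("{:?}", cell.borrow()); // [1]
}
```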

To debug this, check every place in your program where you call lock (if using a Mutex), read or write (if using an RwLock), or borrow_mut or borrow (if using a RefCell). Then make sure the guard is dropped before calling .await.
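One way to make sure the guard is gone before the .await is to confine it to a block, so that it is dropped at the closing brace. A minimal sketch with std::sync::Mutex, assuming a hypothetical send_to_db function in place of the real sqlx call and a tiny hand-written block_on in place of async-std's executor:

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Tiny executor so the sketch runs without async-std.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical stand-in for the real database write.
async fn send_to_db(rows: Vec<u32>) -> usize {
    rows.len()
}

async fn flush(buffer: &Mutex<Vec<u32>>) -> usize {
    // The guard lives only inside this block and is dropped at its closing
    // brace, so the lock is already released when we reach the .await below.
    let rows = {
        let mut guard = buffer.lock().unwrap();
        std::mem::take(&mut *guard)
    };
    send_to_db(rows).await
}

fn main() {
    let buffer = Mutex::new(vec![1, 2, 3]);
    let sent = block_on(flush(&buffer));
    let left = buffer.lock().unwrap().len();
    println!("sent {sent} rows, {left} left in the buffer"); // sent 3 rows, 0 left in the buffer
}
```

A block scope is used rather than an explicit drop(guard) because block scoping is understood by every compiler version when it decides whether the guard is alive at the .await.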

Does this also apply to async-aware sync primitives? I am using RwLock and Mutex from the async_std::sync module.

I found no mention of the behaviour you described in those docs.

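No, the async-aware locks should work fine. Additionally, holding a std lock across an .await point is hard to do by accident: std's lock guards are not Send, so a future that holds one across an .await is not Send either, and the compiler rejects it when you spawn it on a multi-threaded executor.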
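A quick way to check this for a given async fn is a generic helper that only accepts Send values, which is the same bound that spawning on a multi-threaded executor imposes. In this sketch (assert_send, io_step, scoped and held are hypothetical names), the version that releases the std::sync::MutexGuard before the .await compiles; the commented-out version that keeps the guard alive across the .await is rejected when passed to assert_send.

```rust
use std::sync::Mutex;

// Hypothetical helper: accepts only Send values, the same bound a
// multi-threaded spawn imposes on the future.
fn assert_send<T: Send>(value: T) -> T {
    value
}

// Hypothetical stand-in for any real .await point (network, database, ...).
async fn io_step() {}

// The guard is confined to the block, so the future stays Send.
async fn scoped(counter: &Mutex<u64>) -> u64 {
    let snapshot = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    };
    io_step().await;
    snapshot
}

// Passing `held(&counter)` to assert_send does not compile: the
// std::sync::MutexGuard (which is not Send) is alive across the .await.
//
// async fn held(counter: &Mutex<u64>) -> u64 {
//     let mut guard = counter.lock().unwrap();
//     *guard += 1;
//     io_step().await;
//     *guard
// }

fn main() {
    let counter = Mutex::new(0);
    // Compiles: no std guard is held across an .await inside `scoped`.
    let fut = assert_send(scoped(&counter));
    drop(fut); // futures are lazy; it never ran, so the counter is still 0
    println!("counter = {}", *counter.lock().unwrap()); // counter = 0
}
```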

The topic that @nologik is talking about is covered here.

I would probably instrument the code with the tracing crate, which is the best async-aware logging crate available. That said, I don't know how well it works with async-std, or whether you have to use Tokio.
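For what it's worth, tracing itself does not depend on Tokio: its Instrument trait can wrap any Future. The underlying idea can also be sketched with std alone: wrap each suspicious operation in a future that records, under a name, when it last returned Pending, and removes the entry once it completes. A watchdog that periodically lists entries that have not been polled for a while then points at the .await the program is stuck on. Traced, traced, Registry and report_stuck are hypothetical names, and keying by a static name assumes at most one in-flight operation per name.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

// Hypothetical registry: name of each in-flight operation -> when it last
// returned Pending.
type Registry = Arc<Mutex<HashMap<&'static str, Instant>>>;

// Hypothetical wrapper that keeps the registry up to date while pending.
struct Traced<F> {
    name: &'static str,
    registry: Registry,
    inner: Pin<Box<F>>, // boxed so Traced is Unpin and easy to poll
}

impl<F: Future> Future for Traced<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut();
        match this.inner.as_mut().poll(cx) {
            Poll::Ready(out) => {
                this.registry.lock().unwrap().remove(this.name);
                Poll::Ready(out)
            }
            Poll::Pending => {
                this.registry.lock().unwrap().insert(this.name, Instant::now());
                Poll::Pending
            }
        }
    }
}

fn traced<F: Future>(name: &'static str, registry: &Registry, fut: F) -> Traced<F> {
    Traced { name, registry: Arc::clone(registry), inner: Box::pin(fut) }
}

// What a watchdog would log: operations that have been waiting, without
// being polled again, for at least `limit`.
fn report_stuck(registry: &Registry, limit: Duration) -> Vec<&'static str> {
    let stuck = registry
        .lock()
        .unwrap()
        .iter()
        .filter(|(_, last)| last.elapsed() >= limit)
        .map(|(name, _)| *name)
        .collect();
    stuck
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let registry: Registry = Arc::default();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // Simulates a request that never completes, e.g. an HTTP call without a timeout.
    let mut fetch = traced("fetch_rest_api", &registry, std::future::pending::<()>());
    // Simulates an operation that completes immediately.
    let mut write = traced("write_db", &registry, std::future::ready(42));

    assert!(Pin::new(&mut fetch).poll(&mut cx).is_pending());
    assert_eq!(Pin::new(&mut write).poll(&mut cx), Poll::Ready(42));

    println!("{:?}", report_stuck(&registry, Duration::ZERO)); // ["fetch_rest_api"]
}
```

In the real program, a plain std thread could call report_stuck every minute with a threshold well above the normal polling interval and print the result; operations that legitimately wait a long time (such as the pause between fetch cycles) would need a longer threshold or should not be wrapped.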

Additionally, you may want to read the article "Async: What is blocking?", which discusses one possible cause.

Isn't this code from my program holding a lock across an await point?

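```rust
// Accessor on the auth object (the guard is an async_std::sync::RwLockReadGuard)
pub async fn token(&self) -> RwLockReadGuard<'_, String>;

// Call site
let response = self
    .try_request(request, region, &*auth.token().await)
    .await;
```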

Isn't the lock guard held while the request is being awaited? This is an async-std RwLock, and the code compiles fine.

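The problems with holding a guard across an .await described above (blocking the whole thread) exist only for the std locks. An async-std guard held across an .await just keeps the lock taken for that long, which is fine as long as the awaited operation completes; if it never completes, or itself waits for the same lock, every task waiting on that lock stalls as well.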
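Whether the guard in the snippet above is held during the request follows from Rust's temporary-lifetime rules: the guard returned by auth.token().await is a temporary in the argument expression, and temporaries are dropped at the end of the enclosing statement, i.e. only after try_request(...).await has finished. The sketch below makes that drop order visible. Guard is a hypothetical stand-in for RwLockReadGuard<'_, String> that logs when it is dropped, token and try_request are hypothetical stand-ins for the real methods, and fixed shows one possible change: copy the token into an owned String first, so the guard is released before the request is sent.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Tiny executor so the sketch runs without async-std.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for RwLockReadGuard<'_, String> that logs its drop.
struct Guard {
    token: String,
    log: Log,
}

impl Deref for Guard {
    type Target = String;
    fn deref(&self) -> &String {
        &self.token
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("guard dropped");
    }
}

// Hypothetical stand-ins for auth.token() and self.try_request(...).
async fn token(log: &Log) -> Guard {
    Guard { token: "secret".to_string(), log: Rc::clone(log) }
}

async fn try_request(token: &str, log: &Log) -> usize {
    log.borrow_mut().push("request sent");
    token.len()
}

// Same shape as the original call: the guard is a temporary of the `let`
// statement, so it is only dropped at the `;`, after the request completed.
async fn original(log: &Log) -> usize {
    let response = try_request(&*token(log).await, log).await;
    response
}

// Copy the token into an owned String first; the guard is dropped at the end
// of that statement, before the request is awaited.
async fn fixed(log: &Log) -> usize {
    let owned_token: String = token(log).await.clone();
    try_request(&owned_token, log).await
}

fn main() {
    let log: Log = Rc::default();
    block_on(original(&log));
    println!("original: {:?}", log.borrow()); // ["request sent", "guard dropped"]
    log.borrow_mut().clear();
    block_on(fixed(&log));
    println!("fixed:    {:?}", log.borrow()); // ["guard dropped", "request sent"]
}
```

Cloning the token costs one small allocation per request, in exchange for never keeping the lock taken while waiting on the network.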
