Tokio's spawned tasks and join handles

Assume there is a spawned task with a loop that keeps invoking tokio::time::sleep(…).await and nothing else. The documentation says that dropping the join handle does not abort the task. Is it then guaranteed that the task will keep being polled and making progress until the runtime is shut down, or is there some mechanism that disposes of, or heavily deprioritizes, detached and marginally active tasks? I have a background job that is supposed to stay alive, but it seems to die after some hours or days, and keeping the join handle around seems to help.
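
Roughly, the setup looks like this (a minimal sketch; the interval and the loop body stand in for what my real job does):

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // The JoinHandle returned by `spawn` is dropped immediately,
    // detaching the task.
    tokio::spawn(async {
        loop {
            tokio::time::sleep(Duration::from_secs(60)).await;
            // ... the periodic background work goes here ...
        }
    });

    // ... the rest of the application ...
}
```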

As per the docs:

A JoinHandle detaches the associated task when it is dropped, which means that there is no longer any handle to the task, and no way to join on it.

[…]

If a JoinHandle is dropped, then the task continues running in the background and its return value is lost.

That is: a task continues to run to completion even after its JoinHandle has been dropped. Keeping the handle alive does not keep the task alive.

If you've got a task that is exiting unexpectedly, I'd look at what that task is doing; Tokio isn't terminating it for you.
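
A quick self-contained way to convince yourself (a sketch using the default #[tokio::main] runtime; the durations are arbitrary):

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    let handle = tokio::spawn(async {
        loop {
            tokio::time::sleep(Duration::from_millis(200)).await;
            println!("still running");
        }
    });

    // Detach the task: the runtime keeps polling it regardless.
    drop(handle);

    // "still running" continues to print for the full two seconds.
    tokio::time::sleep(Duration::from_secs(2)).await;
}
```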

Yeah, this is my understanding too. I never saw it exit; it feels like it just stops making progress. In addition to the sleep(…) I mentioned, I also have tracing::info!(…) calls, and the logs simply stop appearing after a day or so of running (and the actual work stops being done). I wanted to ask anyway to make sure there was nothing happening behind the scenes in the runtime. Maybe it was a coincidence that keeping the join handle seemed to help.

Tokio does not dispose of background tasks in any way.

If you have a task that sometimes appears to stop running, then you probably have a different task that is blocking the thread. You may be able to use tokio-console to debug this by looking for tasks whose "busy" number is increasing but the "poll" number is not increasing.
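
For illustration, this is the kind of bug I mean; the std::thread::sleep here blocks the worker thread itself instead of yielding to the scheduler, so every other task assigned to that thread stalls (a deliberately broken sketch):

```rust
tokio::spawn(async {
    loop {
        // Wrong: this blocks the OS thread, so other tasks on this
        // worker cannot be polled. The fix is tokio::time::sleep(...).await,
        // or moving genuinely blocking work onto tokio::task::spawn_blocking.
        std::thread::sleep(std::time::Duration::from_secs(60));
    }
});
```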

Thank you! Was not aware of that tool.

One other thought: in my scenario the multi-thread runtime is used, so there are at least two worker threads, and everything else keeps working fine. To elaborate, it is an Axum server, and it serves requests without problems the whole time. Is your explanation about "a different task blocking the thread" still plausible then?

@alice, sorry to bother you. I am curious to hear your opinion when you have time.

In my opinion it is not possible to know the answer without some sort of diagnostic tool. A task started by Axum could be misbehaving even if you're not currently seeing any visible symptoms, and it is at least possible that tasks are running that you're not aware of. Without looking at what's happening internally, we're only guessing. Why not run tokio-console and see whether any tasks are blocking?

My opinion is that you need to try using the console and see what it says.

I agree. It is just speculation at this point. The difficulty is that I do not have a reliable way to reproduce it: the server can work fine for two weeks straight, or it can start failing on the second day. I will plug in console-subscriber and start waiting.
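
For anyone who lands here later: hooking up the subscriber is mostly one call at startup (a sketch; it assumes the console-subscriber crate is in Cargo.toml and the binary is built with RUSTFLAGS="--cfg tokio_unstable"):

```rust
#[tokio::main]
async fn main() {
    // Installs console-subscriber as the global tracing subscriber and
    // starts the server that the `tokio-console` CLI connects to
    // (127.0.0.1:6669 by default).
    console_subscriber::init();

    // ... build and run the Axum app as before ...
}
```

If you already install your own tracing subscriber, console_subscriber::ConsoleLayer::builder().spawn() can be added as an extra layer instead of calling init().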

In that case, you may also want to try this tool:

That looks like a really useful tool. It would be a good idea to merge it into Tokio as an optional component (perhaps behind a feature flag as well). Right now it seems really hard to find.

I think I found it. I simplified my example a bit too much and left out one important detail: the task is actually making an HTTP request via hyper. It turns out hyper's client does not have any timeout logic built in, and the behavior I was observing was simply due to occasional network outages: the other side of the socket would disappear, and the TCP connection would get stuck with no resolution. The lesson learned is that one has to wrap such requests in something like tokio::time::timeout.
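
Concretely, something along these lines (a sketch; do_request is a hypothetical stand-in for the actual hyper client call, and the 30-second deadline is arbitrary):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical stand-in for the real hyper client call.
async fn do_request() -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    // ... perform the HTTP request ...
    Ok(String::new())
}

async fn do_request_with_timeout() -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    match timeout(Duration::from_secs(30), do_request()).await {
        // The request finished in time (successfully or with its own error).
        Ok(result) => result,
        // The deadline elapsed; the request future is dropped,
        // cancelling the stuck in-flight request.
        Err(_elapsed) => Err("request timed out".into()),
    }
}
```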
