I have a number of forever-running tasks that I'd like to restart on failure. For example, in the code below I start two "workers". These should run forever, and if they fail, I'd like to be able to restart them.
For now I'm just unwrapping any possible error and using `panic = 'abort'` to kill the entire app and have it restarted, but I'm wondering if there's a more graceful approach.
Is `futures::future::select_all` the right tool for that? I came up with something like this (I'll need to use `idx` to figure out which worker to restart):
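```rust
async fn supervisor() {
    let w1 = tokio::spawn(async { worker("wrk1").await; });
    let w2 = tokio::spawn(async { worker("wrk2").await; });
    let mut workers = vec![w1, w2];
    // `select_all` panics if given an empty list, so stop once every worker is gone.
    while !workers.is_empty() {
        // Resolves when the first worker exits; the still-running handles come back in `rest`.
        let (result, _idx, mut rest) = futures::future::select_all(workers).await;
        if let Err(err) = result {
            // The task panicked or was cancelled, so spawn a replacement.
            // TODO: use `_idx` to restart the worker that actually failed, not always "wrk1".
            eprintln!("worker failed: {err}");
            rest.push(tokio::spawn(async { worker("wrk1").await; }));
        }
        // On `Ok(())` the worker returned normally and is not restarted;
        // the remaining workers keep running instead of being dropped.
        workers = rest;
    }
}
```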
Do you have an example you could give? Coming from Erlang/Elixir, it seems you'd want a supervisor that knows which children (tasks) it manages.
In the case of a system shutdown, you probably want to shut things down in a particular order: stop accepting new web requests, allow any background work to finish, then shut down the database pool.
It would be nice to see a code example for this pattern.
```rust
loop {
    let res = tokio::spawn(your_fallible_task()).await;
    match res {
        Ok(output) => { /* handle successful exit */ },
        Err(err) if err.is_panic() => { /* handle a panic in the task, e.g. by going around the loop to restart it */ },
        Err(err) => { /* handle other errors (mainly cancellation, e.g. on runtime shutdown) */ },
    }
}
```