Keeping a Tokio Server Running in the Face of Panics

kamulos · November 10, 2020, 1:50pm

I have a server that uses tokio, spawns multiple listeners on different ports, and handles packets accordingly. Now I want to make sure that the servers keep running all the time, even if there is a panic because of a programming error.

The easiest way I thought of is to have a select! over all the servers in a loop and simply reinitialize all of them whenever the select! returns. This seems like a safe approach to me, but I don't like that all servers are terminated (for a very short time) when only one has crashed.

Even safer, but with an even bigger interruption, would be to select! the servers, let the program terminate after that, and have the supervisor restart the whole process.

My preferred way would be to have a loop for each server that restarts only that server if something happens. I tried this with futures::FutureExt. I have a run_inside() function that starts the server and handles everything. Then I tried to catch the panic like this:

pub async fn run(mut self) -> Result<()> {
    self.run_inside().catch_unwind().await
}

tokio::spawn(udp_server.run());

But I get a lot of errors like this:

the type &mut tokio::net::UdpSocket may not be safely transferred across an unwind boundary within impl listener::futures::Future, the trait std::panic::UnwindSafe is not implemented for &mut tokio::net::UdpSocket

I don't know how to handle this correctly. Actually everything should be contained inside the run_inside() function and not cross the unwind boundary. But with async functions of course everything is more complicated.

What would you recommend to ensure that all servers keep running all the time?

alice · November 10, 2020, 1:54pm

You can get rid of the error by wrapping the future in AssertUnwindSafe:

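use std::panic::AssertUnwindSafe;
use futures::FutureExt; // provides .catch_unwind() on futures

pub async fn run(mut self) -> Result<()> {
    // AssertUnwindSafe tells the compiler we accept that state touched by
    // run_inside() may be left inconsistent if it panics.
    AssertUnwindSafe(self.run_inside()).catch_unwind().await
}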

tokio::spawn(udp_server.run());

However, please note that tokio::spawn already catches panics internally, so unless run contains some sort of loop that restarts the server after a panic, the extra catch_unwind doesn't do anything.
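
One possible shape for such a restart loop is sketched below with std threads rather than tokio, so that it runs without external crates. A thread, like a tokio task, contains a panic and reports it to whoever joins it; with tokio the loop would instead await the JoinHandle returned by tokio::spawn and check JoinError::is_panic(). The run_inside body and the supervise function here are hypothetical stand-ins, not the code from the question.

```rust
use std::thread;

/// Hypothetical server body: panics on the first two attempts,
/// then returns cleanly. Stands in for the real run_inside().
fn run_inside(attempt: u32) -> Result<(), String> {
    if attempt < 2 {
        panic!("simulated programming error on attempt {}", attempt);
    }
    Ok(())
}

/// Restart the server every time it panics and stop once it exits
/// without panicking. Returns the number of restarts that were needed.
fn supervise() -> u32 {
    let mut restarts = 0;
    loop {
        let attempt = restarts;
        // The panic is contained in the spawned thread; join() reports it as Err.
        let handle = thread::spawn(move || run_inside(attempt));
        match handle.join() {
            // The server returned on its own (successfully or with an error value).
            Ok(_result) => return restarts,
            // The server panicked: count it and go around the loop again.
            Err(_panic_payload) => restarts += 1,
        }
    }
}
```

Because each server gets its own loop, a panic in one of them restarts only that one. A real supervisor would probably also log the panic and wait briefly before restarting, so that a server which panics immediately does not spin in a tight loop.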

kamulos · November 10, 2020, 2:24pm

This works just great!

Is there something to be aware of? Can I introduce unsafe behavior if I do something wrong?

alice · November 10, 2020, 2:32pm

There's no unsafe block, so no. All that can happen is logic bugs.
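
A small sketch of the kind of logic bug meant here, using std::panic::catch_unwind on a closure instead of a future. The Ledger type and its invariant are made up for illustration: a panic in the middle of an update leaves the value half-modified, and AssertUnwindSafe lets the code keep using it afterwards.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical state with an invariant: `sum` is the total of `items`.
struct Ledger {
    items: Vec<u32>,
    sum: u32,
}

impl Ledger {
    fn add(&mut self, value: u32) {
        self.items.push(value);
        // A programming error strikes between the two halves of the update.
        if value == 0 {
            panic!("zero is not allowed");
        }
        self.sum += value;
    }
}

/// Returns (number of items, sum) as observed after a caught panic.
fn state_after_caught_panic() -> (usize, u32) {
    let mut ledger = Ledger { items: Vec::new(), sum: 0 };
    ledger.add(5);
    // The closure borrows `ledger` mutably, which is not UnwindSafe,
    // so the compiler requires AssertUnwindSafe here.
    let result = panic::catch_unwind(AssertUnwindSafe(|| ledger.add(0)));
    assert!(result.is_err());
    // Memory safety is intact, but the invariant is broken:
    // two items are stored while only the first is reflected in `sum`.
    (ledger.items.len(), ledger.sum)
}
```

Nothing here is undefined behavior. The risk is only that code running after the caught panic trusts state that was left inconsistent, which is why rebuilding the server's state from scratch on restart is the safer pattern.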
