Keeping a Tokio Server Running in the Face of Panics

I have a server that uses tokio, spawns multiple listeners on different ports, and handles packets accordingly. Now I want to make sure that the servers keep running all the time, even if one of them panics because of a programming error.

The easiest way I thought of is to have a select! over all the servers in a loop and just reinitialize all of them whenever the select! returns. This seems like a safe way of doing things to me, but I don't like that all servers are terminated (for a very short time) when only one crashed.

Even safer, but with an even bigger interruption, would be to select! over the servers, let the program terminate as soon as one of them exits, and have the supervisor restart the whole process.

My preferred way would be to have a loop for each server that restarts only that server if something happens. I tried with futures::FutureExt. I have a run_inside() function that starts the server and handles everything. Then I tried to catch the panic like this:

pub async fn run(mut self) -> Result<()> {
    self.run_inside().catch_unwind().await
}

tokio::spawn(udp_server.run());

But I get a lot of errors like this:

the type &mut tokio::net::UdpSocket may not be safely transferred across an unwind boundary within impl listener::futures::Future, the trait std::panic::UnwindSafe is not implemented for &mut tokio::net::UdpSocket

I don't know how to handle this correctly. Actually everything should be contained inside the run_inside() function and not cross the unwind boundary. But with async functions of course everything is more complicated.

What would you recommend for me just to ensure all servers are running all the time?

You can get rid of the error by wrapping the future in AssertUnwindSafe:

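use futures::FutureExt; // provides catch_unwind() on futures
use std::panic::AssertUnwindSafe;

pub async fn run(mut self) -> Result<()> {
    // AssertUnwindSafe tells the compiler we accept that the captured state
    // (e.g. the &mut UdpSocket) may be observed after a panic.
    // Note: catch_unwind() wraps the future's output in an outer Result whose
    // Err variant carries the panic payload (Box<dyn Any + Send>).
    AssertUnwindSafe(self.run_inside()).catch_unwind().await
}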

tokio::spawn(udp_server.run());

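However, please note that tokio::spawn already has a catch_unwind built in: a panic inside a spawned task is caught and reported as a JoinError through the JoinHandle. So unless you have some sort of loop in run that restarts the server on panics, the extra catch_unwind doesn't do anything.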

This works just great!

Is there something to be aware of? Can I introduce unsafe behavior if I do something wrong?

There's no unsafe block, so no. All that can happen is logic bugs.
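
To make "logic bugs" a bit more concrete, here is a small std-only sketch (no tokio) of what AssertUnwindSafe lets through. Stats, add and try_add are made-up names for illustration. The point is only that a panic in the middle of an update can leave a value in an inconsistent but still memory-safe state, and that this state stays visible after the panic is caught.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical state with an invariant: `count` equals `items.len()`.
#[derive(Default)]
pub struct Stats {
    pub items: Vec<u32>,
    pub count: usize,
}

impl Stats {
    /// Panics for values above 100, but only after the push has happened,
    /// so a panic leaves `count` one behind `items.len()`.
    pub fn add(&mut self, value: u32) {
        self.items.push(value);
        assert!(value <= 100, "value too large");
        self.count += 1;
    }
}

/// Returns true if the update went through, false if it panicked.
pub fn try_add(stats: &mut Stats, value: u32) -> bool {
    // A closure capturing `&mut Stats` is not UnwindSafe, which is the same
    // kind of error as in the question; AssertUnwindSafe opts out of the check.
    catch_unwind(AssertUnwindSafe(|| stats.add(value))).is_ok()
}
```

After a failed try_add the Stats value is still perfectly usable from the compiler's point of view, but items and count no longer agree. If your restart loop reuses state that a panicking server was mutating, that is the kind of thing to watch for. Rebuilding the server from scratch on each restart avoids it.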
