I have a handful of threads doing things for a program. All are critical, and none are expected to exit on their own. What's the most correct/safe/ergonomic way to detect if one of them has finished early, and propagate a panic if one of them has panicked?
Things I've considered:
Looping over the join handles and polling each one in sequence (e.g. with is_finished()). Would probably work, but the loop doesn't strike me as the right answer because of all the polling.
Thread scopes. Initially promising on the panic angle at least, but then I realized the scope only panics when ALL the contained threads have joined.
Async with Tokio. Make each thread a blocking Task, then do a select! or use a JoinSet. Best result I've found yet. Mirrors what Nextest does. But apparently, blocking tasks aren't meant to be equivalent to threads, and should only be used for short-running things.
Currently leaning to the Async with Tokio approach, but I have my doubts. Anyone willing to chip in their two cents?
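For reference, the polling option could look roughly like the sketch below. It assumes std's `JoinHandle::is_finished` is what gets polled; `wait_for_first_exit` and the poll interval are made-up names for illustration, and it loops forever if given no handles.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Polls the handles until one thread has finished, then joins it and returns
/// its index together with the join result (`Err` holds a panic payload).
fn wait_for_first_exit(
    mut handles: Vec<JoinHandle<()>>,
    poll: Duration,
) -> (usize, thread::Result<()>) {
    loop {
        if let Some(i) = handles.iter().position(|h| h.is_finished()) {
            // `join` returns promptly because the thread is already done.
            return (i, handles.swap_remove(i).join());
        }
        thread::sleep(poll);
    }
}

fn main() {
    let handles: Vec<JoinHandle<()>> = vec![
        // Stands in for a worker that is not expected to exit.
        thread::spawn(|| thread::sleep(Duration::from_secs(60))),
        thread::spawn(|| {
            thread::sleep(Duration::from_millis(50));
            panic!("worker 1 failed");
        }),
    ];
    let (index, result) = wait_for_first_exit(handles, Duration::from_millis(10));
    eprintln!("thread {index} exited early");
    if let Err(payload) = result {
        // Re-raise the worker's panic on the main thread.
        std::panic::resume_unwind(payload);
    }
}
```

The remaining handles are dropped on return, which detaches those threads; they only stop once the process exits.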
Not sure, but I think you need to make sure panics don't escape your threads: use catch_unwind to catch all panics in each thread, then communicate that a panic has occurred to all threads and have them cooperatively stop executing.
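A minimal sketch of that idea, assuming each worker can be written as a loop that checks a shared flag between iterations. `run_worker` and the `stop` flag are hypothetical names, not something from the original post.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Calls `work` repeatedly until `stop` is set. A panic inside `work` is
/// caught, `stop` is set so the sibling threads wind down, and the panic
/// payload is returned to whoever joins the thread.
fn run_worker<F>(stop: Arc<AtomicBool>, mut work: F) -> thread::Result<()>
where
    F: FnMut(),
{
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        while !stop.load(Ordering::Relaxed) {
            work();
        }
    }));
    // After a panic this tells the others to stop; otherwise it is a no-op,
    // because the loop only ends once the flag is already set.
    stop.store(true, Ordering::Relaxed);
    result
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));

    // A well-behaved worker that does its work in small steps.
    let steady = {
        let stop = Arc::clone(&stop);
        thread::spawn(move || run_worker(stop, || thread::sleep(Duration::from_millis(10))))
    };
    // A worker that fails on its third iteration.
    let faulty = {
        let stop = Arc::clone(&stop);
        let mut n = 0;
        thread::spawn(move || {
            run_worker(stop, move || {
                n += 1;
                if n == 3 {
                    panic!("worker failed");
                }
            })
        })
    };

    let steady_result = steady.join().expect("run_worker catches panics");
    let faulty_result = faulty.join().expect("run_worker catches panics");
    assert!(steady_result.is_ok());
    // Re-raise the original panic on the main thread.
    if let Err(payload) = faulty_result {
        panic::resume_unwind(payload);
    }
}
```

This only helps if every worker gets back to the flag check reasonably often; a worker stuck in a blocking call will not notice the flag.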
You can use async without specifically using Tokio’s blocking thread pool. You only need a oneshot channel.
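let (tx, rx) = oneshot(); // pseudocode: any async oneshot channel
std::thread::spawn(move || {
    // The receiver may already be gone, so ignore the send result.
    let _ = tx.send(do_actual_work());
});
// then put rx in your `select!`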
This will wake up the receiver when the thread panics because tx will be dropped, but if you want to also pass the panic payload to propagate, that is just a matter of adding catch_unwind() to this. (That’s essentially the same as how the standard library puts panics in thread JoinHandles, even!)
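The wake-up-on-drop behavior can be tried without an async runtime. In this sketch std's `mpsc` channel stands in for the oneshot and a blocking `recv()` stands in for awaiting `rx`; `Outcome` and `watch` are illustrative names only. It relies on the default unwinding panic strategy.

```rust
use std::sync::mpsc;
use std::thread;

/// How a watched thread ended.
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    /// The closure returned normally with this value.
    Finished(T),
    /// The sender was dropped without sending, i.e. the closure panicked.
    Panicked,
}

/// Spawns `work` on a new thread and blocks until it returns or panics.
fn watch<T, F>(work: F) -> Outcome<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If `work()` panics, unwinding drops `tx` before anything is sent.
        let _ = tx.send(work());
    });
    match rx.recv() {
        Ok(value) => Outcome::Finished(value),
        // `recv` fails only once every sender is gone and the queue is empty.
        Err(mpsc::RecvError) => Outcome::Panicked,
    }
}

fn main() {
    assert_eq!(watch(|| 2 + 2), Outcome::Finished(4));
    assert_eq!(watch(|| -> i32 { panic!("boom") }), Outcome::Panicked);
}
```

Here the receiver learns that the thread died but not why; the payload is lost unless the thread catches it and sends it.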
You can also do a similar thing without any async and just a single MPSC channel, by sending a value when the thread completes or panics.
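use std::panic::catch_unwind;

let (tx, rx) = std::sync::mpsc::channel();
/* for each thread */ {
    let tx = tx.clone();
    std::thread::spawn(move || {
        // Sends Ok(value) on a normal return, Err(payload) on a panic.
        let _ = tx.send(catch_unwind(|| do_actual_work()));
    });
}
// rx.recv() then returns as soon as any thread has finished or panicked.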
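Filled out into something that compiles, the MPSC version might look like the sketch below. `first_exit`, `Report`, and the two demo jobs are assumptions made for illustration; the real workers would replace the boxed closures.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// A worker's index plus either its return value or its panic payload.
type Report = (usize, Result<(), Box<dyn Any + Send>>);

/// Spawns every job on its own thread and blocks until the first one
/// returns or panics, then hands back that worker's report.
fn first_exit(jobs: Vec<Box<dyn FnOnce() + Send>>) -> Report {
    let (tx, rx) = mpsc::channel::<Report>();
    for (id, job) in jobs.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            let outcome = panic::catch_unwind(AssertUnwindSafe(job));
            // The receiver may already be gone; ignore the send error.
            let _ = tx.send((id, outcome));
        });
    }
    // Drop the original sender so only the workers hold one.
    drop(tx);
    rx.recv().expect("at least one job must be supplied")
}

fn main() {
    let jobs: Vec<Box<dyn FnOnce() + Send>> = vec![
        // Stands in for a worker that is not expected to exit.
        Box::new(|| thread::sleep(Duration::from_secs(60))),
        Box::new(|| {
            thread::sleep(Duration::from_millis(50));
            panic!("worker 1 failed");
        }),
    ];
    let (id, outcome) = first_exit(jobs);
    eprintln!("worker {id} exited first");
    if let Err(payload) = outcome {
        // Propagate the worker's panic on the main thread.
        panic::resume_unwind(payload);
    }
}
```

There is no polling: the main thread sleeps in `recv()` until the first report arrives. The other workers keep running until the process exits or they are told to stop some other way.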
I assume you're talking about the panic = 'abort' option. I did look at that, but was thrown off by the loss of stack unwinding in the panicking thread.
The behavior I want from that option is:
1. stop execution of all threads
2. unwind the panicking thread and print a stack trace
3. exit the program
The default behavior skips steps 1 and 3, while the abort behavior skips step 2.
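One thing that may get close to this, sketched under assumptions: a custom panic hook that first calls the previous hook (which prints the panic message, plus a backtrace when `RUST_BACKTRACE=1`) and then exits the process. `install_panic_hook` and `faulty_worker` are hypothetical names.

```rust
use std::panic;
use std::thread;

/// Wraps the current panic hook: the previous hook still prints the message
/// (and a backtrace when `RUST_BACKTRACE=1`), then `after` runs.
fn install_panic_hook<F>(after: F)
where
    F: Fn() + Send + Sync + 'static,
{
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        previous(info);
        after();
    }));
}

/// Stands in for a critical thread that fails.
fn faulty_worker() {
    panic!("worker failed");
}

fn main() {
    // Any panic, on any thread, now ends the whole process with status 1.
    install_panic_hook(|| std::process::exit(1));
    let worker = thread::spawn(faulty_worker);
    // The hook exits the process, so this join is not expected to return.
    let _ = worker.join();
}
```

The hook runs before unwinding starts, so with this sketch the panicking thread's destructors do not run, and the other threads are simply killed when the process exits.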
This solution still won't magically make the threads stop if do_actual_work never finishes.
The OP mentioned:
All are critical, and none are expected to exit on their own
So you'd still need to make the do_actual_work threads block on (or poll) some signal that is triggered when one of the other threads sends a message. If do_actual_work reads from a channel, you can enqueue a termination message.
If the main thread exits, the whole program stops too, so it can just wait for the first item in the channel.
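As a rough sketch of the termination-message idea, assuming the worker already takes its input from a channel. `Msg`, `spawn_worker`, and the summing workload are invented for the example.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Messages a worker can receive on its input channel.
enum Msg {
    Work(u32),
    /// Asks the worker to clean up and return.
    Stop,
}

/// Spawns a worker that sums the numbers it is sent until it sees `Stop`
/// or until every sender has been dropped.
fn spawn_worker() -> (Sender<Msg>, JoinHandle<u32>) {
    let (tx, rx): (Sender<Msg>, Receiver<Msg>) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0;
        // `recv` blocks, so the worker sleeps until there is something to do.
        while let Ok(msg) = rx.recv() {
            match msg {
                Msg::Work(n) => total += n,
                Msg::Stop => break,
            }
        }
        // Cleanup or persistence would go here before returning.
        total
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_worker();
    tx.send(Msg::Work(2)).unwrap();
    tx.send(Msg::Work(3)).unwrap();
    // Another thread has failed: ask this one to stop.
    tx.send(Msg::Stop).unwrap();
    assert_eq!(handle.join().unwrap(), 5);
}
```

Because the stop request travels through the same queue as the work, the worker handles everything queued before it and then gets a chance to clean up.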
When the main thread of a Rust program terminates, the entire program shuts down, even if other threads are still running. However, this module provides convenient facilities for automatically waiting for the termination of a thread (i.e., join).
That may be helpful. The OP didn't specify that they were going to exit the program as far as I can tell. It may also not be desirable if the work threads need to run some clean up or persist some data.