Multi-thread shutdown, best practices

The main thread starts another thread and gets back a join handle. When the main thread's window is closed, I want the other threads to stop their processing, clean up properly, and be joined.

It would be convenient if a thread could test whether something is blocked waiting for it to exit and join. But that doesn't seem to be a feature.

It would be useful if crossbeam_channel had a way to test from the send end whether the receive end had been closed. I can send a dummy message, which will fail if the channel is closed, but that's not something to do frequently. Any way to do that better?

Failing that, a global "everybody shut down" flag will work. Any standard way to do that?

I admit, doing mostly async I haven't really dug into this situation much, but I think a global flag is the way to go.

It really depends on whether you need to do cleanup in your threads. If so, you need a cooperative way. If you just want to shut down, you could just not join and call std::process::exit.

When you want to coordinate, you need your threads to check a flag regularly. I would suggest just making it a static: static SHUTDOWN: AtomicBool = AtomicBool::new(false);.

You can check this bool with Ordering::Relaxed as long as it just tells threads to shut down and you don't read it back on the main thread. Ok, that's confusing. What I mean is: don't do shenanigans like unsetting the bool in a thread when it has finished its cleanup and then checking that in the main thread to synchronize with the thread shutting down. Keep it a one-way signal.
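A minimal sketch of that one-way flag, assuming a worker that can poll between small units of work. The names `worker` and `run_and_shut_down` and the sleep durations are made up for illustration. The `join` at the end is what actually synchronizes with the thread finishing, which is why `Relaxed` is enough for the flag itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

// One-way signal: it only ever goes from false to true.
static SHUTDOWN: AtomicBool = AtomicBool::new(false);

// Hypothetical worker: does small units of work and polls the flag in between.
fn worker() -> u64 {
    let mut iterations = 0u64;
    while !SHUTDOWN.load(Ordering::Relaxed) {
        iterations += 1; // stand-in for one unit of real work
        thread::sleep(Duration::from_millis(1));
    }
    // Cleanup would go here, before the thread returns.
    iterations
}

// Spawns the worker, requests shutdown, then waits for the worker to finish.
fn run_and_shut_down() -> u64 {
    let handle = thread::spawn(worker);
    thread::sleep(Duration::from_millis(20)); // stand-in for "the window is open"
    SHUTDOWN.store(true, Ordering::Relaxed); // request shutdown
    handle.join().expect("worker panicked") // wait until cleanup is done
}

fn main() {
    let iterations = run_and_shut_down();
    println!("worker ran {iterations} iterations before exiting");
}
```

Note that the flag is never reset, so this only works for a single shutdown per process.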


Besides global flag, you can send a closure to the thread / each thread in the threadpool, to command them to exit.

I have used the approach described by @najamelan successfully a few times. Minimal code required and works well.

This is generally one of the things that async/await make a lot easier.

IIRC, if the main thread exits, all other threads are killed. In conclusion, for a cooperative approach, you'd need a counter to signal the main thread when it is allowed to end the process in addition to the atomic boolean.

I assume the OP uses thread::JoinHandle::join to keep the main thread alive until the other threads are done.

@alice well, yield points allow you to drop the task, but if they need async clean up, you need cooperative cancellation as well.

Sure, but this is also easier with async with the right combination of select

Don't use statics for this. The devil uses statics to seduce us, but we must resist the temptation. Use an RAII pattern whereby the dropping of a structure (or of the final Arc/Rc reference to some T) causes a send through an MPSC channel. The single consumer is then select!ed against the future representing the running program. If you have multiple structures, the single consumer should wait until all the drops have occurred. This has worked quite well in my async programs (using the tokio threaded runtime).
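The post describes this for async code with tokio. As a rough sketch of just the "send on drop, wait for all drops" part, here is a blocking adaptation using std threads and std::sync::mpsc instead. `DoneGuard` and `run_workers` are hypothetical names, and the select! against the running program is not shown.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// Hypothetical guard: dropping it notifies the single consumer.
struct DoneGuard {
    id: usize,
    done_tx: Sender<usize>,
}

impl Drop for DoneGuard {
    fn drop(&mut self) {
        // Ignore the error: the consumer may already be gone.
        let _ = self.done_tx.send(self.id);
    }
}

// Spawns `n` workers and returns the sorted ids of all guards that were dropped.
fn run_workers(n: usize) -> Vec<usize> {
    let (done_tx, done_rx) = mpsc::channel();
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let guard = DoneGuard { id, done_tx: done_tx.clone() };
            thread::spawn(move || {
                let _guard = guard; // dropped on return or on panic unwind
                // ... real work would go here ...
            })
        })
        .collect();
    drop(done_tx); // only the guards hold senders from here on

    // The iterator ends once every guard, and with it every sender, is dropped.
    let mut finished: Vec<usize> = done_rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    finished.sort();
    finished
}

fn main() {
    println!("guards dropped: {:?}", run_workers(3));
}
```

Because the notification lives in `Drop`, it also fires when a worker unwinds from a panic, so the consumer is not left waiting forever.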


I ended up creating an Arc and passing that into the closure of a spawn. Then I could set it from the main thread to get the other thread to shut down. No global needed.
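Roughly what that could look like, assuming the Arc wraps an AtomicBool (the post does not say what is inside it). `spawn_worker` and the returned status string are invented for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical helper: spawns a worker that owns its own clone of the flag.
fn spawn_worker(stop: Arc<AtomicBool>) -> thread::JoinHandle<&'static str> {
    thread::spawn(move || {
        while !stop.load(Ordering::Relaxed) {
            thread::sleep(Duration::from_millis(1)); // stand-in for real work
        }
        "cleaned up" // handed back to whoever joins the thread
    })
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let handle = spawn_worker(Arc::clone(&stop));

    // Later, e.g. when the window is closed:
    stop.store(true, Ordering::Relaxed);
    let status = handle.join().expect("worker panicked");
    println!("worker status: {status}");
}
```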

I don't think there is a specific best practice. That's a shame, as doing this in a modular way in large scale software is hard.

My advice would be:

For small programs, do whatever works. If Arc<AtomicBool> is enough, then use it.

For large programs, think hard about what's the best approach to use, and be ready to refine it later when you find edge cases that break.

Some specific gotchas (for large scale things):

  • Don't use AtomicBool as a cancellation signal. The problem with AtomicBool is that it's not a selectable event. If somewhere you are doing a blocking operation (like a .recv() on a channel), you won't be able to unblock it when the cancel signal is sent.
  • Don't use an Exit message over the channel as a cancellation signal. The problem with that is you'll have dedicated if message == Exit code paths everywhere. As those code-paths are executed rarely, they'll be buggy (the same effect that makes incorrect exception handling the culprit of most outages).
  • Do use "channel is closed" as the cancellation signal. Most of the code will end up looking like for event in channel, without special handling for termination. It will also automatically be crash only software -- as channels are closed when dropped, you'll get automatic cancellation if something somewhere panics or Errs via ?.
  • Do use crossbeam_channel::Receiver<Void>, where Void is defined as enum Void {}, as a cancellation signal. This is a glorified Arc<AtomicBool>. Like bool, it's Clone and Sync, and allows for loading and storing. Unlike bool, it is automatically signaled in drop, and can be used with select!.
  • (tongue in cheek) Do not use crossbeam_channel for cancellation :) The problem with channels is that, while they have select!, it works only with channels: it is closed-world. If you have some other source of events, you would, in the general case, have to sacrifice a thread to convert it to the channel API. Not sure what the right "universally selectable API" for blocking programs is. It sort of feels like Future is, but using futures in blocking programs feels like it, de facto, pulls in too much of the async ecosystem just to have select!.
  • Be aware of the leaked threads problem. A Rust program exits when the main thread exits. If there are any other threads running, they are terminated abruptly without running destructors. You need to join the threads manually. In general, cancellation is a two-step protocol: first you request the cancellation, and then you wait until the work is actually wrapped up. Rust makes it easy to forget the second part.
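To make the "channel is closed" and "two-step protocol" points concrete, here is a small sketch. It uses std::sync::mpsc rather than crossbeam_channel so that it needs no external crate; the shape of the worker loop is the same idea. `worker` and `run` are hypothetical names.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

// Hypothetical worker: processes events until every sender has been dropped.
// There is no special "Exit" message and no dedicated shutdown branch.
fn worker(events: Receiver<u32>) -> u32 {
    let mut sum = 0;
    for event in events {
        sum += event; // stand-in for handling one event
    }
    // The loop ended because the channel closed; cleanup would go here.
    sum
}

fn run() -> u32 {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || worker(rx));

    for n in 1..=4 {
        tx.send(n).expect("worker exited early");
    }

    drop(tx); // step 1: request cancellation by closing the channel
    handle.join().expect("worker panicked") // step 2: wait for the work to wrap up
}

fn main() {
    println!("worker processed events summing to {}", run());
}
```

If the sending side panics or returns early via ?, `tx` is dropped all the same, so the worker still sees the channel close and exits its loop.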

In general, I found authoring robust concurrent software hard in Rust, exactly because there's an abundance of low-level tools, but a lack of well-understood patterns. It so happened that this week-end I tried to codify the patterns I use in this library:

I don't endorse it (I still don't know how to write concurrent software in Rust), but it might be interesting as a representative of the struggles one faces :)


I use crossbeam_channel, and at one point I found that there's no way for the sender to test whether the receiver has closed without sending something. Some channel crates have a function for that, but crossbeam does not. So I have to send a dummy message as a probe.
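For illustration, the probe approach looks roughly like this. The sketch uses std::sync::mpsc as a stand-in for crossbeam_channel; in both, send returns an error once the receiving side has been dropped. `receiver_alive` is a hypothetical helper, and the receiver is assumed to tolerate the dummy `()` messages.

```rust
use std::sync::mpsc::{self, Sender};

// Hypothetical probe: returns false once the receiving end has been dropped.
// Caveat: while the receiver is alive, each probe enqueues a real `()` message.
fn receiver_alive(tx: &Sender<()>) -> bool {
    tx.send(()).is_ok()
}

fn main() {
    let (tx, rx) = mpsc::channel::<()>();
    println!("before drop: alive = {}", receiver_alive(&tx));
    drop(rx); // the receiving thread has shut down
    println!("after drop:  alive = {}", receiver_alive(&tx));
}
```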

Finding out such things the hard way is annoying. All the channel-like systems should have the full set of test functions. (Could be worse. Writing to a closed channel in Go causes a panic, because the Go guys like UNIX pipe semantics.)
