I have recently been thinking a fair bit about program termination, a subject about which I previously knew almost nothing. I thought I would document what I learnt (or THINK I learnt...). For simplicity, I am going to assume we are talking about some kind of server running on Linux, started by systemd, although other systems mostly work the same way (the details can differ).
#1 is the following, which I only discovered about a day into looking at this.
If I had known this from the start, it would have saved me some thinking!
#2: if you do not handle signals from the operating system, then as far as I can tell, when (say) systemd stops your program, it just terminates instantly (pretty much... I don't know to what extent this is entirely true). Nothing more happens in your program: all the threads stop instantly, nothing is dropped, no more instructions are executed, it is dead. The operating system will however close open file descriptors, and data already handed to the kernel will still reach disk eventually — though anything still sitting in user-space buffers is lost, since no Drop or flush code runs. I am not sure about network IO; I guess any open TCP sockets etc. are closed. The IO is shut down. For most programs I guess this is fine, and if this is ok for you, you do not need to worry about termination.
#3: if this is NOT ok for you, this is where things get a little complicated. In my case, I have a process that keeps a list of outstanding updates to be written to files (database updates), and it would be nice if these were performed before the program terminates. I was aware of this guide in tokio: Graceful Shutdown | Tokio - An asynchronous Rust runtime
You need to have something that will listen for a signal from the operating system. In the case of Linux and systemd, this is the SIGTERM signal. It turns out that (by default) systemd will first send SIGTERM, then wait 90 seconds before sending SIGKILL (which, unlike SIGTERM, cannot be caught or handled — the process is simply killed). So we have 90 seconds to do anything that needs doing, although I think the program really ought to stop pretty quickly or... well, misunderstandings could arise. Let's say within a second or two at most.
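For reference, the SIGTERM-then-SIGKILL behaviour corresponds to systemd's defaults, which could be spelled out explicitly in the unit file like this (the service name and path here are made up; these are the default values, shown only for illustration):

```ini
# Hypothetical unit file. On "systemctl stop", systemd sends KillSignal
# first, then SIGKILL once TimeoutStopSec has elapsed.
[Service]
ExecStart=/usr/local/bin/myserver
KillSignal=SIGTERM
TimeoutStopSec=90
```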
Around this point, I got a little confused (partly perhaps because I had not grasped point #1, or even considered it). I decided "everything shall be dropped" before the program terminates, and during that process I could make sure outstanding writes had been performed. So on the data structure of outstanding writes, I added a Drop implementation which would wait until they were complete. This was a mistake! In my design, there can be long-running threads (doing long read-only database transactions) which by design are permitted to run for a long time, even, say, 10 minutes (although this would perhaps be unusual). There is no way to force an independent thread to terminate (from within a Rust program), so in fact waiting for the shared global data to drop doesn't work. I suppose it would work to the extent that the writes would be written, but program termination would be delayed without due cause.
So, in the end I added an appropriate function to wait ONLY for the outstanding writes to be performed, and then simply return from main. The code is here:
Interesting how just a handful of lines of code can take a lot of thought.
After some testing, it appears this does what I want. I hope!