Program termination

I have recently been thinking a fair bit about program termination, a subject about which I previously knew almost nothing. I thought I would document what I learnt (or THINK I learnt...). For simplicity, I am going to assume we are talking about some kind of server running on Linux, started by systemd, although whatever the system is, it mostly works the same way (the details may differ).

#1 is the following, which I only discovered about a day into looking at this.

"When the main thread of a Rust program terminates, the entire program shuts down, even if other threads are still running."

If I knew this from the start, it would have saved me some thinking!

#2 : if you do not handle signals from the operating system, then when (say) systemd stops your program, it just terminates immediately. Nothing more happens in your program: all threads stop instantly, nothing is dropped, no further instructions are executed, it is dead. The kernel then cleans up after it: open file descriptors (including TCP sockets) are closed and memory is released. Note that data your program had already handed to the kernel (in the page cache) will still be written out eventually, but anything still buffered in userspace, for example in a BufWriter, is simply lost. For most programs I guess this is fine; if this is ok for you, you do not need to worry about termination.

#3 : if this is NOT ok for you, this is where things get a little complicated. In my case, I have a process that has a list of outstanding updates to be written to files (database updates), and it would be nice if these were performed before the program terminates. I was aware of this guide in tokio: Graceful Shutdown | Tokio - An asynchronous Rust runtime

You need to have something that will listen for a signal from the operating system. Under systemd on Linux, this is the SIGTERM signal. It turns out that (by default) systemd will first send SIGTERM, then wait 90 seconds (the DefaultTimeoutStopSec setting) before sending SIGKILL, which cannot be caught or ignored and terminates the process unconditionally. So we have up to 90 seconds to do anything that needs doing, although I think the program really ought to stop quickly, say within a second or two at most, or misunderstandings could arise.

Around this point, I got a little confused (partly, perhaps, because I had not grasped point #1, or even considered it). I decided that "everything shall be dropped" before the program terminates, and that during that process I could make sure outstanding writes had been performed. So I added a Drop implementation to the data structure of outstanding writes which would wait until they were complete. This was a mistake! In my design, there can be long-running threads (doing long read-only database transactions) which by design are permitted to run for a long time, even, say, 10 minutes (although this would perhaps be unusual). There is no way to force an independent thread to terminate (from within a Rust program), so in fact waiting for the shared global data to drop doesn't work. I suppose it would work to the extent that the writes would be written, but program termination would be delayed without due cause.

So, in the end I added an appropriate function to wait ONLY for the outstanding writes to be performed, and then simply return from main. The code is here:

Interesting how just a handful of lines of code can take a lot of thought.

After some testing, it appears this does what I want. I hope!

The default action for SIGTERM is termination. You can change the action for almost any signal to one of: the default action (termination for some signals, ignore for others, a core dump for yet others), ignore, or run a signal handler. The only exceptions are SIGKILL (which always terminates the process), SIGSTOP (which always stops the process until a SIGCONT), and SIGCONT (which always resumes a process stopped by SIGSTOP).


There's a style of program design, called "crash-only software", that's designed around this. You have to handle things like the user sending a SIGKILL to your process, or pulling the plug out of the computer at a bad moment anyway, since these are possible things that can happen, and you don't want to tell the user "welp, a bad thing happened so I deleted all your data. Sorry!".

As a result, you have startup code that deals with a past instance terminating in the middle of doing something important. Termination then becomes no big deal - it's something that you cope with, just as you cope when the user sends garbage to your TCP port instead of valid requests.

With that in place, you don't need to worry about termination; you need a way for the user to know whether or not a given request has been handled yet (so that they can implement their own crash-only software, where they assume that if you didn't respond "yes, request handled", they have to start with the presumption that you dropped the request on the floor), but otherwise, you can go down immediately on termination with no ill effect.

And then you arrange your service manager (be that systemd or something else) such that your users (the services that depend on you) are killed before you get a termination request. Thus either your users shut down cleanly (and could wait for your "yes, request handled" before terminating), or they were killed and need to go through crash recovery anyway.
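With systemd this falls out of dependency ordering, since units are stopped in the reverse of their start order. A hypothetical fragment (unit names made up):

```ini
# app.service (a client of the database)
[Unit]
# Started after the database, which also means: stopped BEFORE it.
After=database.service
Requires=database.service

[Service]
ExecStart=/usr/local/bin/app
# Give the client time to finish in-flight requests on SIGTERM.
TimeoutStopSec=30
```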

This is not always practical, but when you can implement it (as PostgreSQL and MySQL have, for example), it makes managing your service much easier.


Yes, if I didn't have this termination code in place, another strategy would be not to "lie" to the client about the transaction being complete, and wait until it really is (but not hold up other write transactions from going ahead). But in practical terms I think I quite like this approach, and in any case it has been interesting to see if it can work.

[ Of course if the datacenter was hit by a missile or something, or some other sudden catastrophic failure occurred, then the transaction wouldn't be complete, but in such extreme cases, well, the data might be vaporised anyway! ]

A word of caution - I have had to clean up after a system that told clients that transactions were complete, when in fact they could be lost, and the resulting mess is not at all pretty. Turns out it's really, really hard to handle the case where someone uses an admin console to forcibly terminate all VMs on a system in order to update the host, rather than shutting those VMs down cleanly. Most things were (fortunately) crash-only designs, and just recovered after the VMs came back up, but a few systems did not.

This is where crash-only software comes into its own - because the data would be safe over a crash of the system, a crash-only design like PostgreSQL doesn't actually lose any data in this scenario, since it doesn't report the data as written until it's at least ensured that the startup handler will be able to redo the transaction before it resumes serving data.


Most databases make this configurable (whether to flush/fsync the transaction log before reporting that a transaction is committed), and make the default to do the safe thing, the flush/fsync.

Here is the Postgres setting.

The main reason for relaxing this is when replication is used. If the commit is received by some number of replicas, then the local flush/fsync may not be considered necessary for durability. And there are also cases where guaranteed durability is not necessary, such as bulk loads that can be restarted after a crash, or when writing low value data. So there are good reasons to make it configurable.
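In PostgreSQL the relevant setting is `synchronous_commit` (and `fsync` controls whether the server flushes at all); a sketch of the trade-off:

```ini
# postgresql.conf (sketch)
synchronous_commit = on      # default: commit returns after the WAL is flushed
#synchronous_commit = off    # commit returns before the flush; a crash can
#                            # lose the last few transactions (data stays
#                            # consistent, just not durable)
#synchronous_commit = remote_apply  # with replication: wait on replicas instead
```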

Edit: And as you probably know, a common mistake in benchmark comparisons is to use different values for this setting when comparing the performance of different databases. It is of course much slower to do the flush/fsync for every commit.


I dimly remember some old versions of Windows where, if the computer was not shut down cleanly, on restart it would go through a lengthy procedure to recover or check the integrity of the filesystem (which might take, say, 10 minutes...). I don't think modern filesystems have this problem, though; I'm not sure when this went away, maybe 20 years ago?

Modern filesystems either have a journal that logs all transactions, so that partially finished ones can be replayed (ext3+, NTFS), or are structured so that every transaction is applied atomically (copy-on-write filesystems like btrfs, or log-structured filesystems like f2fs). I have still seen chkdsk trigger on Windows a couple of months ago, though. I don't recall why, but it could have been because I booted Linux while Windows was hibernated by its fast-startup feature.


Just possibly at some stage I switched to using laptops which have a battery, meaning shutdowns are always clean unless the battery has been removed. Maybe that's it.

And those filesystems in turn rely on the guarantee that the underlying hardware honors ordering barriers or flush commands. There has been hardware that lies about writes being committed, which results in data corruption on power loss or unclean system shutdown.

Having batteries helps, but there are still hard OS crashes, CPU lockups and similar issues.
In the end, if durability is a requirement, the entire stack must honor commits. If consistency is desired, the entire stack must honor ordering constraints.
If any layer cheats (or is misconfigured) to look better on benchmarks, then you lose those properties.


A middle ground is to have an append-only write-ahead log, and to report the transaction complete once it’s been written into the log. The full work of adding the information into the primary datastore can then be deferred until later, even to a recover-on-restart procedure if necessary.


I was just thinking about how to program "defensively" against the chance that some part of the low-level stack is cheating or buggy. Waiting a little while before terminating (if writes have happened recently) might help, I think.

Not really, because write caches can keep things in memory for a long time.
