Up until now I've been using fs::write(TMP, data); fs::rename(TMP, DEST); to save data for my app, but I recently learned that the rename is allowed to complete before the data is fully written, which could lead to the resulting file containing only part of the data.
I understand the recommendation is to call File::sync_all before closing the temporary file. But it's also my understanding that File::sync_all asks the kernel to write the data out as soon as possible, and this is where I wonder: is there a middle ground where I can guarantee atomicity (that the file ends up in either the old state or the new state) but let the kernel decide how long the commit takes? For example, it would be fine if it took a few minutes to reach disk.
For my purposes this runs on a Linux server, although if there's a cross-platform solution that would be preferred.
Optimistically, I would imagine this would make more efficient use of disk spin-ups (more changes could be buffered before spinning up the disk) and avoid interfering with the latency of other, higher-priority I/O on the same machine.
It sounds like there isn't a trivial way to achieve such a thing though, so I'm content to accept that adding the File::sync_all is probably the best solution.
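In case it helps anyone else, here's a minimal sketch of the pattern I'm settling on; the function name and path arguments are just placeholders for illustration:

```rust
use std::fs::{self, File};
use std::io::Write;

// Write to a temporary file, force it to disk, then atomically rename it
// into place. The sync_all() call blocks until the kernel reports the data
// is committed, so a crash after the rename can't leave a partial file.
fn save_atomically(tmp: &str, dest: &str, data: &[u8]) -> std::io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(data)?;
    file.sync_all()?;
    fs::rename(tmp, dest)
}
```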
From those I concluded that, in the best case, the newly created file is committed before the rename, but the filesystem developers regard this pattern as an application bug that they are begrudgingly working around; and that, in the worst case, as stated in the original post, the data of the newly created file is not guaranteed to be committed before the rename, and you can end up with a partial file after a power loss or other system crash.
You typically only need File::sync_data() here, though the practical difference is usually going to be zero. sync_data() does enough to ensure that you can read back all the data that was written, but doesn't bother updating any metadata about the file that isn't required to read the data. This usually means that if the file's length has changed then it's equivalent to sync_all(), and if you're creating a new file then the length will have changed (unless you wrote zero bytes)... but it's only actually required to use sync_all() if you care about other metadata of the file (like timestamps or permissions).
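To illustrate, the sync_data() variant of the pattern might look like the following sketch (the function name and path arguments are placeholders, not anything from your code):

```rust
use std::fs::{self, File};
use std::io::Write;

// Same temp-file-then-rename pattern, but using sync_data(): it flushes the
// file contents (plus whatever metadata is needed to read them back, such as
// the length) without forcing non-essential metadata like timestamps to disk.
// On Linux this maps to fdatasync(2) rather than fsync(2).
fn save_with_sync_data(tmp: &str, dest: &str, data: &[u8]) -> std::io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(data)?;
    file.sync_data()?;
    fs::rename(tmp, dest)
}
```

As noted above, for a freshly created non-empty file the two calls usually end up doing the same work, so this is mostly a statement of intent rather than an optimization.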
But, yeah, your general conclusion is correct: you can't safely avoid explicitly syncing the data to disk without relying on specific knowledge of which filesystem is being used and how it's configured, even just on Linux, let alone cross-platform. No platform's file APIs that I know of make any promises about whether the order of operations will be preserved in the case of a system failure.