How to write/replace files atomically?

I need to write and replace existing static HTML files on a very frequent basis. These static files are read by an nginx web server running on FreeBSD. Under no circumstances should a 404 Not Found error occur: while a file is being replaced, nginx must serve either the old version or the newly replaced version, and never a 404 or a corrupted file.

My problem is similar to the one described in this thread.

What is the best way to implement this using Rust?

You should first write the file to some other location on the same disk, then you should use rename to move it. This should be atomic.

Note that the file must not be moved across mount points. E.g. /tmp is often on a different mount point than the rest of the system.

Sounds simple. Are you absolutely sure "rename" is atomic? Any references to read to confirm this?

(There will be a large volume of files, and they need to be replaced simultaneously without nginx returning any file-not-found errors.)

Yes, rename is atomic in the sense that you need it to be, at least on Linux (I have no idea what Windows does). You can see this in the man page.

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. However, there will probably be a window in which both oldpath and newpath refer to the file being renamed.

tempfile's persist does that.

What about this package: https://crates.io/crates/fast_rsync ?

No, you should use rename. (The suggested tempfile::persist() uses it under the hood.)

I don't think fast_rsync even provides functionality for writing to files. It's a library for using rsync's protocol.

So, you should use it if you want rsync's protocol. Even if you do, fast_rsync does not implement writing to files, so if you want atomic replacement for that, you will also need to use rename.

If you're using nginx, then I would assume you're on Linux. rename is guaranteed to be atomic there, so I don't think there should be any issues?

BTW, rename being atomic is a POSIX guarantee.

Thank you guys! Looks like "rename" is the way to go. :slight_smile: The reason I'm so concerned is because of the difficulty of testing it. I just have to deploy and 'pray' that nginx would never return a 404.

A common practice is to deploy the new version at a separate document root, then update a symlink to point to the new directory. Note that you still need to use rename to change the symlink atomically, see this answer. You may also need to change the nginx configuration to enable following symlinks. Another benefit is that you can quickly change the symlink back to an older version if you need a quick rollback. For example, this article describes this approach. You can also take a look at capistrano that also uses symlinks for atomic deployment.

Thanks for all your help! :slight_smile:

If you care (... and you may not! :), things are a lot more complicated than it first appears. While rename is operationally atomic, that doesn't necessarily result in atomicity in the face of system reboot or crash.

From this article:

Similarly, if you encounter a system failure (such as power loss, ENOSPC or an I/O error) while overwriting a file, it can result in the loss of existing data. To avoid this problem, it is common practice (and advisable) to write the updated data to a temporary file, ensure that it is safe on stable storage, then rename the temporary file to the original file name (thus replacing the contents). This ensures an atomic update of the file, so that other readers get one copy of the data or another. The following steps are required to perform this type of update:

  • create a new temp file (on the same file system!)
  • write data to the temp file
  • fsync() the temp file
  • rename the temp file to the appropriate name
  • fsync() the containing directory

One of the exciting outcomes (that I debugged in the past year...) if you do not do this dance can be: the destructive rename succeeds, but the resulting file is zero-length.

See also: clarification regarding the robustness of `persist()` · Issue #110 · Stebalien/tempfile · GitHub

Edit: I ran across this bug in ldconfig of all places. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=18093 ... and the fix: ldconfig: Sync temporary files to disk before renaming them [BZ #20890] · bminor/glibc@999a6da · GitHub

:exploding_head:

Maybe you need failure-atomic msync.

Thanks for your advice! I'm now considering serving the file directly (cached HTML string) instead of writing it to the file system for nginx to pick it up.

You may not need to go down the crash-consistent filesystem I/O rabbit hole: if what you really need is atomically updated data as observed by nginx, destructive rename is a fantastic option (and possibly the only filesystem-level behavior you can rely on).

What you can focus on instead is guaranteeing a couple of things about your deployment operations:

  1. That they are transactional: there's no way to observe an in-progress (or partially applied) deployment.
  2. That they are idempotent: the fix for a partial or failed deployment is to re-deploy.

Using a symlink to atomically flip from old to new is a great approach... on the other hand, it really depends on what the consumers of the data will do if it's wrong or inconsistent -- if each static asset is viewed in isolation, it might be ok to individually update each file.

After learning about all the ways things can go wrong, I wouldn't blame you for wanting to avoid a filesystem entirely! :grin:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.