I need to write and replace existing HTML static files on a very frequent basis. These static files are being read by nginx web server running on FreeBSD. There should be no circumstances where a 404 not found error could occur. That means, while the file is being replaced, nginx should either read the old version or the newly replaced version, and never a 404 or file corruption problem.
My problem is similar to the one described in this thread.
What is the best way to implement this using Rust?
Yes, rename is atomic in the sense that you need it to be, at least on Linux (I have no idea what Windows does). You can see this in the man page.
If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. However, there will probably be a window in which both oldpath and newpath refer to the file being renamed.
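In Rust this maps directly onto `std::fs::rename`. A minimal sketch of the write-then-rename pattern (the helper name and the fixed temp-file name are mine; production code would want a unique temp name, e.g. via the `tempfile` crate):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Hypothetical helper: write `contents` to a temp file in the same
// directory as `dest` (rename only works within one filesystem),
// then atomically rename it over `dest`. Readers always see either
// the old file or the new one, never a missing or partial file.
fn replace_atomically(dest: &Path, contents: &[u8]) -> std::io::Result<()> {
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    let tmp = dir.join(".new-version.tmp"); // illustrative fixed name
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents)?;
    }
    // rename(2) atomically replaces an existing destination.
    fs::rename(&tmp, dest)
}
```

Note this sketch doesn't yet fsync anything, so it's atomic with respect to concurrent readers but not with respect to a crash (more on that below in the thread).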
I don't think fast_rsync even provides functionality for writing to files. It's a library for using rsync's protocol.
So, you should use it if you want rsync's protocol. Even if you do, fast_rsync does not implement writing to files, so if you want atomic replacement for that, you will also need to use rename.
If you're using nginx, I'd assume you're on a Unix-like system (Linux or, as in your case, FreeBSD). POSIX requires rename to atomically replace an existing destination on both, so I don't think there should be any issues?
Thank you guys! Looks like "rename" is the way to go. The reason I'm so concerned is because of the difficulty of testing it. I just have to deploy and 'pray' that nginx would never return a 404.
A common practice is to deploy the new version at a separate document root, then update a symlink to point to the new directory. Note that you still need to use rename to change the symlink atomically, see this answer. You may also need to change the nginx configuration to enable following symlinks. Another benefit is that you can quickly change the symlink back to an older version if you need a quick rollback. For example, this article describes this approach. You can also take a look at capistrano that also uses symlinks for atomic deployment.
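Flipping the symlink itself also comes down to rename, since a symlink can't be modified in place: create the new link under a temporary name, then rename it over the old one. A sketch (function and path names are mine):

```rust
use std::fs;
use std::os::unix::fs::symlink;
use std::path::Path;

// Hypothetical helper: atomically repoint `link` (e.g. "current")
// at `new_target` (e.g. "releases/v2"). The link is created under a
// temporary name first, because symlink(2) cannot overwrite, and
// then rename(2) swaps it into place atomically.
fn flip_symlink(link: &Path, new_target: &Path) -> std::io::Result<()> {
    let tmp = link.with_extension("tmp");
    let _ = fs::remove_file(&tmp); // discard any stale temp link
    symlink(new_target, &tmp)?;
    fs::rename(&tmp, link) // readers see the old or the new target
}
```

Rolling back is then just another `flip_symlink` call pointing at the previous release directory.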
If you care (... and you may not! :), things are a lot more complicated than they first appear. While rename is operationally atomic, that doesn't necessarily give you atomicity in the face of a system reboot or crash.
From this article:
Similarly, if you encounter a system failure (such as power loss, ENOSPC or an I/O error) while overwriting a file, it can result in the loss of existing data. To avoid this problem, it is common practice (and advisable) to write the updated data to a temporary file, ensure that it is safe on stable storage, then rename the temporary file to the original file name (thus replacing the contents). This ensures an atomic update of the file, so that other readers get one copy of the data or another. The following steps are required to perform this type of update:
create a new temp file (on the same file system!)
write data to the temp file
fsync() the temp file
rename the temp file to the appropriate name
fsync() the containing directory
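The five steps above can be sketched in Rust like this (names are illustrative; as before, real code would use a unique temp-file name):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Hypothetical crash-safe replacement following the dance above:
// temp file on the same filesystem, write, fsync the file, rename,
// then fsync the containing directory so the rename itself is durable.
fn durable_replace(dest: &Path, contents: &[u8]) -> std::io::Result<()> {
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    let tmp = dir.join(".durable.tmp");

    let mut f = File::create(&tmp)?; // 1. create temp file (same fs!)
    f.write_all(contents)?;          // 2. write data to the temp file
    f.sync_all()?;                   // 3. fsync() the temp file
    drop(f);
    fs::rename(&tmp, dest)?;         // 4. rename over the target
    File::open(dir)?.sync_all()      // 5. fsync() the containing directory
}
```

Opening a directory read-only and calling `sync_all` on it is how you fsync a directory from Rust on Linux and FreeBSD; there's no dedicated std API for it.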
One of the exciting outcomes (that I debugged in the past year...) if you do not do this dance can be: the destructive rename succeeds, but the resulting file ends up zero-length.
Thanks for your advice! I'm now considering serving the file directly (cached HTML string) instead of writing it to the file system for nginx to pick it up.
You may not need to go down the crash-consistent filesystem I/O rabbit hole: if what you really need is atomically updated data as observed by nginx, destructive rename is a fantastic option (and possibly the only filesystem-level behavior you can rely on).
What you can focus on instead is guaranteeing a couple of things about your deployment operations:
That they are transactional: there's no way to observe an in-progress (or partially applied) deployment.
That they are idempotent: the fix for a partial or failed deployment is to re-deploy.
Using a symlink to atomically flip from old to new is a great approach... on the other hand, it really depends on what the consumers of the data will do if it's wrong or inconsistent -- if each static asset is viewed in isolation, it might be ok to individually update each file.
After learning about all the ways things can go wrong, I wouldn't blame you for wanting to avoid a filesystem entirely!