How to ensure that the end of a file is written and fsynced at drop time as far as possible, in a sync or async (tokio) context?

The main reason for doing this is to ensure that the file is written as correctly as possible even when an error such as a panic occurs. On the normal path, write_end() and close_file() are called in sequence to close the file.
In a tokio context, is it sound to get the current runtime handle and then call block_on on it?
Why can't I impl Drop for Writer<std::fs::File> and for AsyncWriter<tokio::fs::File>? I looked up a fairly early issue, but it doesn't seem to explain why this should be prohibited.

use std::io::{Write, Result};
use tokio::io::AsyncWrite;

struct Writer<F: Write> {
    inner: F,
    ended: bool,
}

impl<F: Write> Writer<F> {
    fn write_end(&mut self) -> Result<()> {
        if !self.ended {
            // do write end
            self.ended = true;
        }
        Ok(())
    }
}

impl Writer<std::fs::File> {
    fn close_file(&mut self) -> Result<()> {
        self.inner.sync_all()
    }
}

impl<F: Write> Drop for Writer<F> {
    fn drop(&mut self) {
        self.write_end().unwrap();
    }
}

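// Does not compile: this overlaps with the generic impl above, and Drop
// cannot be implemented for a single instantiation of a generic type (E0366).
impl Drop for Writer<std::fs::File> {
    fn drop(&mut self) {
        self.write_end().unwrap();
        self.inner.sync_all().unwrap();
    }
}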

struct AsyncWriter<F: AsyncWrite + Unpin> {
    inner: F,
    ended: bool,
}

impl<F: AsyncWrite + Unpin> AsyncWriter<F> {
    async fn write_end(&mut self) -> Result<()> {
        if !self.ended {
            // do write end
            self.ended = true;
        }
        Ok(())
    }
}

impl AsyncWriter<tokio::fs::File> {
    async fn close_file(&mut self) -> Result<()> {
        self.inner.sync_all().await
    }
}

impl<F: AsyncWrite + Unpin> Drop for AsyncWriter<F> {
    fn drop(&mut self) {
        tokio::runtime::Handle::try_current().unwrap().block_on(self.write_end()).unwrap();
    }
}

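// Does not compile either, for the same reason: it overlaps with the generic
// impl above and specializes Drop for one instantiation (E0366).
impl Drop for AsyncWriter<tokio::fs::File> {
    fn drop(&mut self) {
        tokio::runtime::Handle::try_current().unwrap().block_on(async {
            self.write_end().await.unwrap();
            self.inner.sync_all().await.unwrap();
        });
    }
}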


Filesystem sync correctness is, unfortunately, highly platform specific. You should look up existing research for your target OS; Rust cannot completely abstract it away.

This really depends on what you're trying to do, but one thing to consider is using SQLite instead of raw files. It's cross-platform and handles correctness/atomicity issues well. Of course it comes with the baggage of using SQL, but I think it would be worth it if you're looking for the best possible correctness. SQLite even knows when a partial write has occurred (for example when the computer shut down unexpectedly) and recovers to a valid state.

Async drop is very problematic, yes: both running async code at sync drop time and adding an async drop operation to the language. The former is a problem for the same reason that any attempt at nesting async contexts is a really bad idea; in tokio, Handle::block_on panics when it is called from within an asynchronous execution context. The latter is a problem because it's really unclear how it should even work: you need to be able to drop a Future synchronously, so what happens if that Future has an async drop?

If you can't use a library that handles this for you (for example sled, or anything under #embedded-database on Lib.rs), my main advice is just to be aware that the current tokio fs is mostly a wrapper around spawn_blocking() of the std fs ops, including file IO (in practice it's specialized to guarantee ordering). If the IO is local, you're better off sticking to the sync fs ops, both for performance and so that you at least know when the OS has your changes if you crash.

Building up from there can get really complicated, as the others have mentioned, effectively to the level of building a small database. It is reasonably doable, though, if you only need to handle internal panics and not power outages and the like: simply using sync fs calls to write the update in Drop will do fine in many cases, and you shouldn't even need an fsync, since data already handed to the OS by a completed write survives the process dying.
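A minimal sketch of that "sync write in Drop" idea, assuming a non-generic Writer over std::fs::File (this sidesteps the rule that Drop cannot be implemented for just one instantiation of a generic type). END_MARKER, create, write_record and close are hypothetical names that do not appear in the original code. Errors inside drop are ignored rather than unwrapped, because a second panic during unwinding aborts the process. The sync_all call is kept from the question, although it may be unnecessary if only panics matter.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical end-of-file marker; the real format is not shown in the thread.
const END_MARKER: &[u8] = b"END\n";

struct Writer {
    inner: File,
    ended: bool,
}

impl Writer {
    fn create(path: &Path) -> io::Result<Self> {
        Ok(Writer { inner: File::create(path)?, ended: false })
    }

    fn write_record(&mut self, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(data)
    }

    /// Writes the end marker once; a failed attempt can be retried.
    fn write_end(&mut self) -> io::Result<()> {
        if !self.ended {
            self.inner.write_all(END_MARKER)?;
            self.ended = true;
        }
        Ok(())
    }

    /// Normal path: finish the file and report errors to the caller.
    fn close(mut self) -> io::Result<()> {
        self.write_end()?;
        self.inner.sync_all()
    }
}

impl Drop for Writer {
    fn drop(&mut self) {
        // Best effort only: errors are ignored, not unwrapped, because
        // panicking while already unwinding aborts the process.
        let _ = self.write_end();
        let _ = self.inner.sync_all();
    }
}
```

On the normal path close() surfaces any error to the caller, and the drop that follows is a no-op for the marker. If a panic unwinds through the scope that owns the Writer, drop still appends the marker. This relies on unwinding: with panic = "abort" no destructors run at all.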

The general approach to this problem, though, handles crashes, power outages, deadlocks, async weirdness (so long as it's serialized like tokio's fs) and any other kind of nastiness in general: structure your file as a journal. That is, only ever append changes, each one starting with its own length. When you open the file, read and apply all the changes in order, ignoring the last one if it was not written completely. If the file gets too long, create a second file, write the updated state as its first entry, then rename it over the first file. This has some complexity problems, but they're mostly about synchronizing file updates with incoming changes without losing performance, needing to keep the data entirely in memory, and the obvious increase in disk usage. These shouldn't be too big of an issue initially, and they have various incremental fixes as you get bigger, though eventually you will have just built another database system.
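The journal layout described above can be sketched roughly as follows, using only std. The 4-byte little-endian length prefix, the function names (append_record, read_records, compact) and the ".tmp" suffix are assumptions made for illustration; the thread does not specify a format. A real implementation would likely also want a checksum per record, which is omitted here.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Appends one record: a 4-byte little-endian length, then the payload.
fn append_record(path: &Path, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Build the whole record first so it goes out through one write_all call.
    let mut buf = Vec::with_capacity(4 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    file.write_all(&buf)?;
    // Matters for power loss; a panic alone does not lose written data.
    file.sync_data()
}

/// Reads every complete record and ignores a truncated record at the tail.
fn read_records(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0;
    while bytes.len() - pos >= 4 {
        let len = u32::from_le_bytes([
            bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3],
        ]) as usize;
        let start = pos + 4;
        if bytes.len() - start < len {
            break; // the last record was only partially written
        }
        records.push(bytes[start..start + len].to_vec());
        pos = start + len;
    }
    Ok(records)
}

/// Replaces the journal with a single snapshot record via a rename.
fn compact(path: &Path, snapshot: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let _ = fs::remove_file(&tmp); // discard a leftover from an earlier crash
    append_record(&tmp, snapshot)?;
    fs::rename(&tmp, path)
}
```

On startup the caller would fold the result of read_records into its in-memory state, and call compact with the serialized state once the file grows past some threshold.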


It already is: purely append-only, with a new file created on every start and no file ever reused. I'm just looking for an "extra" guarantee.

Well, I'd like to. File writing runs on a separate task and talks to the other parts through channels. Working with AsyncWrite is more verbose than Write, and there is no simple way to convert between the two. But using sync IO in an async context is much more problematic. Maybe it would be good to use a system thread, but the channel operations are also async... use sync channels? Even for the drop problem alone, unwrapping tokio::fs::File into std::fs::File and then doing a sync write_end and fsync is the most straightforward and reasonable way, but then I may have to implement the writer twice, once for Write and once for AsyncWrite... maybe a macro can help with that? Or maybe converting AsyncWrite to Write is more feasible than the opposite?

There just isn't a simple journal-like file storage solution that is SQL-less and memtable-less, doesn't even support random queries, stores only timestamps with binary blobs, reads them back only sequentially, and does SQLite-like file system correctness work.

The best approach may be to write the IO logic synchronously and call spawn_blocking every time the file is written.

BTW, apart from offering random queries that I don't need, LMDB or the pure Rust redb seems to be great.

Well, you can use std::fs in async contexts; sync channels and even regular Mutexes also work fine if you're careful. You mostly just need to ensure you don't hold a lock across an await or block for "a long time", e.g. use try_send() instead of send() on a bounded channel.

With direct std::fs you do need to be careful if the file system ends up being a network mount or the like, of course, since now IO can take seconds, but there's lots of fun ways you can get horribly mangled trying to keep a file consistent in those cases.

Calling spawn_blocking separately for every write means you're no longer guaranteed to have the operations execute in the order you spawn them. A separate blocking thread that reads op instructions from an mpsc channel is what you need, and exactly what tokio::fs and, it sounds like, the OP is already doing.
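A rough sketch of such a dedicated writer thread, using only std so that it stays self-contained. The Op enum and spawn_writer are hypothetical names. std::sync::mpsc::sync_channel stands in for the channel here; in a tokio program the same shape should work with tokio::sync::mpsc, with the thread calling blocking_recv() instead of recv(). Because a single thread drains a FIFO channel, operations reach the file in the order they were sent.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::PathBuf;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical operations understood by the writer thread.
enum Op {
    Write(Vec<u8>),
    /// Sync to disk and report the result on the enclosed reply channel.
    Sync(SyncSender<io::Result<()>>),
}

/// Opens the file, then spawns one thread that applies operations in order.
fn spawn_writer(
    path: PathBuf,
    capacity: usize,
) -> io::Result<(SyncSender<Op>, JoinHandle<io::Result<()>>)> {
    let mut file = File::create(path)?;
    let (tx, rx) = sync_channel::<Op>(capacity);
    let handle = thread::spawn(move || {
        // recv() fails once every sender is dropped, which ends the loop.
        while let Ok(op) = rx.recv() {
            match op {
                Op::Write(data) => file.write_all(&data)?,
                Op::Sync(reply) => {
                    // The requester may have gone away; ignore that case.
                    let _ = reply.send(file.sync_all());
                }
            }
        }
        // All senders are gone: sync once more before the thread exits.
        file.sync_all()
    });
    Ok((tx, handle))
}
```

A producer that must not block can send with tx.try_send(Op::Write(..)) and handle a full channel itself, while Op::Sync gives it a way to wait until everything sent so far has been written and synced. Dropping every sender shuts the thread down, and joining the handle returns the first IO error, if any.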

Which mpsc channel should I use? If I use tokio::sync::mpsc, then how can I recv().await in a blocking thread? Is there a channel that has async send and sync recv?
Edit: both tokio::sync::mpsc and async-channel support blocking recv.

And blocking send for the reply!
