Patterns for flush/sync in async I/O

Hello.

I use async file I/O in tokio. At the end of my program, I need to ensure that all data is written out. One use case is tests, where I want to ensure that the output text file's content is exactly as expected. Others are in the application domain.

To my understanding, I have to use sync_all() on the File instance for this. However, I'm opening the file in a wrapper function that will actually return a gzip encoder. Thus, my code works on a Pin<Box<dyn AsyncWrite>>.

Now my question is: what would be a pattern to guarantee that all data is flushed to the underlying file and that, on Linux, the sync() syscall is performed, so that the data is written to the file and my program blocks until the data can be read by the same process (use case: test) or by other processes (use case: handing the file to another program)?

Best wishes,
Manuel

You should distinguish the following operations:

A) Flushing from application buffers to the OS so that if your program crashes the data will still be considered written. This is what flush does. After something has been written to the OS it will also be available to other applications.
This is somewhat cheap, since it often simply moves data from userspace buffers to kernel buffers, while the actual writes are performed at some later point in the background, potentially after your program has already quit.

If there is no userspace buffer, e.g. when you're writing straight to a File instead of a BufWriter<File>, then flushing is not needed. In this case every write goes straight to the OS.
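To make point A concrete, here is a minimal sketch using the synchronous std types, where the userspace buffer is easy to observe. The helper name and the file path are made up for illustration, and it assumes the written data is smaller than BufWriter's default capacity, so nothing reaches the OS before the explicit flush.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` through a userspace buffer and returns
/// what a reader of the file sees before and after the explicit flush.
/// Assumes `data` is smaller than the BufWriter's internal buffer.
fn visible_before_and_after_flush(path: &Path, data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);

    // The bytes are only copied into the BufWriter's userspace buffer here.
    writer.write_all(data)?;
    let before = fs::read(path)?;

    // flush() hands the buffered bytes to the OS; from now on readers see them.
    writer.flush()?;
    let after = fs::read(path)?;

    Ok((before, after))
}
```

Calling this with a few bytes should yield an empty `before` and the full data in `after`, without any sync_all involved.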

B) Asking the OS to dump its in-memory write caches to physical disks and, if applicable, checkpoint the filesystem in such a way that if the OS crashes or power is lost, the data will be there after a reboot. This is what sync_all does, and it can be a slow and expensive operation.
This operation is not required for data to become visible to other applications.

If you don't need crash-resilience/durability you don't have to call sync_all.
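For the cases where durability does matter, a sketch of point B with the synchronous std API could look like the following. The function name is hypothetical; only `File::create`, `write_all` and `sync_all` are real std calls.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` and returns only after the OS reports
/// that the file contents and metadata have reached the storage device.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;

    // Unbuffered write: the bytes go straight to the OS and are already
    // visible to other processes after this call.
    file.write_all(data)?;

    // Only needed for crash or power-loss durability; this can be slow.
    file.sync_all()?;
    Ok(())
}
```

If only visibility to other processes is required, the `sync_all` line can be dropped and the function still behaves the same from a reader's point of view.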

When using the async equivalents, you also have to await the futures until they have completed.

Can you share your code? If you need to call methods from File on a trait object, you need to either downcast back to File or create a trait that inherits from AsyncRead and exposes the methods you need.

Thank you for the explanation. I'm writing through multiple layers. Together with the answer of @user16251, I think I can implement a solution.

Thanks. Are you certain you mean AsyncRead? I would have assumed it would rather be AsyncWrite.

Actually, the Tokio file does need to be flushed due to its implementation.

Maybe I'm misunderstanding: does a completed write future (for fs::File specifically) not mean that the write has been handed off to the OS? Is there some additional buffering happening beyond what's needed to hand it off to a threadpool?
I was under the impression that the flush is only needed if one had some non-awaited futures in flight somewhere.

A completed write on a tokio::fs::File means that it has been handed off to the threadpool to be written, but that does not necessarily mean that the threadpool has written it yet. Calling flush allows you to wait for that to happen.

It's not possible to change this. The AsyncWrite trait's signature fundamentally forces us to do it this way.

I miswrote the trait; sorry!

Assuming there are no bugs in the OS implementation of the filesystem, should awaiting a successful flush on an AsyncWrite be sufficient for observing the writes by a subsequent read?

Yes, if you flush a tokio::fs::File, then it is guaranteed that the writes have completed.

The sync_all method is mentioned in the original post, and flush does not call sync_all. However, the sync_all method is only relevant to making sure that the data survives unplugging the power cable.