I use async file I/O in tokio. At the end of my program, I need to ensure that all data is written out. One use case is tests, where I want to ensure that the output text file's content is exactly as expected. Others are in the application domain.
To my understanding, I have to use
sync_all() on the
File instance for this. However, I'm opening the file in a wrapper function that will actually return a gzip encoder. Thus, my code works on a
Now my question is: what would be a pattern to guarantee that all data is flushed to the underlying file and on Linux, the
sync() syscall is performed, so that the data is written to the file and my program blocks until the data can be read by the same process (use case: tests) or by other processes (use case: handing the file to another program)?
You should distinguish the following operations:
A) Flushing from application buffers to the OS so that if your program crashes the data will still be considered written. This is what
flush does. After something has been written to the OS it will also be available to other applications.
This is somewhat cheap since it often simply moves data from userspace buffers to kernel buffers, while the actual writes are performed at some later point in the background, potentially after your program has already quit.
If there is no userspace buffer, e.g. when you're writing straight to a
File instead of a
BufWriter<File> then flushing is not needed. In this case every write goes straight to the OS.
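To make (A) concrete, here is a minimal sketch using the blocking std API rather than tokio. The helper name `len_before_and_after_flush` and the 64 KiB buffer capacity are my own choices for illustration. It assumes the data is smaller than the buffer, so the write stays in the userspace buffer until `flush` is called.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` through a `BufWriter` and returns the
/// file length the OS reports before and after `flush`.
fn len_before_and_after_flush(path: &Path, data: &[u8]) -> io::Result<(u64, u64)> {
    // `File::create` truncates, so the file starts out empty.
    let file = File::create(path)?;
    // Assumption: `data` is smaller than this capacity, so it stays buffered.
    let mut writer = BufWriter::with_capacity(64 * 1024, file);
    writer.write_all(data)?;
    // The bytes are still in the userspace buffer; the OS has not seen them.
    let before = fs::metadata(path)?.len();
    // (A): hand the buffered bytes to the OS.
    writer.flush()?;
    let after = fs::metadata(path)?.len();
    Ok((before, after))
}
```

With a small payload, `before` should be 0 and `after` should equal `data.len()`. If you wrote to the `File` directly instead, the length would already be visible after `write_all`.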
B) Asking the OS to dump its in-memory write caches to physical disks and, if applicable, checkpoint the filesystem so that if the OS crashes or power is lost, the data will be there after a reboot. This is what
sync_all does and it can be a slow and expensive operation.
This operation is not required for data to become visible to other applications.
If you don't need crash-resilience/durability you don't have to call
When using the async equivalents, you also have to await the returned futures until they complete.
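Putting (A) and (B) together, a rough sketch with the blocking std API could look like the following. The function name `write_durably` is made up for this example. With tokio the same two steps exist as `flush().await` (from `AsyncWriteExt`) and `sync_all().await` on `tokio::fs::File`.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` and returns only after the OS has been
/// asked to persist it to the storage device.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(data)?;
    // (A): userspace buffer -> OS. Enough for other processes to see the data.
    writer.flush()?;
    // (B): OS caches -> disk. Only needed for crash or power-loss durability.
    writer.get_ref().sync_all()?;
    Ok(())
}
```

If you only need the content to be readable afterwards (e.g. in a test), you can drop the `sync_all` line and keep the `flush`.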
Can you share your code? If you need to call methods from
File on a trait object you need to either downcast back to
File or create a trait that inherits from
AsyncRead and exposes the methods you need.
Thank you for the explanation. I'm writing through multiple layers. Together with the answer of @user16251, I think I can implement a solution.
Thanks. Are you certain you mean
AsyncRead? I would have assumed it would rather be
Actually, the Tokio file does need to be flushed due to its implementation.
Maybe I'm misunderstanding: does a completed write future (for fs::File specifically) not mean the write has been handed off to the OS? Is there some additional buffering happening beyond what's needed to hand it off to a threadpool?
I was under the impression that the flush is only needed if one had some non-awaited futures in flight somewhere.
A completed write on a
tokio::fs::File means that it has been handed off to the threadpool to be written, but that does not necessarily mean that the threadpool has written it yet. Calling
flush allows you to wait for that to happen.
It's not possible to change this. The
AsyncWrite trait's signature fundamentally forces us to do it this way.
I miswrote the trait; sorry!
Assuming there are no bugs in the OS implementation of the filesystem, should awaiting a successful
flush on an
AsyncWrite be sufficient for observing the writes by a subsequent read?
Yes, if you flush a
tokio::fs::File, then it is guaranteed that the writes have completed.
The sync_all method is mentioned in the original post, and flush does not call
sync_all. However, the
sync_all method is only relevant to making sure that the data survives unplugging the power cable.