Async write_all sometimes silently fails

igorsol · December 2, 2021, 5:16pm

I have this async function (using tokio):

async fn write_timestamp(ts: &Timestamp, fp: impl AsRef<Path>) -> std::io::Result<()> {
    // {timestamp: Timestamp(1615829398, 2)}
    println!("Writing {:?} to {:?}", ts, fp.as_ref());
    let mut w = Vec::new();
    write!(
        &mut w,
        "{{timestamp: Timestamp({}, {})}}",
        ts.time, ts.increment
    )?;
    println!("debug 1 {}", w.len());
    //let mut buff = Cursor::new(w);
    let mut dst_file = tokio::fs::File::create(fp).await?;
    println!("debug 2");
    //dst_file.write_buf(&mut buff).await?;
    let written = dst_file.write_all(&w).await?;
    println!("debug 3 {:?}", written);
    Ok(())
}

Usually it works as expected but sometimes it creates a zero-length file. There is no panic or error, console looks like this:

Writing Timestamp { time: 1638463068, increment: 1601 } to "timestamp_file"
debug 1 40
debug 2
debug 3 ()

but the file is empty.
What am I doing wrong?

alice · December 2, 2021, 6:35pm

Try flushing?

igorsol · December 2, 2021, 7:26pm

Thank you, probably you are right.
It is not easy to verify if flushing will fix the issue because I cannot reproduce the issue even without flush. Will keep trying.

On the other hand I don't understand why flush is necessary here. I expected it shoud be done automatically on file close.

alice · December 2, 2021, 8:24pm

The way tokio::fs::File works is by off-loading the operation to the spawn_blocking pool, and flushing will make it wait for the spawn_blocking task to finish. If you immediately check the file contents after writing, then it's possible that the spawn_blocking task has yet to finish (the destructor can't wait for it because it's not async).

Still, even if you don't flush, the operation will continue running and probably finish a few ms later.

geebee22 · December 2, 2021, 8:39pm

The function write_all returns a result of type (), not a length. So "written" is always "()". That doesn't explain why the write is not working though.

igorsol · December 3, 2021, 8:55am

No, I don't check content after writing. In fact the call to this function is the last thing I do in my main.
Is it possible that in rare case my main finishes before spawn_blocking executes write operation? So operation is lost?

geebee22 · December 3, 2021, 9:48am

I am no expert, but I would say yes, that's probably the problem. There is a whole page on graceful shutdown here.

igorsol · December 3, 2021, 11:00am

Thanks for the link. That is an interesting read but I cannot apply any of those tips to my code because my application is very simple:

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    ...
    // doing other things here
    ...
    write_timestamp(
        &timestamp,
        dest_dir.join("timestamp_file"),
    ).await.unwrap();
    Ok(())
}

igorsol · December 3, 2021, 3:37pm

Ok, I verified that this issue is fixed by adding flush call. Now I wonder if this is a bug in tokio::fs.

Below is full source of a project reproducing the issue:

use mongodb::bson::Timestamp;
use std::error::Error;
use std::io::Write;
use std::path::Path;
use tokio::io::AsyncWriteExt;

async fn write_timestamp(ts: &Timestamp, fp: impl AsRef<Path>) -> std::io::Result<()> {
    // {timestamp: Timestamp(1615829398, 2)}
    //println!("Writing {:?} to {:?}", ts, fp.as_ref());
    let mut w = Vec::new();
    write!(
        &mut w,
        "{{timestamp: Timestamp({}, {})}}",
        ts.time, ts.increment
    )?;
    let mut dst_file = tokio::fs::File::create(fp).await?;
    //dst_file.write_all(&w).await?;
    //dst_file.flush().await
    dst_file.write_all(&w).await
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let timestamp = Timestamp {
        time: 1619078328,
        increment: 2021,
    };
    write_timestamp(&timestamp, "timestamp_file").await.unwrap();
    Ok(())
}

Cargo.toml dependencies

[dependencies]
mongodb = "2.0.2"
tokio = { version = "1.14.0", features = ["full"] }

Bash script to test binary:

i=0

ctrl_c() {
    echo
    echo "Ctrl-c captured on iteration $i"
    exit 0
}

trap 'ctrl_c' SIGINT

while true; do
    ((++i))
    target/debug/walltest
    if ! [ -s timestamp_file ]; then
        echo "failure occurred on iteration $i"
        exit 1
    fi
    rm timestamp_file
done

igorsol · December 3, 2021, 4:10pm

The same issue also happens with the example code from the tokio's write_all documentation page

geebee22 · December 3, 2021, 4:44pm

It looks to me as if this is simply a program that can terminate without flushing the file. If main terminates before write_timestamp finishes, the write is not going to be flushed.

igorsol · December 3, 2021, 4:49pm

Not sure I understand what do you mean.
My main awaits for write_timestamp so I think it cannot terminate before write_timestamp finishes.

geebee22 · December 3, 2021, 4:52pm

Well, I guess write_timestamp can return it's result before it drops dst_file? That's what seems to be happening.

igorsol · December 3, 2021, 5:00pm

Right, it does not drop dst_file explicitly. I supposed the file should be dropped automatically at the end of scope. If I am wrong then I need to search an explanation about how this works in async code.

geebee22 · December 3, 2021, 5:08pm

I'd like to stress again I am not an expert (or even at all qualified) here, but I think this gives a hint:

https://rust-lang.github.io/async-book/02_execution/04_executor.html

" Typically, an executor will poll a future once to start off. When Future s indicate that they are ready to make progress by calling wake() , they are placed back onto a queue and poll is called again, repeating until the Future has completed."

It sounds to me as if "future completion" is not quite the same thing as the function finishing.

igorsol · December 3, 2021, 6:45pm

Obviously I'm not an expert too but I think here we have completed future because the main function awaited on it. Does that mean function finished too? Yes, I think so. I just cannot imagine how async function can be unfinished if its future is already completed.

geebee22 · December 3, 2021, 6:48pm

I have read that an async function is implemented as a "state machine". It's variables are stored I think in some kind of structure, which perhaps has methods called by the "executor" (maybe called "poll" or something). When poll returns the "result" value, the structure may still exist, and the variables may not yet have been dropped. Don't take any of this as accurate, it's some kind of impression I have.

Edit: I found more detail here: https://doc.rust-lang.org/std/future/trait.Future.html

( and here: async/.await Primer - Asynchronous Programming in Rust )

igorsol · December 3, 2021, 7:33pm

Ok, then the next question in this situation is: when those variables will be dropped? Those variables can contain some exclusive resources, like mutexes, file handles etc. That would lead to a lot of weird side effects. My understanding is that await protects us from those side effects.

By the way I modified my reproduction code to run it with async-std runtime and there is no problem when I run it without flush call.

krtab · December 3, 2021, 9:39pm

I was exactly confused by the same thing reading your question and I reproduced the behavior your describe on my side. To me the fact that you awaited every future was enough to guarantee that the write actually took place but I went and checked the code and if you look at: tokio/file.rs at ee4b2ede83c661715c054d3cda170994a499c39f · tokio-rs/tokio · GitHub you can see that it seems possible that the future returns only once the bytes have been written to some underlying internal buffer (the return Ready(Ok(n)) line 662), and hence I don't see a guarantee that the spawn_blocking has finished running.

It is also super counter-intuitive to me, but given that @alice seems to be reading the thread, I hope she will clarify to us what exact guarantees File makes.

geebee22 · December 3, 2021, 10:34pm

alice said ( and I didn't absorb it at the time : "the destructor can't wait for it because it's not async" ).

Based on that, the destructor may be running, but the flush is not done at that time but later. Overall, terminating an async program is a pretty tricky business, as there can be unfinished work. The async programs I have don't terminate by themselves a t all!

Edit: I think call to poll_sync_all would help?

Topic		Replies	Views
Writing this cleanly	10	1161	December 18, 2020
Tokio uring and write_all help	3	508	October 29, 2021
Basic Tokio File io problem help	4	760	July 24, 2020
Tokio: Is it possible to do concurrent write on a File when using spawn_blocking help	7	202	November 26, 2024
Equivalent of writeln! for tokio? help	5	2099	March 18, 2022

Async write_all sometimes silently fails

Related topics