Abort large upload

My application uploads large video files (~150 MB each) to a server. But at some unpredictable times, the upload must be canceled because other, real-time critical, tasks must be done instead.

Which Rust technologies, that support abort, can I use to upload large files? Please specify which crates and methods I could use, and if you also have sample code, I would be very happy.

I have been uploading with HTTPS POST so far. If SFTP or another method would be better, I could consider that too, as long as it is secure.

You can abort an HTTP POST, but you can also upload in small, explicit chunks, doing more than 1 request, and then you can just stop/pause sending requests when you need to do something else.
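
To make the chunked idea concrete, here is a minimal sketch using only the standard library. The names `upload_in_chunks` and `UploadOutcome` are made up for illustration, and the `Write` sink stands in for the server: in a real client each chunk would be its own HTTPS request, and the server would need to know how to reassemble the pieces.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicBool, Ordering};

/// Outcome of a chunked upload attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum UploadOutcome {
    /// All bytes were sent.
    Completed,
    /// Stopped early after `sent` bytes, so a later attempt could resume from there.
    Paused { sent: usize },
}

/// Sends `data` to `sink` in chunks of at most `chunk_size` bytes and checks
/// `cancel` before every chunk. `chunk_size` must be non-zero, otherwise
/// `chunks` panics.
fn upload_in_chunks<W: Write>(
    sink: &mut W,
    data: &[u8],
    chunk_size: usize,
    cancel: &AtomicBool,
) -> io::Result<UploadOutcome> {
    let mut sent = 0;
    for chunk in data.chunks(chunk_size) {
        if cancel.load(Ordering::Relaxed) {
            return Ok(UploadOutcome::Paused { sent });
        }
        // Assumption: in a real client this would be one request per chunk.
        sink.write_all(chunk)?;
        sent += chunk.len();
    }
    Ok(UploadOutcome::Completed)
}

fn main() -> io::Result<()> {
    let data = vec![0u8; 10_000];
    let cancel = AtomicBool::new(false); // set to true from elsewhere to pause
    let mut sink = Vec::new(); // stands in for the server

    let outcome = upload_in_chunks(&mut sink, &data, 4096, &cancel)?;
    println!("{outcome:?}, {} bytes received", sink.len());
    Ok(())
}
```

With this shape the worst-case delay before the upload stops is roughly the time it takes to send one chunk, so the chunk size is the knob to tune.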

(Doing many requests should not be a problem if the size of your payload data divided by the throughput is relatively larger than the latency of establishing the connection. And the latter can be made faster by sending a keep-alive header, etc.)
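
A rough back-of-the-envelope check of that trade-off, as a sketch. All numbers are assumptions for illustration (150 MB file, 10 MB/s uplink, 50 ms of overhead per request), and `overhead_percent` is a made-up helper, not something from a crate.

```rust
/// Time spent on per-request overhead, as a percentage of the pure transfer
/// time, when a file is split into fixed-size chunks. Uses integer math, so
/// the result is rounded down. `chunk_bytes` and `bytes_per_sec` must be non-zero.
fn overhead_percent(
    file_bytes: u64,
    chunk_bytes: u64,
    bytes_per_sec: u64,
    per_request_ms: u64,
) -> u64 {
    let requests = file_bytes.div_ceil(chunk_bytes);
    let transfer_ms = file_bytes * 1000 / bytes_per_sec;
    requests * per_request_ms * 100 / transfer_ms
}

fn main() {
    const MB: u64 = 1_000_000;
    // Assumed figures, not measurements.
    for chunk in [MB, 8 * MB, 32 * MB] {
        println!(
            "{} MB chunks: ~{}% overhead",
            chunk / MB,
            overhead_percent(150 * MB, chunk, 10 * MB, 50)
        );
    }
}
```

Under these assumptions, 1 MB chunks add about half again to the upload time, while 8 MB chunks add only a few percent, at the cost of a longer wait before the upload can stop.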

Thank you paramagnetic, but which Rust crate and function can I use to abort an HTTP POST?

I don't think you need a separate library? You just stop writing the body. (If your favorite HTTP client library insists that you must always supply the entire body at once, then you can't use it like that.)

If you haven't used a Rust HTTP client crate before, here is a resource listing them.

How strictly real-time is this? What OS are you using? What hardware?

Generally there is nothing in Rust to do that, because the task is OS-specific. On the OS level you need something to interrupt your network access, or set thread scheduling priorities to pre-empt or pause the upload thread when something more important happens.

If you're using a general OS like Linux, I don't think you have any actual real-time guarantees, but OTOH a simple file upload is cheap and won't prevent other threads or processes from running, especially if you have multiple cores.

If the goal is simply to stop executing the upload at any time, then you can use any async HTTP client (such as reqwest). async code is always cancellable.
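
To illustrate what "cancellable" means here, the sketch below polls a future by hand using only the standard library and then drops it. `fake_upload`, `YieldOnce` and `run_then_cancel` are hypothetical names for this demo; a real runtime such as tokio does the polling for you, and aborting a task there has the same effect of dropping the future at an `.await` point.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling by hand in this demo.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns `Pending` once, then `Ready`. Stands in for one network write
/// that cannot complete immediately.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical upload: "sends" `total` chunks, suspending after each one.
async fn fake_upload(total: usize, sent: Arc<AtomicUsize>) {
    for _ in 0..total {
        sent.fetch_add(1, Ordering::SeqCst);
        YieldOnce(false).await;
    }
}

/// Polls the upload at most `polls` times, then drops it, and returns how
/// many chunks were sent before that.
fn run_then_cancel(total: usize, polls: usize) -> usize {
    let sent = Arc::new(AtomicUsize::new(0));
    let mut fut = Box::pin(fake_upload(total, sent.clone()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    for _ in 0..polls {
        if fut.as_mut().poll(&mut cx).is_ready() {
            break; // finished on its own; must not be polled again
        }
    }
    drop(fut); // cancellation: the future is simply never polled again
    sent.load(Ordering::SeqCst)
}

fn main() {
    println!("chunks sent before cancel: {}", run_then_cancel(100, 3));
}
```

The caveat is that a future can only stop where it yields, so any blocking call between two `.await` points runs to completion first.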

Thanks for the input everybody, although my hope was to get something more concrete. But I did pick up the note that async code is always cancellable, so I stopped looking specifically for a way to cancel an upload, and instead looked for ways to make the upload async, and how to cancel that. So I came up with the code below, and it seems to do what I want. Please comment on whether you think this is a good method.

[dependencies]
reqwest = "0.12.7"
tokio = { version = "1.40.0", features = ["rt-multi-thread"] }

fn main() {
    // Create a tokio runtime...
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_io()
        .build()
        .unwrap();

    // ... so that we can start the upload in the background
    let join_handle = rt.spawn(async move {
        let result = upload("myfile.mp4").await.unwrap();
        println!("Upload finished.");
        result
    });

    println!("Upload has started in the background, now wait for something important...");
    std::thread::sleep(std::time::Duration::from_secs(5));

    println!("Now something important happened.");
    if join_handle.is_finished() {
        println!("Upload was already finished. Ready for realtime critical tasks.");
    } else {
        println!("Aborting upload...");
        join_handle.abort();
        println!("Upload aborted. Ready for realtime critical tasks.");
    }

    println!("Now we do the real-time critical tasks.")
}

async fn upload(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let url = "https://www.upload_destination.com";
    let body = std::fs::read(path)?;
    client
        .post(url)
        .body(body)
        .send()
        .await?;
    Ok(())
}

I also noticed that there is a tokio_util::sync::CancellationToken that can be used together with tokio::select! to cancel the upload. But here I did not use that, because that was just more code for the same result. Or is there a reason to use a CancellationToken instead of just join_handle.abort()?

Yes, I know I'm reading the entire file to memory before I start sending it, and that may be a bad idea for very large files. And I know there are ways to avoid that using for example tokio_util::codec::FramedRead and reqwest::Body::wrap_stream. But in my case the files are not that big, and my method has the advantages of simpler code and fewer points where errors can occur.

I haven't used it myself, but my understanding is that CancellationToken is useful when you want to direct a task to do something, or when you want to cancel many things. It's really more of a “1-message broadcast channel” than being fundamentally about cancellation. It's not relevant here.

More important than memory use: you're using a blocking operation. A task can only be aborted at an .await point, and std::fs::read has none, so join_handle.abort() can't interrupt it while the file is being read. That might be acceptable, because it runs on another thread in the tokio runtime, so it won't delay what main() is doing, but in principle you should substitute tokio::fs::read(path).await?.

Note that with the stream feature enabled on reqwest, you can make this fully streaming, and not read the body in before sending:

async fn upload(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    let url = "https://www.upload_destination.com";
    // Opening the file is async too, so it needs `.await` before `?`.
    let body = tokio::fs::File::open(path).await?;
    client
        .post(url)
        .body(body)
        .send()
        .await?;
    Ok(())
}

Because reqwest provides a conversion from a tokio::fs::File to a Body, this works the way you want it to, and has the advantage that as the file gets bigger, it'll stream the file rather than read the whole file into memory before starting the upload.

Thanks for the additional input! With your code samples it is very easy to use async commands also for reading the file, without the FramedRead and wrap_stream that I thought I had to use for this.

I noticed an interesting difference between body = tokio::fs::read(&path) and body = tokio::fs::File::open(&path). When I use File::open and then abort, there will be a partially uploaded file on the server. However, if I use fs::read and then abort the task, there is nothing left on the server.

My server is just a php script that stores whatever comes using file_put_contents($filename, file_get_contents('php://input'));

In my case I prefer to not have any partially written files on the server, so I will use the fs::read method for now (I don't have any problem with memory).

Of course lots of things can be done on the server side to improve things, for example so that I can resume a partially uploaded file, but I don't have the time to dive more into that now.

I'm happy with what I have now, thanks again everybody!