File download with reqwest (blocking) fails with files bigger than 10 MB

I am trying to download files using reqwest in blocking (non-async) mode.
I can request and download a file unless it is bigger than about 10 megabytes.

How could I solve the problem?

  • Should I use async?
  • Can I stream bytes directly into a file?

Following is my current code:

use reqwest::blocking::Client;

fn main() {
    // Use the client builder to configure a custom client
    // (.gzip()/.deflate() require reqwest's `gzip` and `deflate` Cargo features)
    let http_client = Client::builder()
        .gzip(true)
        .deflate(true)
        .build()
        .unwrap();

    // Request the file
    let http_response = http_client
        .get("https://ftp.mypage.org/remote_file.gz")
        .send();

    if http_response.is_err() {
        panic!("Error while retrieving data: {:#?}", http_response.err());
    } else {
        // Read the whole response body into memory
        let body = http_response.unwrap().bytes().unwrap();
        // Write the content of the response to disk
        std::fs::write("./downloaded_file.gz", &body)
            .expect("Reference proteome download failed");
    }
}

Please note that I am downloading this way because it was suggested in this thread.

Thanks a lot for your help.

Can you elaborate on what you are seeing? Is it that you get an Err(reqwest::Error) when sending the request, or does the request succeed but the data you get back is truncated?

The request succeeds. It panics at this line:

let body = http_response.unwrap().bytes().unwrap();

I am guessing this happens because it tries to read the whole response into memory, but I'm not really sure.
For reference, the requested file is about 30 MB, which is not exactly big.
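On the second question: `bytes()` does buffer the whole body in memory, although ~30 MB is not a problem for that in itself. If you would rather stream, `reqwest::blocking::Response` implements `std::io::Read`, so the body can be copied to a file in chunks with `std::io::copy`. The sketch below keeps the helper generic over any `Read` so it can be tried without a network; `stream_to_file` is a hypothetical name, and this addresses memory use rather than timeouts.

```rust
use std::fs::File;
use std::io::{self, Cursor, Read, Write};
use std::path::Path;

/// Hypothetical helper: copy everything from `reader` into a new file at `path`
/// without holding the whole body in memory. A `reqwest::blocking::Response`
/// can be passed directly, since it implements `Read`.
fn stream_to_file<R: Read>(mut reader: R, path: &Path) -> io::Result<u64> {
    let mut file = File::create(path)?;
    // io::copy moves data through a fixed-size buffer until EOF.
    let written = io::copy(&mut reader, &mut file)?;
    file.flush()?;
    Ok(written)
}

fn main() -> io::Result<()> {
    // Stand-in for a response body. With reqwest this would be something like
    // `stream_to_file(http_response.unwrap(), Path::new("./downloaded_file.gz"))`.
    let fake_body = Cursor::new(vec![0u8; 1024 * 1024]);
    let path = std::env::temp_dir().join("stream_to_file_demo.bin");
    let n = stream_to_file(fake_body, &path)?;
    println!("wrote {n} bytes to {}", path.display());
    Ok(())
}
```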

Can you share the panic message? That normally indicates what went wrong.

I can guarantee it's not a size issue with the reqwest client: I've downloaded much larger files with reqwest without any problems. That suggests the issue is somewhere else; for example, the upstream server might have some sort of cutoff that limits response sizes to avoid being overloaded by a malicious actor.

It seems to be a timeout error (assuming that is the right term):

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Decode, source: TimedOut }', src/uniprot.rs:260:55
stack backtrace:
   0: rust_begin_unwind
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/result.rs:1651:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/result.rs:1076:23
   4: data_download::uniprot::fetch_canonical_set
             at ./src/uniprot.rs:260:24
   5: data_download::retrieve_canonical_proteomes
             at ./src/main.rs:174:29
   6: data_download::main
             at ./src/main.rs:430:9
   7: core::ops::function::FnOnce::call_once
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
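A rough sanity check on why the cutoff sits near 10 MB, assuming the blocking client's documented default timeout of 30 seconds applied to the whole `bytes()` read: if ~10 MB is about what finishes in 30 s, the link delivers roughly 10 MB / 30 s ≈ 0.33 MB/s, so a 30 MB file needs around 90 s. The helper below (`estimated_timeout` is a hypothetical name, and the throughput is an inferred estimate, not a measurement) turns that arithmetic into a timeout value with a safety margin.

```rust
use std::time::Duration;

/// Hypothetical helper: time needed to transfer `size_bytes` at an observed
/// throughput of `bytes_per_sec`, multiplied by a safety `margin`.
fn estimated_timeout(size_bytes: u64, bytes_per_sec: u64, margin: f64) -> Duration {
    assert!(bytes_per_sec > 0, "throughput must be positive");
    let secs = size_bytes as f64 / bytes_per_sec as f64;
    Duration::from_secs_f64(secs * margin)
}

fn main() {
    // ~10 MB finishing within a 30 s timeout implies roughly 0.33 MB/s.
    let throughput = 10_000_000 / 30; // bytes per second
    // 30 MB file with a 1.5x margin: roughly 135 s.
    let t = estimated_timeout(30_000_000, throughput, 1.5);
    println!("suggested timeout: {:?}", t);
}
```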

If you create the Client via the reqwest::blocking::ClientBuilder, you should be able to configure the timeout to a value that works for you.

The docs even include an example for doing this: https://docs.rs/reqwest/latest/reqwest/blocking/struct.ClientBuilder.html

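use std::time::Duration;

fn main() -> Result<(), reqwest::Error> {
    // Timeout for requests made with this client (the blocking default is 30 s)
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(10))
        .build()?;
    Ok(())
}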

It is working!
Thanks a lot. I should have read the error and the reqwest documentation more carefully.

Thanks again.


I noticed that the async version defaults to no timeout, which seems a little inconsistent. I am not sure why the blocking/non-blocking defaults should be different.

Likely because in the async world, timeouts can be added at any layer of the application, but when blocking, timeouts must be implemented by the blocking operation itself.
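To illustrate that point with std only: a caller wrapping a blocking call can stop *waiting* (here by running it on a worker thread and using `recv_timeout`), but it cannot cancel the call, which keeps running in the background. `call_with_deadline` is a hypothetical helper, not a reqwest API; in async code the equivalent outer wrapper can actually drop (cancel) the future.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run a blocking closure on a worker thread and wait at
/// most `deadline` for its result. On timeout the caller gives up waiting,
/// but the worker thread is NOT cancelled and keeps running.
fn call_with_deadline<T, F>(deadline: Duration, f: F) -> Result<T, RecvTimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the caller already timed out and
        // dropped the receiver.
        let _ = tx.send(f());
    });
    rx.recv_timeout(deadline)
}

fn main() {
    let fast = call_with_deadline(Duration::from_millis(500), || 42);
    println!("fast: {:?}", fast);

    // The caller times out after 50 ms while the sleeping worker carries on.
    let slow = call_with_deadline(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(300));
        "done"
    });
    println!("slow: {:?}", slow);
}
```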


Perhaps, but I don't entirely see the logic of timing out by default. It seems surprising, especially as the default value is essentially arbitrary and the only logical default is "None". Maybe it is some kind of historical accident that cannot be changed now.