First async application

I recently ported a small Node.js app to Rust. I chose axum as the framework and got it done reasonably quickly. Since this is my first time writing something concurrent, and I sometimes find it hard to wrap my head around, I would greatly appreciate any pointers or tips on what I could improve. The following code contains the core functionality of the app:

use futures::future;
use reqwest;
use std::{io::Write, path::Path};
use time::OffsetDateTime;
use url::Url;
use zip::{write::FileOptions, CompressionMethod, ZipWriter};

const WHITELIST: [&str; 3] = [
    "allpicts.in",
    "gettyimages.com",
    "youtube.com"
];

#[derive(Clone, Debug)]
pub struct File {
    pub url: Url,
    pub path: String,
}

#[derive(Debug)]
pub struct DownloadReport {
    date: OffsetDateTime,
    requested: usize,
    downloaded: usize,
    damaged: Vec<Report>,
    missing: Vec<Report>,
}

#[derive(Debug)]
struct Report {
    path: String,
    url: Url,
    reason: String,
}

impl DownloadReport {
    pub fn new(req: usize) -> Self {
        Self {
            date: OffsetDateTime::now_utc(),
            requested: req,
            downloaded: 0,
            damaged: Vec::new(),
            missing: Vec::new(),
        }
    }
}

pub async fn get_responses(files: &[File]) -> Vec<reqwest::Result<reqwest::Response>> {
    // let mut handles = Vec::with_capacity(files.len());
    // for file in files {
    //     let Some(d) = file.url.domain() else { continue };
    //     if WHITELIST.contains(&d) {
    //         handles.push(tokio::spawn(reqwest::get(file.url.clone())))
    //     }
    // }

    // let mut responses = Vec::with_capacity(handles.len());
    // for response in handles {
    //     responses.push(response.await.unwrap())
    // }
    // responses

    future::join_all(
        files
            .iter()
            .filter(|f| {
                if let Some(d) = f.url.domain() {
                    WHITELIST.contains(&d)
                } else {
                    false
                }
            })
            .map(|file| async { reqwest::get(file.url.as_str()).await }),
    )
    .await
}

pub async fn download_to_zip(zipfile_path: &Path, files: Vec<File>) {
    let mut download_report = DownloadReport::new(files.len());

    let responses = get_responses(&files).await;
    let zipfile = std::fs::File::create(zipfile_path).unwrap();
    let mut zipwriter = ZipWriter::new(zipfile);
    let options = FileOptions::default().compression_method(CompressionMethod::Deflated);

    for (file, response) in files.into_iter().zip(responses.into_iter()) {
        zipwriter.start_file(&file.path, options).unwrap();
        match response {
            Ok(res) => match res.status().as_u16() {
                ..=199 | 300.. => download_report.missing.push(Report {
                    path: file.path,
                    url: file.url,
                    reason: format!("[{}] {}", res.status(), res.text().await.unwrap()),
                }),
                _ => {
                    let bytes: Vec<_> = res.bytes().await.unwrap().into_iter().collect();
                    zipwriter.write_all(&bytes).unwrap();
                    download_report.downloaded += 1;
                }
            },
            Err(e) => download_report.damaged.push(Report {
                path: file.path,
                url: file.url,
                reason: e.to_string(),
            }),
        }
    }
    // Explicitly finish the archive so any trailing write errors surface
    // instead of being silently swallowed when the writer is dropped.
    zipwriter.finish().unwrap();
}

#[tokio::main]
async fn main() {
    let file_list = vec![
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory1/Beach.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory1/Beach2.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Beach3.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory2/Beach4.jpg"),
        },
    ];
    let path = Path::new("./test.zip");
    download_to_zip(path, file_list).await;
}

As can be seen, this downloads a few images (after checking each URL against a whitelist) and zips them up. In the actual app, the zip archive is sent back as the response body. A download report is also created, which is meant to be written to a file, but I did not include that part here.

I implemented the file downloading in two ways (one is commented out) and am unsure which one makes more sense, or whether it matters at all.
Also, the code involving the zip archive contains a lot of file I/O, which probably blocks the thread, but I'm not sure how much of a problem that is.
I also noticed that in deployment the Node.js app seems to have significantly lower response times, but I suppose there are numerous possible reasons for that which can't be inferred from this code snippet. Still, I'd be glad for any obvious performance improvements I could make.

EDIT: Bug in code

Did you edit the code substantially before posting it? It wouldn't compile as-is due to simple mistakes such as the whitelist containing 3 elements instead of the declared 4.

Because of this and similar (potentially more severe/confusing) differences, it's best to post code that you are actually running.

The commented-out variant is probably inferior. If your tasks are independent of one another, it's not a good idea to await each future in a loop before the others have started: that introduces an artificial dependency and effectively makes your code serial, regardless of how much physical parallelism the system has. (To be fair, since your snippet calls tokio::spawn first, the tasks do start eagerly, so it isn't strictly serial; but join_all expresses the intent more directly and avoids the unwrap on every JoinHandle.) I don't know off the top of my head whether future::join_all is unordered, but it's probably not worse.

Blocking I/O within async code is usually an issue. Have a look at the async_compression crate and see whether it fulfills your requirements instead.

Sorry for that mistake; I've edited it. I did not edit the code heavily, I just exchanged the actual URLs for something else because I didn't want to disclose the real ones. I made a test project containing only what I posted above, so it does indeed run, apart from the mistake I overlooked.

The docs of future::join_all state that it will switch to FuturesOrdered if the number of futures is large, sort of implying that it uses FuturesUnordered by default? I suppose I could look into using that directly, but I'm not sure how, since the docs are not heavy on details here.

Thanks for pointing that crate out, I'll take a look!

I was saying that using join_all is fine (as opposed to your commented out snippet that awaits multiple times in a loop).

join_all will not use FuturesUnordered because it guarantees to return the items in the original order of iteration. That's all the "ordered" part is about. Looking at the docs, it still races all submitted futures concurrently (that's also what FuturesOrdered does!), which is exactly what you want.

Apologies, I didn't read the docs carefully enough there.
