I recently ported a small Node.js app to Rust. I chose axum as the framework and got it done reasonably quickly. Since this is my first time writing concurrent code, and I sometimes find it hard to wrap my head around, I would greatly appreciate any pointers or tips on what I could improve. The following code contains the core functionality of the app:
use futures::future;
use reqwest;
use std::{io::Write, path::Path};
use time::OffsetDateTime;
use url::Url;
use zip::{write::FileOptions, CompressionMethod, ZipWriter};
const WHITELIST: [&str; 3] = [
    "allpicts.in",
    "gettyimages.com",
    "youtube.com",
];
#[derive(Clone, Debug)]
pub struct File {
    pub url: Url,
    pub path: String,
}

#[derive(Debug)]
pub struct DownloadReport {
    date: OffsetDateTime,
    requested: usize,
    downloaded: usize,
    damaged: Vec<Report>,
    missing: Vec<Report>,
}

#[derive(Debug)]
struct Report {
    path: String,
    url: Url,
    reason: String,
}

impl DownloadReport {
    pub fn new(req: usize) -> Self {
        Self {
            date: OffsetDateTime::now_utc(),
            requested: req,
            downloaded: 0,
            damaged: Vec::new(),
            missing: Vec::new(),
        }
    }
}
pub async fn get_responses(files: &[File]) -> Vec<reqwest::Result<reqwest::Response>> {
    // let mut handles = Vec::with_capacity(files.len());
    // for file in files {
    //     let Some(d) = file.url.domain() else { continue };
    //     if WHITELIST.contains(&d) {
    //         handles.push(tokio::spawn(reqwest::get(file.url.clone())))
    //     }
    // }
    // let mut responses = Vec::with_capacity(handles.len());
    // for response in handles {
    //     responses.push(response.await.unwrap())
    // }
    // responses
    future::join_all(
        files
            .iter()
            .filter(|f| f.url.domain().map_or(false, |d| WHITELIST.contains(&d)))
            .map(|file| async { reqwest::get(file.url.as_str()).await }),
    )
    .await
}
pub async fn download_to_zip(zipfile_path: &Path, files: Vec<File>) {
    let mut download_report = DownloadReport::new(files.len());
    // Drop non-whitelisted files up front so this list stays aligned with
    // the responses returned by get_responses, which filters the same way;
    // otherwise the zip below pairs files with the wrong responses.
    let files: Vec<File> = files
        .into_iter()
        .filter(|f| f.url.domain().map_or(false, |d| WHITELIST.contains(&d)))
        .collect();
    let responses = get_responses(&files).await;
    let zipfile = std::fs::File::create(zipfile_path).unwrap();
    let mut zipwriter = ZipWriter::new(zipfile);
    let options = FileOptions::default().compression_method(CompressionMethod::Deflated);
    for (file, response) in files.into_iter().zip(responses.into_iter()) {
        zipwriter.start_file(&file.path, options).unwrap();
        match response {
            Ok(res) => match res.status().as_u16() {
                ..=199 | 300.. => download_report.missing.push(Report {
                    path: file.path,
                    url: file.url,
                    reason: format!("[{}] {}", res.status(), res.text().await.unwrap()),
                }),
                _ => {
                    // Bytes derefs to [u8], so no intermediate Vec is needed.
                    let bytes = res.bytes().await.unwrap();
                    zipwriter.write_all(&bytes).unwrap();
                    download_report.downloaded += 1;
                }
            },
            Err(e) => download_report.damaged.push(Report {
                path: file.path,
                url: file.url,
                reason: e.to_string(),
            }),
        }
    }
    // Write the central directory explicitly; relying on Drop would swallow errors.
    zipwriter.finish().unwrap();
}
#[tokio::main]
async fn main() {
    let file_list = vec![
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory1/Beach.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory1/Beack2.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Beach3.jpg"),
        },
        File {
            url: Url::parse("http://allpicts.in/wp-content/uploads/2018/03/Natural-Images-HD-1080p-Download-with-Keyhole-Arch-at-Pfeiffer-Beach.jpg").unwrap(),
            path: String::from("Directory2/Beach4.jpg"),
        },
    ];
    let path = Path::new("./test.zip");
    download_to_zip(path, file_list).await;
}
As can be seen, this downloads a few images (after checking them against a whitelist) and zips them up. In the actual app the zip archive is sent back as the response body. A download report is also created, which is meant to be written to a file, but I did not include that here.
I implemented the file downloading in two ways (one is commented out) and am unsure which one makes more sense, or whether it matters.
Also, the code involving the zip archive does a lot of file I/O, which probably blocks the thread, but I'm not sure how much of a problem that is.
I also noticed that in deployment the Node.js app seems to have significantly lower response times. I suppose there are numerous possible reasons for that which can't be inferred from this snippet, but I'd be glad for any obvious performance improvements I could make.
EDIT: Fixed a bug in the code.