(The original code is taken from "Downloading 100,000 Files Using Async Rust" by Pat Shaughnessy.)
I have some code that downloads images concurrently using Rust, but I'm not sure how to implement two things. The first is rate limiting, to avoid 429 responses. I tried using std::thread::sleep, but it does not work as I expected. (I did not write the original code and am new to async Rust, so I am unsure of the specifics of how and when the code runs.)
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let paths: Vec<String> = read_lines("links.txt")?;
    let fetches = futures::stream::iter(
        paths.into_iter().map(|path| {
            async move {
                let a = path.split('/').collect::<Vec<&str>>();
                let file_name = a.last().unwrap();
                std::thread::sleep(std::time::Duration::from_secs(1));
                match reqwest::get(&path).await {
                    Ok(resp) => {
                        if resp.status().as_u16() != 200 {
                            println!("failed to download");
                            println!("{}", path);
                        };
                        match resp.bytes().await {
                            Ok(bytes) => {
                                //println!("RESPONSE: {} bytes from {}", (bytes.len()), path);
                                write(format!("downloads/{}", file_name), bytes).unwrap();
                            }
                            Err(_) => println!("ERROR reading {}", path),
                        }
                    }
                    Err(_) => println!("ERROR downloading {}", path),
                }
            }
        })
    ).buffer_unordered(200).collect::<Vec<()>>();
    fetches.await;
    Ok(())
}
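A note on the sleep in the code above: std::thread::sleep blocks the thread that is polling the stream, so no other download makes progress while it sleeps, and buffer_unordered(200) still allows up to 200 requests in flight. One possible direction, sketched here with the standard library only, is a small pacer that hands out evenly spaced start slots. The `Pacer` type and its `reserve` method are hypothetical names for illustration, not part of tokio or futures. The assumption is that each async block would briefly lock a shared `Arc<Mutex<Pacer>>`, call `reserve`, release the lock, and then await `tokio::time::sleep(wait)` rather than calling the blocking sleep.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: hands out start slots spaced `interval` apart.
struct Pacer {
    interval: Duration,
    next_slot: Instant,
}

impl Pacer {
    fn new(start: Instant, interval: Duration) -> Self {
        Pacer { interval, next_slot: start }
    }

    /// Reserves the next free slot and returns how long the caller
    /// should wait, measured from `now`, before sending its request.
    fn reserve(&mut self, now: Instant) -> Duration {
        // If the pacer has been idle, restart from `now` instead of
        // letting a burst of requests "catch up" on missed slots.
        let slot = self.next_slot.max(now);
        self.next_slot = slot + self.interval;
        slot.duration_since(now)
    }
}

fn main() {
    let start = Instant::now();
    let mut pacer = Pacer::new(start, Duration::from_millis(100));

    // Three requests that arrive at the same instant get spread out.
    for i in 0..3 {
        let wait = pacer.reserve(start);
        // In the async code, this is where the wait would be awaited
        // with a non-blocking sleep (assumption: tokio::time::sleep).
        println!("request {} should wait {:?}", i, wait);
    }
}
```

Lowering the 200 passed to buffer_unordered is a simpler knob, but it caps concurrency rather than requests per second, so it may not be enough on its own.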
Secondly, I want to build a list of the links that were rate limited (returned a 429 status) and print it once all other files have finished downloading. My attempt and the resulting compiler error message are as follows:
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut failed: Vec<String> = Vec::new();
    let paths: Vec<String> = read_lines("links.txt")?;
    let fetches = futures::stream::iter(
        paths.into_iter().map(|path| {
            async move {
                let a = path.split('/').collect::<Vec<&str>>();
                let file_name = a.last().unwrap();
                match reqwest::get(&path).await {
                    Ok(resp) => {
                        if resp.status().as_u16() != 200 {
                            println!("failed to download {}", path);
                            failed.push(path);
                            return // To avoid downloading the file, which will not contain what we want.
                        };
                        match resp.bytes().await {
                            Ok(bytes) => {
                                //println!("RESPONSE: {} bytes from {}", (bytes.len()), path);
                                write(format!("downloads/{}", file_name), bytes).unwrap();
                            }
                            Err(_) => println!("ERROR reading {}", path),
                        }
                    }
                    Err(_) => println!("ERROR downloading {}", path),
                }
            }
        })
    ).buffer_unordered(200).collect::<Vec<()>>();
    fetches.await;
    Ok(())
}
error[E0507]: cannot move out of `failed`, a captured variable in an `FnMut` closure
  --> src/main.rs:28:20
   |
24 |       let mut failed: Vec<String> = Vec::new();
   |           ---------- captured outer variable
...
28 |           async move {
   |  ____________________^
29 | |             let a = path.split('/').collect::<Vec<&str>>();
30 | |             let file_name = a.last().unwrap();
31 | |             match reqwest::get(&path).await {
...  |
35 | |                         failed.push(path);
   | |                         ------
   | |                         |
   | |                         move occurs because `failed` has type `Vec<String>`, which does not implement the `Copy` trait
   | |                         move occurs due to use in generator
...  |
47 | |             }
48 | |         }
   | |_________^ move out of `failed` occurs here
How can I do this?
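For context on the error: the closure passed to map is called once per path, but `async move` tries to move `failed` into the first future it builds, which would leave nothing for later calls. One way around this, sketched below using only the standard library, is to have each download report its result as a value and build the list after everything has finished. The `Outcome` enum and the `classify` and `rate_limited` functions are hypothetical names for illustration. The assumption is that each async block would evaluate to an `Outcome` instead of `()`, and that the stream would be collected into `Vec<Outcome>` instead of `Vec<()>`.

```rust
/// Hypothetical result of one download attempt.
#[derive(Debug, PartialEq)]
enum Outcome {
    Saved,
    RateLimited(String), // path that returned 429
    Failed(String),      // path that failed for any other reason
}

/// Maps an HTTP status code to an outcome for the given path.
fn classify(path: String, status: u16) -> Outcome {
    match status {
        200 => Outcome::Saved,
        429 => Outcome::RateLimited(path),
        _ => Outcome::Failed(path),
    }
}

/// Extracts the rate-limited paths once all downloads are done.
fn rate_limited(outcomes: Vec<Outcome>) -> Vec<String> {
    outcomes
        .into_iter()
        .filter_map(|outcome| match outcome {
            Outcome::RateLimited(path) => Some(path),
            _ => None,
        })
        .collect()
}

fn main() {
    // Stand-in for the Vec that awaiting the collected stream would
    // produce; the URLs are made up for the example.
    let outcomes = vec![
        classify("https://example.com/a.png".to_string(), 200),
        classify("https://example.com/b.png".to_string(), 429),
        classify("https://example.com/c.png".to_string(), 500),
    ];

    let failed = rate_limited(outcomes);
    println!("rate limited: {:?}", failed);
}
```

An alternative is to share the list as an `Arc<Mutex<Vec<String>>>` and clone the `Arc` for each path before the async block, but returning values avoids locking and keeps the futures independent of each other.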