Future cannot be sent between threads safely

I'm trying to create a scraper program that uses Tokio.

I have two (I think) relevant functions:

pub async fn get_images_of_all_species(links: Vec<String>, num_images_per_species: usize, image_width: u32, image_height: u32) -> Result<Vec<(RgbImage, String)>, reqwest::Error> {
    let title_selector = Selector::parse("title").unwrap();
    let image_selector = Selector::parse("td.node-main-alt > a > img").unwrap();

    let mut map = Vec::<(RgbImage, String)>::new();
    let mut titles = Vec::<String>::new();
    
    for link in links {
        let title_selector = title_selector.clone();

        let doc = get_text(link).await?;

        let html = Html::parse_document(&doc);
        // get title
        let title_element = html.select(&title_selector).next().unwrap();
        let title_text = title_element
            .text()
            .collect::<Vec<&str>>()
            .join("")
            .split("-")
            .take(1)
            .collect::<String>()
            .trim()
            .replace("Species ", "");

        let num_loops = (num_images_per_species as f64 / 24.0).floor() as usize;
        let remainder = num_images_per_species % 24;
        
        let mut data = Vec::<(RgbImage, String)>::new();

        // get most images
        for i in 0..num_loops {
            let link = link.clone();
            let title_text = title_text.clone();

            let temp_data = tokio::spawn(async move {
                let from = i * 24;
                let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();

                let images = get_images_from_page(doc, 24, image_width, image_height).await;
                
                images
            });

            let new_imgs = temp_data
                .await
                .unwrap()
                .iter()
                .map(|img| (img.to_owned(), title_text))
                .collect();

            data.extend(new_imgs);
        }
        
    }

    Ok(map)
}

pub async fn get_images_from_page(doc: String, num_images: usize, width: u32, height: u32) -> Vec<RgbImage> {
    let mut images = Vec::<RgbImage>::new();
    let html = Html::parse_document(&doc);
    let image_selector = Selector::parse("td.node-main-alt > a > img").unwrap();
    let mut c = 0;

    let found_images: Vec<_> = html.select(&image_selector).collect();

    for i in 0..found_images.len() {
        let src = found_images[i]
            .value()
            .attrs()
            .collect::<HashMap<&str, &str>>()["src"];

        images.push(get_image(src.to_string(), width, height).await.unwrap());

        c += 1;

        if c > num_images {
            break;
        }
    }

    images
}

I get a lot of errors relating to the temp_data future. Here is the full, rather ugly compiler output:

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `tendril::tendril::NonAtomic`, the trait `Sync` is not implemented for `Cell<usize>`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
71  |     let html = Html::parse_document(&doc);
    |         ---- has type `Html` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `html` maybe used later
...
93  | }
    | - `html` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `Cell<NonZeroUsize>`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
75  |     let found_images: Vec<_> = html.select(&image_selector).collect();
    |         ------------ has type `Vec<ElementRef<'_>>` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `found_images` maybe used later
...
93  | }
    | - `found_images` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `UnsafeCell<tendril::tendril::Buffer>`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
75  |     let found_images: Vec<_> = html.select(&image_selector).collect();
    |         ------------ has type `Vec<ElementRef<'_>>` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `found_images` maybe used later
...
93  | }
    | - `found_images` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `UnsafeCell<Option<Option<tendril::tendril::Tendril<tendril::fmt::UTF8>>>>`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
75  |     let found_images: Vec<_> = html.select(&image_selector).collect();
    |         ------------ has type `Vec<ElementRef<'_>>` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `found_images` maybe used later
...
93  | }
    | - `found_images` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `UnsafeCell<Option<Vec<string_cache::atom::Atom<markup5ever::LocalNameStaticSet>>>>`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
75  |     let found_images: Vec<_> = html.select(&image_selector).collect();
    |         ------------ has type `Vec<ElementRef<'_>>` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `found_images` maybe used later
...
93  | }
    | - `found_images` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

error: future cannot be sent between threads safely
   --> src\bg_scraper\mod.rs:45:42
    |
45  |               let temp_data = tokio::spawn(async move {
    |  __________________________________________^
46  | |                 let from = i * 24;
47  | |                 let doc = get_text(link + &format!("/bgimage?from={}", from)).await.unwrap();
48  | |
...   |
51  | |                 images
52  | |             });
    | |_____________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `*mut tendril::fmt::UTF8`
note: future is not `Send` as this value is used across an await
   --> src\bg_scraper\mod.rs:83:62
    |
75  |     let found_images: Vec<_> = html.select(&image_selector).collect();
    |         ------------ has type `Vec<ElementRef<'_>>` which is not `Send`
...
83  |         images.push(get_image(src.to_string(), width, height).await.unwrap());
    |                                                              ^^^^^^ await occurs here, with `found_images` maybe used later
...
93  | }
    | - `found_images` is later dropped here
note: required by a bound in `tokio::spawn`
   --> C:\Users\Salt lick\.cargo\registry\src\github.com-1ecc6299db9ec823\tokio-1.29.0\src\task\spawn.rs:166:21
    |
166 |         T: Future + Send + 'static,
    |                     ^^^^ required by this bound in `spawn`

Which is basically the same error message 100 times. But, from what I can tell, it's caused by:

images.push(get_image(src.to_string(), width, height).await... => await occurs here, with html and found_images maybe used later

I think (keyword: think) that it's just those two variables giving me shit, but I don't know how to actually make it work.

Asynchronous programming seriously isn't my strength, but it would be horribly inefficient to use blocking here... If anyone knows how to fix this, it would be really helpful!! I'll keep trying to debug it in the meantime.

You cut the use declarations out of the code you posted, so we can't see which libraries you are actually using. That is often important.

In this case, it looks like you are using a library built on tendril types that are not thread-safe. To spawn this future, you will need to either:

  • convince the library that defines Html etc. to use thread-safe tendrils instead (it may have a feature for that, or it may need to be patched), or
  • use Tokio's single-threaded spawning, tokio::task::spawn_local (which must run inside a LocalSet), so your spawned task stays on the same thread and the data doesn't need to be thread-safe.

What crate are you using for the html logic? Looks to me like it is not thread-safe and maybe you could replace it with a thread-safe version.

Oh, sorry!! I'm using scraper-rs and image

Would using Strings instead help? I just get the Html object from parsing a String

Passing Strings that you have extracted from the HTML before you spawn the task will work, since String: Send.
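A minimal sketch of that idea, using only the standard library: std::thread::spawn stands in for tokio::spawn here, since both require the spawned work to be Send + 'static. ParsedPage, parse_page and spawn_with_strings are hypothetical names for illustration, and the Rc field only imitates a non-Send parser type such as scraper's Html.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for a parsed page that is not `Send`
/// (the `Rc` inside makes the whole struct non-`Send`).
struct ParsedPage {
    srcs: Rc<Vec<String>>,
}

/// Hypothetical parser: treats every whitespace-separated token as an image URL.
fn parse_page(text: &str) -> ParsedPage {
    ParsedPage {
        srcs: Rc::new(text.split_whitespace().map(str::to_owned).collect()),
    }
}

/// Copies owned `String`s out of the page first, then moves only
/// those strings into the spawned work.
fn spawn_with_strings(text: &str) -> Vec<usize> {
    let page = parse_page(text);
    let srcs: Vec<String> = page.srcs.iter().cloned().collect();
    drop(page); // the non-`Send` value never reaches the spawned closure

    // Moving `page` into this closure would not compile; `Vec<String>` is `Send`.
    let handle = thread::spawn(move || srcs.iter().map(|s| s.len()).collect::<Vec<usize>>());
    handle.join().unwrap()
}
```

Calling spawn_with_strings("a.png bb.png") parses on the current thread and hands only the two extracted strings to the other thread.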

I think the easiest solution would be to replace scraper::html::Html with html_parser::Dom, provided you can make your program work with Dom (I haven't checked whether html_parser's interface is similar to scraper's). The former is not Send, while the latter is.

I think I remember looking at html_parser when I first started this project, but decided on scraper instead... but from the looks of things, html_parser lets me convert the file to a JSON, which would probably make my life a LOT easier

To read that error message: it first tells you where the future you can't spawn is, then which value isn't thread-safe, then the chain of await calls that makes the former depend on the latter.

You can actually use non-thread-safe libraries in async code, as long as their values are not held across an await. Here you could presumably parse the HTML with said library, pull out whatever details you care about, then drop the library's type before the await.
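Here is a rough, std-only sketch of that pattern. ParsedDoc, fetch_len, total_len, require_send and block_on are all hypothetical names: ParsedDoc imitates a non-Send parser type by holding an Rc, fetch_len stands in for a network request, and require_send mimics the Send bound on tokio::spawn.

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a parsed document that is not `Send`.
struct ParsedDoc {
    lines: Rc<Vec<String>>,
}

impl ParsedDoc {
    fn parse(text: &str) -> Self {
        ParsedDoc { lines: Rc::new(text.lines().map(str::to_owned).collect()) }
    }
}

/// Hypothetical async step standing in for a network request.
async fn fetch_len(src: String) -> usize {
    src.len()
}

/// The non-`Send` value lives only inside the inner block, so it is
/// dropped before the first `.await` and the future stays `Send`.
async fn total_len(text: String) -> usize {
    let srcs: Vec<String> = {
        let doc = ParsedDoc::parse(&text);
        let extracted: Vec<String> = doc.lines.iter().cloned().collect();
        extracted
    }; // `doc` is dropped here

    let mut total = 0;
    for src in srcs {
        total += fetch_len(src).await;
    }
    total
}

/// Compiles only if the argument is `Send`, like the bound on `tokio::spawn`.
fn require_send<T: Send>(value: T) -> T {
    value
}

/// Minimal executor for futures that never wait on anything external.
fn block_on<F: Future>(fut: F) -> F::Output {
    struct Noop;
    impl Wake for Noop {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

If the ParsedDoc binding were moved out of the inner block so that it stayed alive during the loop, the require_send call would be rejected with the same kind of "future cannot be sent between threads safely" error.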

Thanks ^-^

I ended up switching parser libraries, and right now I'm working on letting me iterate over the very nested structure, which is a bit of a pain but I think I'm close to having it work at least? But, since the new library lets me store the data as a serde Object type, it should work?

There's a reason the Rust tagline is "fearless concurrency", and not "easy concurrency"!

You can check if the types you want to hold across awaits implement the Send and Sync traits (on docs.rs, they will show up under "Auto Trait Implementations"), which is what determines if the future as a whole can be spawned on the multithreaded runtime, but it's easy to miss that you're using a value that way. In general, though, libraries are either mostly local or thread safe, with explicit exceptions, so it's not too bad.
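Besides reading the docs, one common trick is to let the compiler do the check with small helper functions that only compile for Send or Sync types. The helper names below (assert_send, assert_sync, require_send, check_all) are made up for this sketch.

```rust
use std::sync::Arc;

/// Compiles only when `T` is `Send`.
fn assert_send<T: Send>() {}

/// Compiles only when `T` is `Sync`.
fn assert_sync<T: Sync>() {}

/// Checks a value (for example a future) without having to name its type.
fn require_send<T: Send>(value: T) -> T {
    value
}

/// Every line here is a compile-time check; the function does no real work.
fn check_all() -> bool {
    assert_send::<String>();
    assert_send::<Vec<String>>();
    assert_send::<Arc<String>>();
    assert_sync::<Arc<String>>();
    // assert_send::<std::rc::Rc<String>>(); // would not compile: `Rc` is not `Send`

    // The same check applied to a future, which is what a multithreaded spawn needs.
    let fut = require_send(async { 1 + 1 });
    drop(fut);
    true
}
```

Wrapping a future in require_send right before spawning it tends to point the error at the future you are checking, which can make the offending value easier to find.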
