Temporary value does not live long enough in a web scraper

I’m trying to collect all the links from a given HTTP address into a vector of &strs, like the following:

extern crate reqwest;
extern crate select;

use select::document::Document;
use select::predicate::Name;

fn extract_links(url: &str) -> Result<Vec<&str>, &str> {
    match reqwest::get(url) {
        Ok(res) => {
            let mut links = vec![];

            Document::from_read(res)
                .unwrap()
                .find(Name("a"))
                .filter_map(|a| a.attr("href"))
                .for_each(|link| links.push(link));

            return Ok(links);
        },
        Err(_) => { return Err("The app was unable to fetch a url.") }
    }
}

This code outputs the following error:

 [rustc]
 borrowed value does not live long enough

 temporary value does not live long enough

 note: consider using a `let` binding to increase its lifetime [E0597]

 * main.rs(12, 13): temporary value does not live long enough
 * main.rs(16, 51): temporary value only lives until here

I’ve noticed that this error only shows up when I try to push the contents of Document::from_read(res). ... .filter_map(|a| a.attr("href")) into links. For example, if I replace line 16 with .for_each(|link| println!("{}", link));, the error is avoided altogether.

What am I doing wrong? How could I perform this task?

The first place I’d look is your type signature:

fn extract_links(url: &str) -> Result<Vec<&str>, &str>

So you’re returning (in the success case) a Vec of borrowed strings. But where do those borrowed strings live? I’m not familiar with select, but it seems clear to me that they’re owned by the Document value created by Document::from_read. Ergo, you can’t collect them into links since links lives longer than the Document - you need to make a copy of the string returned by a.attr("href").

I recommend changing Vec<&str> to Vec<String> and changing links.push(link) to links.push(link.to_string()). This will clone each of the strings into a String that lives on the heap.
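Here’s the same lifetime issue in miniature, without the web crates, just to make the fix concrete. In this sketch, document is a locally owned value that plays the role of the select Document: the &str slices returned by the iterator borrow from it, so they can’t outlive the function, but cloning each one into an owned String can.

```rust
// `document` is owned by this function, like the Document in the scraper.
// The &str items from split_whitespace() borrow from it, so we must copy
// them into owned Strings before returning.
fn extract_words(text: &str) -> Vec<String> {
    let document = text.to_uppercase();
    document
        .split_whitespace()
        .map(|w| w.to_string()) // clone each borrowed &str into a String
        .collect()
}

fn main() {
    let words = extract_words("hello world");
    println!("{:?}", words); // prints ["HELLO", "WORLD"]
}
```

If you changed .map(|w| w.to_string()) back to collecting the &strs directly, you’d get the same “temporary value does not live long enough” error, because the returned Vec would outlive document.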


Thank you! I can now see what wasn’t right.

The reason I was using &strs instead of Strings is that, since reqwest::get() takes a &str as a parameter, it would be nice if I could return a Vec<&str>, so that I could recursively call extract_links().

Do you have any recommendations on that matter?

Because String implements Deref<Target = str>, you can just pass a reference to a String wherever a &str is expected.

i.e.

fn print_me(x: &str) {
    println!("{}", x);
}

fn main() {
    let x: String = "Hello World".to_string();
    print_me(&x); // &String coerces to &str via Deref
}
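Applied to the scraper, this means a Vec<String> works fine for the recursive case: iterating the vector yields &String values, and each one coerces to &str at the call site. A hedged sketch, with a hypothetical fetch standing in for reqwest::get / extract_links:

```rust
// Hypothetical stand-in for a function taking &str, like reqwest::get;
// here it just returns the URL's length.
fn fetch(url: &str) -> usize {
    url.len()
}

fn main() {
    let links: Vec<String> = vec![
        "https://example.com/a".to_string(),
        "https://example.com/b".to_string(),
    ];
    for link in &links {
        // `link` is a &String; Deref coercion turns it into &str here,
        // so owned Strings can feed the next round of fetching.
        println!("{} -> {}", link, fetch(link));
    }
}
```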

Thanks!

This crate can do what you want: url-scraper.


If you need to crawl all URLs across a site, there’s also the configurable url-crawler.
