I'm trying to get a list of links from a web page. The problem is that when I select the href attribute from the anchor tags, I get relative URLs. I want to join each relative URL to the URL of the page I fetched to get the absolute version, but I cannot return the &str from the parsed url::Url because that variable is local to the closure.
use error_chain::quick_main;
use select::{document::Document, predicate::Name};
use url::Url;

mod errors {
    error_chain::error_chain! {
        foreign_links {
            ParseError(url::ParseError);
            ReqError(reqwest::Error);
            IoError(std::io::Error);
        }
    }
}

fn run() -> errors::Result<()> {
    let base = Url::parse("https://www.rust-lang.org")?;
    let resp = reqwest::get(base.as_str())?;
    Document::from_read(resp)?
        .find(Name("a"))
        .filter_map(|n| n.attr("href"))
        .map(|n| match base.join(n) {
            Ok(u) => u.as_str(),
            Err(_) => n,
        })
        .for_each(|url| println!("{}", url));
    Ok(())
}

quick_main!(run);
When I try to build this, I get the following:
$ cargo run
   Compiling getlinks v0.1.0 (/me/code/rust/getlinks)
error[E0515]: cannot return value referencing local variable `u`
  --> src/main.rs:22:22
   |
22 |             Ok(u) => u.as_str(),
   |                      -^^^^^^^^^
   |                      |
   |                      returns a value referencing data owned by the current function
   |                      `u` is borrowed here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0515`.
error: Could not compile `getlinks`.
To learn more, run the command again with --verbose.
I could clone the string, but unconditionally allocating just seems like a bad idea. I could also get rid of the map, move the match into the for_each, and call println! twice, but map seems more intuitive.
Edit: I'm aware there's probably a bug where I'll get something like https://www.rust-lang.org/https://www.other-url.example.com/ if I try joining two absolute URLs. I'll figure that out next.
Edit 2: Nope, actually url::Url takes care of that for you.
println!("{}", Url::parse("https://www.rust-lang.org/").unwrap().join("https://www.example.com/resource.html").unwrap());
prints https://www.example.com/resource.html