The background to this question is not that I am unable to do what I want, which is scraping the contents of an HTML table into a struct. I am already using a crate that does this.
The problem is that the Rust compiler warns that the crate uses functionality that is deprecated and might be removed in a future version of Rust:
warning: the following packages contain code that will be rejected by a future version of Rust: cssparser v0.25.9, selectors v0.21.0
These packages are not explicitly listed as dependencies in Cargo.toml, so they must be dependencies of one of the crates that I use. Luckily, this can easily be investigated using cargo tree:
% cargo tree -i cssparser
cssparser v0.25.9
├── scraper v0.11.0
│   └── table-extract v0.2.2
│       └── yb_stats v0.6.1 (/Users/fritshoogland/code/yb_stats)
And
% cargo tree -i selectors
selectors v0.21.0
└── scraper v0.11.0
    └── table-extract v0.2.2
        └── yb_stats v0.6.1 (/Users/fritshoogland/code/yb_stats)
Aha! So the cssparser and selectors crates that trigger the warning come in through table-extract, via its scraper dependency!
I am already using the latest version of table-extract, but it hasn't had an update in quite a while (November 2019).
When I manually update the dependencies to higher versions, the crate's own dependency requirements flip them back, so it seems I cannot force a higher version.
I guess the logical next move is to see if I can get the same functionality from another, more recently updated crate. So that is the actual question I have: how can I scrape the data from an HTML table in a very easy way, moving the data row by row into a vector of a predefined struct?
This is my current code using table-extract:
fn parse_threads(
    http_data: String
) -> Vec<Threads> {
    let mut threads: Vec<Threads> = Vec::new();
    if let Some(table) = table_extract::Table::find_first(&http_data) {
        let empty_stack_from_table = String::from("^-^");
        for row in &table {
            let stack_from_table = if row.as_slice().len() == 5 {
                &row.as_slice()[4]
            } else {
                &empty_stack_from_table
            };
            threads.push(Threads {
                thread_name: row.get("Thread name").unwrap_or("<Missing>").to_string(),
                cumulative_user_cpu_s: row.get("Cumulative User CPU(s)").unwrap_or("<Missing>").to_string(),
                cumulative_kernel_cpu_s: row.get("Cumulative Kernel CPU(s)").unwrap_or("<Missing>").to_string(),
                cumulative_iowait_cpu_s: row.get("Cumulative IO-wait(s)").unwrap_or("<Missing>").to_string(),
                stack: stack_from_table.to_string()
            });
        }
    }
    threads
}
A String containing the HTML is passed to the function, which then uses table_extract to parse the first table it encounters. This is not ideal: it works in a very basic way, only parsing <tr>, </tr> as a row and <td>, </td> as fields, and it doesn't parse some <tr> data (<tr data-depth="3" class="level3"> is not found), but it does the job.
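To make clear what any replacement crate would have to feed into, the row-to-struct mapping can be written independently of the HTML parser. The following is only a minimal, std-only sketch under assumptions: the Threads struct is not shown above, so its all-String fields are inferred from parse_threads, and the helpers cell and rows_to_threads are hypothetical names that take headers and rows which some HTML parser has already extracted as plain string slices.

```rust
/// Assumed shape of the target struct: it is not shown in the post, so the
/// all-String fields are inferred from how `parse_threads` fills them.
#[derive(Debug, Clone, PartialEq)]
pub struct Threads {
    pub thread_name: String,
    pub cumulative_user_cpu_s: String,
    pub cumulative_kernel_cpu_s: String,
    pub cumulative_iowait_cpu_s: String,
    pub stack: String,
}

/// Hypothetical helper: look up a cell by its column header, falling back to
/// "<Missing>" when the header is absent or the row is too short.
fn cell(headers: &[&str], row: &[&str], name: &str) -> String {
    headers
        .iter()
        .position(|h| *h == name)
        .and_then(|i| row.get(i))
        .map(|s| s.to_string())
        .unwrap_or_else(|| "<Missing>".to_string())
}

/// Hypothetical helper: map already-extracted rows into a `Vec<Threads>`,
/// mirroring the logic of `parse_threads` without depending on any crate.
pub fn rows_to_threads(headers: &[&str], rows: &[Vec<&str>]) -> Vec<Threads> {
    rows.iter()
        .map(|row| Threads {
            thread_name: cell(headers, row, "Thread name"),
            cumulative_user_cpu_s: cell(headers, row, "Cumulative User CPU(s)"),
            cumulative_kernel_cpu_s: cell(headers, row, "Cumulative Kernel CPU(s)"),
            cumulative_iowait_cpu_s: cell(headers, row, "Cumulative IO-wait(s)"),
            // As in `parse_threads`: the stack is the fifth cell, and rows
            // that do not have exactly five cells get the "^-^" placeholder.
            stack: if row.len() == 5 {
                row[4].to_string()
            } else {
                "^-^".to_string()
            },
        })
        .collect()
}
```

With this split, only the step that turns HTML into headers and rows of strings depends on the scraping crate, so swapping table-extract for something else would not touch the struct-filling code.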
So the question is: are there equally simple ways to do the same with more modern crates?