How would I parse an HTML document from a string and get a traversable DOM tree from it in Rust?

I've tried to use html5ever for this, but its interface doesn't make it obvious how to pass in a string of HTML and get a DOM tree out.

What is a crate I can use to straightforwardly achieve this?

Going off one of their examples (https://github.com/servo/html5ever/blob/master/rcdom/examples/html2html.rs), I think it would look something like this?

use html5ever::driver::ParseOpts;
use html5ever::parse_document;
use html5ever::tendril::TendrilSink;
use html5ever::tree_builder::TreeBuilderOpts;
use markup5ever_rcdom::RcDom;

fn main() {
    let string = r#"
        <!DOCTYPE html>
        <html>
            <body>
                <p>Hello, world!</p>
            </body>
        </html>
    "#;

    // Drop the doctype node from the tree; every other option keeps its default.
    let opts = ParseOpts {
        tree_builder: TreeBuilderOpts {
            drop_doctype: true,
            ..Default::default()
        },
        ..Default::default()
    };

    // Feed the string in as UTF-8 bytes; RcDom is the sink that builds the tree.
    let dom = parse_document(RcDom::default(), opts)
        .from_utf8()
        .read_from(&mut string.as_bytes())
        .unwrap();

    // dom.document is the root handle; traversal starts from its children.
    println!("top-level nodes: {}", dom.document.children.borrow().len());
}
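
As for the "traversable" part: in markup5ever_rcdom a Handle is an Rc<Node>, and each node carries a data field (a NodeData enum) plus children stored in a RefCell<Vec<Handle>>, so walking the tree is a plain recursive descent. Here is a rough sketch of that pattern. To keep it dependency-free, the Node, NodeData and Handle types below are simplified stand-ins I made up to mirror the crate's shape (the real ones have more fields and variants, text is a StrTendril rather than a String, and element names are a QualName), and sample_tree is a hypothetical hand-built tree, not parser output.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simplified stand-ins (assumptions) mirroring the shape of
// markup5ever_rcdom's Handle / Node / NodeData.
type Handle = Rc<Node>;

enum NodeData {
    Document,
    Element { name: String },
    Text { contents: RefCell<String> },
}

struct Node {
    data: NodeData,
    children: RefCell<Vec<Handle>>,
}

// Hypothetical helper for building nodes by hand.
fn node(data: NodeData, children: Vec<Handle>) -> Handle {
    Rc::new(Node { data, children: RefCell::new(children) })
}

/// Depth-first walk that appends the contents of every text node to `out`.
fn collect_text(handle: &Handle, out: &mut String) {
    if let NodeData::Text { contents } = &handle.data {
        out.push_str(&contents.borrow());
    }
    for child in handle.children.borrow().iter() {
        collect_text(child, out);
    }
}

/// Depth-first walk that records element names in document order.
fn element_names(handle: &Handle, out: &mut Vec<String>) {
    if let NodeData::Element { name } = &handle.data {
        out.push(name.clone());
    }
    for child in handle.children.borrow().iter() {
        element_names(child, out);
    }
}

/// Hand-built tree for <html><body><p>Hello, world!</p></body></html>.
fn sample_tree() -> Handle {
    let text = node(
        NodeData::Text { contents: RefCell::new("Hello, world!".to_string()) },
        vec![],
    );
    let p = node(NodeData::Element { name: "p".to_string() }, vec![text]);
    let body = node(NodeData::Element { name: "body".to_string() }, vec![p]);
    let html = node(NodeData::Element { name: "html".to_string() }, vec![body]);
    node(NodeData::Document, vec![html])
}
```

With the real crate you would skip the stand-ins, import Handle and NodeData from markup5ever_rcdom, and start the walk at the root, e.g. collect_text(&dom.document, &mut text). Expect the parsed tree to contain more nodes than this hand-built one, such as whitespace text nodes from the indentation in the input.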

ahhh ok thank you.
