HTML tag content change

I'm trying to convert an HTML input into a new HTML output. Example:

<!DOCTYPE html>
<html lang="en">
<!-- This is a comment -->
<head>
    <meta charset="UTF-8">
    <title>Example App title</title>
</head>
<body>
This is the example app.
</body>
</html>

I want to change the text of the title element, so that the new output becomes:

<!DOCTYPE html>
<html lang="en">
<!-- This is a comment -->
<head>
    <meta charset="UTF-8">
    <title>The real title</title>
</head>
<body>
This is the example app.
</body>
</html>

Any pointers on which crate I should use, or maybe some examples?

This is as far as I got, in this playground gist: Rust Playground (see the FIXME on line 17).

The 7 nested if statements are a code smell. Are you by chance looking for a templating engine?

No, I can't generate the HTML: it is uploaded by users, and I need to make some changes to it. The title is just an example to keep things simple; if I can change the title, I can do the rest too.

I refactored the whole function, and now it works fine:

use html5ever::tendril::{StrTendril, TendrilSink};
use html5ever::tree_builder::{NodeOrText, TreeSink};
use html5ever::{parse_document, serialize, ParseOpts};
use markup5ever_rcdom::{Handle, NodeData, RcDom, SerializableHandle};

/// Depth-first search for the first element whose local name is `tag`.
fn find_node_by_tag(handle: &Handle, tag: &str) -> Result<Handle, String> {
    if let NodeData::Element { ref name, .. } = handle.data {
        // LocalName derefs to str, so it can be compared with `tag` directly.
        if &*name.local == tag {
            return Ok(handle.clone());
        }
    }
    handle
        .children
        .borrow()
        .iter()
        .find_map(|child| find_node_by_tag(child, tag).ok())
        .ok_or_else(|| format!("No node with tag {tag} found"))
}

fn app_html_translate(html: String) -> Result<String, String> {
    // Default options keep the <!DOCTYPE html> in the tree;
    // `drop_doctype: true` would remove it from the output.
    let opts = ParseOpts::default();

    let mut dom: RcDom = parse_document(RcDom::default(), opts).one(html);

    let handle: Handle = find_node_by_tag(&dom.get_document(), "title")?;

    // Collect the children first: removing them while the RefCell is still
    // borrowed for iteration would panic.
    let child_handles: Vec<Handle> = handle.children.borrow().iter().cloned().collect();
    for child_handle in &child_handles {
        dom.remove_from_parent(child_handle);
    }
    dom.append(&handle, NodeOrText::AppendText(StrTendril::from("New title")));

    let mut new_doc = vec![];
    let ser: SerializableHandle = dom.document.clone().into();
    serialize(&mut new_doc, &ser, Default::default())
        .map_err(|e| format!("Error occurred in serializing: {e}"))?;
    String::from_utf8(new_doc).map_err(|e| format!("Serialized output is not valid UTF-8: {e}"))
}
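The refactor replaces the seven nested ifs with a recursive depth-first search. Below is a std-only sketch of that idea, so it runs without html5ever. The Node, NodeData and Handle types here are hypothetical stand-ins that only mimic the shape of markup5ever_rcdom (an Rc handle whose children sit in a RefCell); they are not the real RcDom API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical, std-only stand-ins for markup5ever_rcdom's node types.
#[derive(Debug)]
enum NodeData {
    Element { name: String },
    Text { contents: String },
}

#[derive(Debug)]
struct Node {
    data: NodeData,
    children: RefCell<Vec<Handle>>,
}

type Handle = Rc<Node>;

fn element(name: &str, children: Vec<Handle>) -> Handle {
    Rc::new(Node {
        data: NodeData::Element { name: name.to_string() },
        children: RefCell::new(children),
    })
}

fn text(contents: &str) -> Handle {
    Rc::new(Node {
        data: NodeData::Text { contents: contents.to_string() },
        children: RefCell::new(Vec::new()),
    })
}

/// Pre-order depth-first search: check this node, then each child in order.
fn find_node_by_tag(handle: &Handle, tag: &str) -> Option<Handle> {
    if let NodeData::Element { name } = &handle.data {
        if name == tag {
            return Some(Rc::clone(handle));
        }
    }
    handle
        .children
        .borrow()
        .iter()
        .find_map(|child| find_node_by_tag(child, tag))
}

fn main() {
    let doc = element(
        "html",
        vec![
            element("head", vec![element("title", vec![text("Example App title")])]),
            element("body", vec![text("This is the example app.")]),
        ],
    );

    let title = find_node_by_tag(&doc, "title").expect("document has a title element");
    // Replace all of the title's children with one new text node.
    *title.children.borrow_mut() = vec![text("The real title")];

    if let NodeData::Text { contents } = &title.children.borrow()[0].data {
        println!("{contents}"); // prints "The real title"
    }
    assert!(find_node_by_tag(&doc, "nav").is_none());
}
```

find_map stops at the first child subtree that yields a match, which is what the filter_map(...).next() chain in the original did as well.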

The lol_html crate can do this. Something like the example on its crates.io page, but with these handlers:

            element_content_handlers: vec![
                element!("title", |el| {
                    el.after("The real title", ContentType::Text);

                    Ok(())
                }),
                text!("title", |t| {
                    // remove existing text
                    t.remove();

                    Ok(())
                })
            ],
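Where the new text lands depends on which lol_html Element method is used: before() inserts ahead of the start tag, prepend() right after the start tag, append() right before the end tag, and after() past the end tag (set_inner_content() replaces the inner content instead). The std-only model below is not lol_html code; the Placement enum and render function are hypothetical and simply show each position for a title element whose old text has already been removed.

```rust
// Hypothetical model (not lol_html code) of where lol_html's Element
// insertion methods put content relative to an element's tags.
#[derive(Clone, Copy, Debug)]
enum Placement {
    Before,  // el.before(..): ahead of the start tag
    Prepend, // el.prepend(..): right after the start tag
    Append,  // el.append(..): right before the end tag
    After,   // el.after(..): past the end tag
}

fn render(tag: &str, inner: &str, new: &str, at: Placement) -> String {
    let (start, end) = (format!("<{tag}>"), format!("</{tag}>"));
    match at {
        Placement::Before => format!("{new}{start}{inner}{end}"),
        Placement::Prepend => format!("{start}{new}{inner}{end}"),
        Placement::Append => format!("{start}{inner}{new}{end}"),
        Placement::After => format!("{start}{inner}{end}{new}"),
    }
}

fn main() {
    // The text! handler removed the old title, so the inner content is empty.
    for at in [Placement::Before, Placement::Prepend, Placement::Append, Placement::After] {
        println!("{at:?}: {}", render("title", "", "The real title", at));
    }
}
```

In this model after() leaves the text outside </title>, which would explain the handlers above being slightly off; prepend(), append() or set_inner_content() keep it inside the element.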

Works much better, although it was slightly wrong.

I have now rewritten it like this:

let mut output = vec![];
let mut rewriter = HtmlRewriter::new(
    Settings {
        element_content_handlers: vec![text!("title", |t| {
            // Remove every chunk of the existing text
            t.remove();
            // Add the new text once, at the last chunk of the text node
            // (a single text node may be delivered in several chunks)
            if t.last_in_text_node() {
                t.before("Hello new title", lol_html::html_content::ContentType::Text);
            }
            Ok(())
        })],
        ..Settings::default()
    },
    |c: &[u8]| output.extend_from_slice(c),
);