Overview
I'm working on a Rust project that involves validating HTML content—both full documents and fragments. My requirements go beyond lenient parsing:
- Detect and flag unclosed or extra HTML tags
- Identify unresolved template placeholders such as `{{field}}` or `{{ user.name }}`, as well as malformed ones like `{{{bad}}}` or `{{ incomplete`
- Avoid relying on parsers that silently fix malformed HTML (like html5ever)
Problem
Crates like `html5ever` and `markup5ever_rcdom` follow the forgiving HTML5 parsing algorithm and successfully parse even broken HTML. Because the resulting tree has already been repaired, structural issues cannot be detected from it at runtime.
For example, this invalid HTML:
```html
<div class="row template-row">
  <div class="col-auto">
    <a> <!-- ❌ Unclosed <a> tag -->
  </div>
</div>
```
The above is invalid HTML, yet parsers like html5ever auto-close the `<a>` tag, so the document is considered "valid" at the parsing level.
This becomes a problem in scenarios like Angular templates, where unclosed tags can cause runtime or build-time errors. Structural correctness matters more than "valid per HTML5 spec."
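A stack-based tag matcher is the most direct way to get the fail-fast behavior described above. The sketch below uses only the standard library and is a simplified tokenizer, not a spec-compliant HTML parser. `check_tags`, `TagError`, and the void-element list are illustrative names chosen here, and it assumes that raw-text elements such as `<script>` and `<style>` contain no `<` characters and that unquoted attribute values never end in `/`.

```rust
/// Elements that never take a closing tag in HTML.
const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
];

/// One structural problem, with the byte offset of the offending tag.
#[derive(Debug, PartialEq)]
pub enum TagError {
    /// A closing tag that does not match the innermost open tag.
    Mismatched { expected: String, found: String, offset: usize },
    /// A closing tag with nothing open to close.
    Stray { name: String, offset: usize },
    /// An opening tag still open at the end of the input.
    Unclosed { name: String, offset: usize },
    /// A `<` or `<!--` that never reaches its terminator.
    Unterminated { offset: usize },
}

/// Stack-based tag matcher: reports problems instead of repairing them.
pub fn check_tags(html: &str) -> Vec<TagError> {
    let mut errors = Vec::new();
    let mut stack: Vec<(String, usize)> = Vec::new(); // (tag name, offset)
    let mut i = 0;
    while let Some(rel) = html[i..].find('<') {
        let start = i + rel;
        let rest = &html[start..];
        // Only `<` followed by a letter, `/`, `!` or `?` starts markup; `a < b` is text.
        let starts_markup = matches!(rest[1..].chars().next(),
            Some(c) if c.is_ascii_alphabetic() || matches!(c, '/' | '!' | '?'));
        if !starts_markup {
            i = start + 1;
            continue;
        }
        // Skip comments whole, so tags mentioned inside them are ignored.
        if rest.starts_with("<!--") {
            match rest.find("-->") {
                Some(end) => { i = start + end + 3; continue; }
                None => { errors.push(TagError::Unterminated { offset: start }); break; }
            }
        }
        // Find the closing `>`, ignoring any that sit inside quoted attribute values.
        let mut quote = None;
        let mut close = None;
        for (j, c) in rest.char_indices().skip(1) {
            match quote {
                Some(q) if c == q => quote = None,
                Some(_) => {}
                None if c == '"' || c == '\'' => quote = Some(c),
                None if c == '>' => { close = Some(j); break; }
                None => {}
            }
        }
        let Some(close) = close else {
            errors.push(TagError::Unterminated { offset: start });
            break;
        };
        let inner = &rest[1..close];
        i = start + close + 1;
        if inner.starts_with('!') || inner.starts_with('?') {
            continue; // <!DOCTYPE ...> and processing instructions
        }
        let closing = inner.starts_with('/');
        let name: String = inner
            .trim_start_matches('/')
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric() || *c == '-')
            .collect::<String>()
            .to_ascii_lowercase();
        let self_closing = inner.trim_end().ends_with('/');
        if !closing && (self_closing || VOID_ELEMENTS.contains(&name.as_str())) {
            continue; // nothing to match later
        }
        if closing {
            let top = stack.last().map(|(n, _)| n.clone());
            match top {
                Some(open) if open == name => { stack.pop(); }
                Some(open) => {
                    errors.push(TagError::Mismatched { expected: open, found: name.clone(), offset: start });
                    // Recover by unwinding to a matching open tag, if there is one,
                    // so a single mistake is not reported again at every later tag.
                    if let Some(pos) = stack.iter().rposition(|(n, _)| *n == name) {
                        stack.truncate(pos);
                    }
                }
                None => errors.push(TagError::Stray { name, offset: start }),
            }
        } else {
            stack.push((name, start));
        }
    }
    // Whatever is still open at the end of the input was never closed.
    errors.extend(stack.into_iter().map(|(name, offset)| TagError::Unclosed { name, offset }));
    errors
}
```

Run against the example above, this reports a single `Mismatched` error (expected `a`, found `div`) instead of silently closing the `<a>`. Unwinding the stack on a mismatch is a design choice: it keeps one missing close tag from cascading into an error at every later tag, at the cost of not reporting the other elements it pops.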
What I'm Looking For
I'm searching for a Rust-based approach that supports the following validation features:
- Structural Validation
  - Detect unclosed, mismatched, or stray tags
  - For example, `<div><a></div>` or `<div><span>` should throw an error
- Placeholder Detection
  - Find unresolved or malformed template variables like `{{field}}`, `{{`, or `{{{bad}}}`
- Fail-Fast Parsing
  - Prefer errors over silent recovery
  - Option to opt out of "forgiving mode" if using an HTML5 parser
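Placeholder detection can be done as a separate text pass over the rendered output. The sketch below is one way to do it with the standard library, under two assumptions: only dotted identifier paths such as `user.name` count as well-formed (so Angular pipes or method calls would be reported as malformed and `is_path` would need adjusting), and, following the rule stated in this post, triple braces count as malformed even though Handlebars itself uses `{{{x}}}` for unescaped output. `find_placeholders`, `Placeholder`, and `is_path` are hypothetical names.

```rust
/// A `{{ ... }}` occurrence left in rendered output.
#[derive(Debug, PartialEq)]
pub enum Placeholder {
    /// Well-formed but unresolved, e.g. `{{field}}` or `{{ user.name }}`.
    Unresolved { name: String, offset: usize },
    /// Anything else that opens with `{{`: triple braces, empty or
    /// invalid contents, or a `{{` that is never closed.
    Malformed { text: String, offset: usize },
}

/// True for a dotted path of identifier segments such as `user.name`.
fn is_path(s: &str) -> bool {
    !s.is_empty()
        && s.split('.').all(|seg| {
            !seg.is_empty() && seg.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        })
}

/// Scan `s` for every `{{` and classify what follows it.
pub fn find_placeholders(s: &str) -> Vec<Placeholder> {
    let mut found = Vec::new();
    let mut i = 0;
    while let Some(rel) = s[i..].find("{{") {
        let start = i + rel;
        // Treat a whole run of `{` as one opener, so `{{{bad}}}` is a single hit.
        let open_len = s[start..].bytes().take_while(|&b| b == b'{').count();
        let body_start = start + open_len;
        let next_open = s[body_start..].find("{{").map(|r| body_start + r);
        let close = s[body_start..].find("}}").map(|r| body_start + r);
        match close {
            // A `}}` arrives before any new `{{`: this placeholder is terminated.
            Some(body_end) if next_open.map_or(true, |n| n > body_end) => {
                let close_len = s[body_end..].bytes().take_while(|&b| b == b'}').count();
                let end = body_end + close_len;
                let name = s[body_start..body_end].trim();
                if open_len == 2 && close_len == 2 && is_path(name) {
                    found.push(Placeholder::Unresolved { name: name.to_string(), offset: start });
                } else {
                    found.push(Placeholder::Malformed { text: s[start..end].to_string(), offset: start });
                }
                i = end;
            }
            // No `}}` before the next `{{` (or at all): e.g. `{{ incomplete`.
            _ => {
                let stop = next_open.unwrap_or(s.len());
                found.push(Placeholder::Malformed { text: s[start..stop].trim_end().to_string(), offset: start });
                i = stop;
            }
        }
    }
    found
}
```

For the input `Hello {{field}}, {{ user.name }}, {{{bad}}}, {{ incomplete`, this returns two `Unresolved` entries (`field`, `user.name`) followed by two `Malformed` ones (`{{{bad}}}`, `{{ incomplete`). Checking for a later `{{` before accepting a `}}` keeps an unterminated placeholder from swallowing the next valid one.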
Questions
- Is there any existing Rust crate that performs strict HTML validation (not just parsing)?
- Can we configure `html5ever` to operate in a strict mode or report unclosed tags?
- Is writing a custom tag-matching validator (like a stack-based parser) the only way?
- Would wrapping an external tool like `tidy` from Rust be the most pragmatic solution?
- Are there template-aware linting crates for formats like Angular, Handlebars, etc. in Rust?
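On the `tidy` question, a minimal sketch of wrapping it from Rust with `std::process::Command` might look like this. It assumes HTML Tidy is installed and on `PATH` as `tidy`; the `-q` (quiet) and `-e` (errors and warnings only) flags and the exit-status convention (0 clean, 1 warnings, 2 errors) are taken from HTML Tidy's command-line help, so they should be checked against the installed version. `run_tidy` and `TidyReport` are hypothetical names.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// What HTML Tidy reported for one input.
#[derive(Debug)]
pub struct TidyReport {
    /// Exit status; per Tidy's docs 0 = clean, 1 = warnings, 2 = errors.
    pub status: Option<i32>,
    /// Diagnostic messages, which Tidy writes to stderr.
    pub diagnostics: String,
}

/// Pipe `html` through `tidy -q -e` and capture its diagnostics.
/// Assumes a `tidy` binary is on PATH; if it is not, spawning fails
/// with an `io::ErrorKind::NotFound` error.
pub fn run_tidy(html: &str) -> io::Result<TidyReport> {
    let mut child = Command::new("tidy")
        .args(["-q", "-e"]) // quiet; report errors and warnings only
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .stderr(Stdio::piped())
        .spawn()?;
    // Feed stdin from a separate thread so a full stderr pipe cannot deadlock us.
    // Dropping `stdin` at the end of the thread closes the pipe (EOF for tidy).
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let input = html.to_owned();
    let writer = thread::spawn(move || stdin.write_all(input.as_bytes()));
    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;
    Ok(TidyReport {
        status: output.status.code(),
        diagnostics: String::from_utf8_lossy(&output.stderr).into_owned(),
    })
}
```

A nonzero `status` can then be treated as a failure. For fragments, Tidy will likely also warn about the `<!DOCTYPE>` and `<title>` a full document would need, so those lines may have to be filtered out of `diagnostics`.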