Validating markdown

I have an application where users can submit small snippets of markdown to a server. The server can gather a bunch of these and add them to a report. I wanted to perform some quick and simple validation when the user submits markdown, so the server doesn't go boom when it tries to generate the reports.

The de-facto standard markdown processor appears to be pulldown-cmark, so I figured I'd make a dummy processor that just runs through the document on user submission, but doesn't actually output anything. I quickly discovered that it doesn't have an Error type. So I did some googling and found two different, somewhat contradictory, statements.

One person said that markdown doesn't really have "errors" at that level. Parsers/processors will purposely just trudge on and try their best. You can check for errors, but that's more for high-level constructs ("Does this document support heading level 3", "does this link to a valid location", etc.).

Another person said, about pulldown-cmark specifically (and I'm paraphrasing): "If you can run the parser over the document without it panicking, then the document is free of errors."

I find it difficult to believe that pulldown-cmark would be designed to panic on a missing code block terminator, so I'm more inclined to believe the first statement. Is this correct?

(Although the data in question isn't particularly important, I don't want to introduce a trivial denial of service vector into the server).
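
A minimal sketch of what such a submit-time check could look like, using only the standard library. The size cap `MAX_SNIPPET_BYTES`, the `Rejection` type and `validate_snippet` are hypothetical names made up for illustration, not part of pulldown-cmark. The parser is passed in as a closure; in the real server that closure could be something like `|s| pulldown_cmark::Parser::new(s).for_each(drop)`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical upper bound on a single snippet, in bytes (an assumption).
const MAX_SNIPPET_BYTES: usize = 16 * 1024;

/// Reasons a submission is turned away (illustrative, not a library type).
#[derive(Debug, PartialEq)]
enum Rejection {
    TooLarge { len: usize },
    ParserPanicked,
}

/// Runs `parse` over `input` purely for its side effect of "not blowing up".
/// Anything the parser produces is the closure's business to discard.
fn validate_snippet<F>(input: &str, parse: F) -> Result<(), Rejection>
where
    F: FnOnce(&str),
{
    // Cheap guard first: reject oversized input before parsing at all.
    if input.len() > MAX_SNIPPET_BYTES {
        return Err(Rejection::TooLarge { len: input.len() });
    }

    // Turn a panic inside the parser into an ordinary error value.
    panic::catch_unwind(AssertUnwindSafe(|| parse(input)))
        .map_err(|_| Rejection::ParserPanicked)
}
```

Note that `catch_unwind` only catches unwinding panics; if the server is built with `panic = "abort"`, a panicking parser would still take the process down, so the size cap is the only part that helps in that configuration.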


It's the former: Markdown doesn't have errors.
From the CommonMark spec:

Any sequence of characters is a valid CommonMark document.

(and notably "To make matters worse, because nothing in Markdown counts as a “syntax error,” the divergence often isn’t discovered right away.")

pulldown-cmark will give you events for all syntax it parses.
That is the document:

    ```
    fn main

will result in these events:

[src/main.rs:14:9] ev = Start(
    CodeBlock(
        Fenced(
            Borrowed(
                "",
            ),
        ),
    ),
)
[src/main.rs:14:9] ev = Text(
    Borrowed(
        "fn main",
    ),
)
[src/main.rs:14:9] ev = End(
    CodeBlock,
)

(playground link, won't work there as pulldown-cmark is not available).

But it won't error.
Note though that the included HTML outputter (pulldown_cmark::html) will turn this into the valid HTML:

<pre><code>fn main</code></pre>\n

Note however that in this example pulldown-cmark does something like "error recovery": there is no code block end in the original document, but there is an End(CodeBlock) in the parsed event sequence.
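
If an unterminated fence is something the application wants to flag anyway, that is one of the "high-level" checks mentioned in the question, and it has to be done outside the parser. Below is a rough sketch of such a lint using only the standard library. `unclosed_fence` is a hypothetical helper, and it follows a deliberately simplified reading of the CommonMark fence rules: it only looks at top-level fences, treats only spaces as indentation, and ignores fences nested in block quotes or list items.

```rust
/// Returns the 1-based line number of a fenced code block that is opened
/// but never closed, or `None` if every fence has a matching close.
///
/// Simplified on purpose (assumption): container blocks such as block
/// quotes and list items are not taken into account.
fn unclosed_fence(doc: &str) -> Option<usize> {
    // (fence character, fence length, line on which the block was opened)
    let mut open: Option<(char, usize, usize)> = None;

    for (idx, line) in doc.lines().enumerate() {
        let trimmed = line.trim_start_matches(' ');
        // Four or more spaces of indentation never form a fence line.
        if line.len() - trimmed.len() > 3 {
            continue;
        }
        // A fence line starts with a run of at least three '`' or '~'.
        let ch = match trimmed.chars().next() {
            Some(c @ ('`' | '~')) => c,
            _ => continue,
        };
        let run = trimmed.chars().take_while(|c| *c == ch).count();
        if run < 3 {
            continue;
        }
        // Both fence characters are ASCII, so `run` is also a byte offset.
        let rest = &trimmed[run..];

        match open {
            None => {
                // The info string of a backtick fence may not contain backticks.
                if ch == '`' && rest.contains('`') {
                    continue;
                }
                open = Some((ch, run, idx + 1));
            }
            Some((open_ch, open_len, _)) => {
                // A closing fence uses the same character, is at least as
                // long as the opening one, and carries no info string.
                if ch == open_ch && run >= open_len && rest.trim().is_empty() {
                    open = None;
                }
            }
        }
    }

    open.map(|(_, _, line)| line)
}
```

For the two-line document from the earlier post this reports the fence opened on line 1, while the same document with a closing fence appended yields `None`. Whether to reject such a submission or just warn the user is an application decision; the parser itself accepts both.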

In any case, if you find a way to make pulldown-cmark panic, please report it as a bug!

If it never panics, both people cited above were correct 🙂


🙈 D'oh! Of course. I posted it, but somehow it didn't register for me that it does have the End(CodeBlock) event too.