Extensible markdown parser?

I have a project (OK, yet another blog platform, sorry) where I want to convert markdown to HTML, but with some extensions of my own: some specific link shortcuts, support for nice figures, wrapping each header and its content in a section, smart quotes that output the <q> tag rather than specific quote characters, and maybe some more stuff.
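The smart-quote part can be sketched without any crate. This is a minimal sketch that isn't tied to either parser: the hypothetical `quotes_to_q` takes one raw text run, pairs straight double quotes from left to right, and HTML-escapes everything else. It assumes a quote never spans inline markup. In a real event stream, a quote opened before `*emphasis*` and closed after it arrives in separate text events, so the open/closed state would have to live outside the function.

```rust
/// Hypothetical helper: turn paired straight double quotes in one text run
/// into <q>...</q> and HTML-escape the rest. A final unpaired quote is
/// emitted as &quot; instead of opening a <q> that never closes.
fn quotes_to_q(text: &str) -> String {
    let total = text.matches('"').count();
    let mut seen = 0;
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '"' => {
                seen += 1;
                if seen % 2 == 1 && seen == total {
                    // Odd quote with no partner left: keep it literal.
                    out.push_str("&quot;");
                } else if seen % 2 == 1 {
                    out.push_str("<q>");
                } else {
                    out.push_str("</q>");
                }
            }
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", quotes_to_q(r#"He said "hi" & left"#));
}
```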

I've mainly found two Rust crates for markdown, and both seem very nice in general but not quite suitable for my use case.

  • pulldown-cmark
    + Seems very fast.
    + Seems to be commonly used in rust projects.
    - Anything not implemented in the default Event iterator is lost.
    - The API for "fixing missing links" takes a mutable reference to an FnMut, so the parser borrows the callback: it can't be an ordinary function, and I can't write my own Parser-constructing function.
  • comrak
    + Gives access to an AST.
    - Closely follows a C implementation (it is a port of GitHub's cmark-gfm), so the API, including the AST, is not very "rusty".
    - The API for "fixing missing links" looks very much like the one in pulldown-cmark.
  • Writing my own parser with nom
    + All the power
    - All the work. :slight_smile:

Are there other alternatives that may be more friendly to local extensions?

Some examples of my extensions:

Sections

I'd like to wrap each header and the content it applies to in a <section> element. For example, this input:

## One {#oneid}

Text

### Subheading

More text

## Other

... should give this output:

<section id="oneid"><h2>One</h2>
<p>Text</p>
<section><h3>Subheading</h3>
<p>More text</p>
</section>
</section>
<section><h2>Other</h2>
</section>
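One way to get this nesting is a stack of open heading levels. A heading of level n first closes every open section of level n or deeper, then opens its own section, and whatever is still open gets closed at the end. Here is a minimal std-only sketch. The `Block` type and `render_sections` are hypothetical stand-ins for whatever the real parser yields, and the text is assumed to be escaped already.

```rust
/// Hypothetical, simplified block type standing in for parser output.
enum Block<'a> {
    Heading { level: u8, id: Option<&'a str>, text: &'a str },
    Paragraph(&'a str),
}

/// Wrap each heading and the blocks that follow it in a <section>.
/// `open` holds the levels of the sections that are currently open.
fn render_sections(blocks: &[Block]) -> String {
    let mut out = String::new();
    let mut open: Vec<u8> = Vec::new();
    for block in blocks {
        match block {
            Block::Heading { level, id, text } => {
                // Close every open section at the same or a deeper level.
                while open.last().map_or(false, |&l| l >= *level) {
                    open.pop();
                    out.push_str("</section>\n");
                }
                match id {
                    Some(id) => out.push_str(&format!("<section id=\"{id}\">")),
                    None => out.push_str("<section>"),
                }
                out.push_str(&format!("<h{level}>{text}</h{level}>\n"));
                open.push(*level);
            }
            Block::Paragraph(text) => out.push_str(&format!("<p>{text}</p>\n")),
        }
    }
    // Close whatever is still open at the end of the document.
    out.push_str(&"</section>\n".repeat(open.len()));
    out
}

fn main() {
    let doc = [
        Block::Heading { level: 2, id: Some("oneid"), text: "One" },
        Block::Paragraph("Text"),
        Block::Heading { level: 3, id: None, text: "Subheading" },
        Block::Paragraph("More text"),
        Block::Heading { level: 2, id: None, text: "Other" },
    ];
    print!("{}", render_sections(&doc));
}
```

With an event-based parser, the same stack would be driven by heading start events rather than by a ready-made list of blocks.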

Figures

I'm not sure about the markdown syntax I want yet; maybe something like:

![alt][ref](Caption)

I have an API where I get a preview URL, a full URL and sizes from a reference, so that should give me HTML like:

<figure>
  <a href="{full-url}"><img alt="alt" src="{preview-url}"
              width="{width}" height="{height}"></a>
  <figcaption>Caption</figcaption>
</figure>
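Here is a sketch of that figure expansion. It assumes the tentative `![alt][ref](Caption)` syntax is matched on a whole line of source text before (or instead of) normal markdown parsing. `ImageInfo`, the `HashMap` lookup standing in for the image API, and the function names are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for what the image API returns for a reference.
struct ImageInfo {
    full_url: String,
    preview_url: String,
    width: u32,
    height: u32,
}

/// Minimal escaping for element content and double-quoted attributes.
fn escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Parse `![alt][ref](Caption)` into (alt, ref, caption); None otherwise.
fn parse_figure(line: &str) -> Option<(&str, &str, &str)> {
    let rest = line.trim().strip_prefix("![")?;
    let (alt, rest) = rest.split_once("][")?;
    let (reference, rest) = rest.split_once("](")?;
    let caption = rest.strip_suffix(')')?;
    Some((alt, reference, caption))
}

/// Render a <figure>, or None if the line doesn't match or the ref is unknown.
fn render_figure(line: &str, images: &HashMap<String, ImageInfo>) -> Option<String> {
    let (alt, reference, caption) = parse_figure(line)?;
    let img = images.get(reference)?;
    Some(format!(
        "<figure>\n  <a href=\"{}\"><img alt=\"{}\" src=\"{}\" width=\"{}\" height=\"{}\"></a>\n  <figcaption>{}</figcaption>\n</figure>",
        escape(&img.full_url),
        escape(alt),
        escape(&img.preview_url),
        img.width,
        img.height,
        escape(caption)
    ))
}

fn main() {
    let mut images = HashMap::new();
    images.insert(
        "ref".to_string(),
        ImageInfo {
            full_url: "https://example.com/full.jpg".into(),
            preview_url: "https://example.com/preview.jpg".into(),
            width: 640,
            height: 480,
        },
    );
    if let Some(html) = render_figure("![alt][ref](Caption)", &images) {
        println!("{html}");
    }
}
```

Returning `None` for an unknown reference or a non-matching line lets the caller fall back to normal markdown handling.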

Link shortcuts

A normal [linktext](url) should be supported, but also some [name][tag] variants. For example, I want [nom][crate] to expand to <a href="https://lib.rs/crates/nom">nom</a>, and [Fa 1 2021] to expand to <a href="https://fantomenindex.krats.se/2021/1">Fa 1/2021</a> (the example target is in Swedish, sorry; basically, a regex match on the link string is converted to a URL built from the match groups).

I have managed to implement this part with pulldown-cmark's Parser::new_with_broken_link_callback.
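The mapping itself can be kept as a plain function that the callback calls. Below is a dependency-free sketch. `expand_shortcut` and `is_number` are hypothetical, and whitespace splitting stands in for the regex. It returns the display text separately because, as far as I know, the broken-link callback only supplies a URL and a title, so showing `Fa 1/2021` instead of `Fa 1 2021` needs an extra rewriting step. The callback's exact signature varies between pulldown-cmark versions, so the wiring is left out.

```rust
/// Hypothetical resolver for shortcut links: given the link text and the
/// reference label, return (href, display text), or None to leave it alone.
fn expand_shortcut(text: &str, reference: &str) -> Option<(String, String)> {
    // `[nom][crate]` -> the crate's page on lib.rs, link text unchanged.
    if reference == "crate" {
        return Some((format!("https://lib.rs/crates/{text}"), text.to_string()));
    }
    // `[Fa 1 2021]` -> issue 1 of 2021. Stand-in for the regex: exactly
    // three words, the literal "Fa" followed by two all-digit numbers.
    let words: Vec<&str> = reference.split_whitespace().collect();
    if let ["Fa", issue, year] = words.as_slice() {
        if is_number(issue) && is_number(year) {
            return Some((
                format!("https://fantomenindex.krats.se/{year}/{issue}"),
                format!("Fa {issue}/{year}"),
            ));
        }
    }
    None
}

fn is_number(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

fn main() {
    for (text, reference) in [("nom", "crate"), ("Fa 1 2021", "Fa 1 2021"), ("oops", "missing")] {
        match expand_shortcut(text, reference) {
            Some((href, shown)) => println!("<a href=\"{href}\">{shown}</a>"),
            None => println!("[{text}] left unresolved"),
        }
    }
}
```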

Sadly, I couldn't find any crate with the functionality you are looking for, so I would probably go with nom as well and write the parser myself.

Perhaps other users would be interested in this as well and willing to contribute? I, for one, am interested :grinning_face_with_smiling_eyes:

This forum uses markdown-it with extensions. You'll need to pull in a JS execution environment, though.

Thanks for the comments. I'll avoid pulling in a JS environment in this project, and if possible, I'll avoid writing my own markdown parser as well.

Looking closer at pulldown-cmark, I have implemented some of my requirements. Going forward, I think I'll drop my iterator-wrapping approach and write my own serializer instead. That is, I'll continue to use the parser from pulldown_cmark, but I will write my own collect_html function.
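Here is a rough outline of such a hand-written serializer. It uses a hypothetical, trimmed-down `Event`/`Tag`/`TagEnd` set instead of pulldown-cmark's real types, which have many more variants and have changed between versions. The point is the shape: one `match` over the events, writing into a `String`. Each extension then becomes an extra arm or a bit of state instead of another iterator adapter.

```rust
/// Hypothetical, trimmed-down event types (not pulldown-cmark's own).
enum Tag {
    Paragraph,
    Emphasis,
    Link(String), // href
}

enum TagEnd {
    Paragraph,
    Emphasis,
    Link,
}

enum Event {
    Start(Tag),
    End(TagEnd),
    Text(String),
}

/// Escape text for element content and double-quoted attributes.
fn escape_into(out: &mut String, text: &str) {
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

/// Hand-written serializer: one match arm per event kind.
fn collect_html(events: impl IntoIterator<Item = Event>) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Start(Tag::Paragraph) => out.push_str("<p>"),
            Event::End(TagEnd::Paragraph) => out.push_str("</p>\n"),
            Event::Start(Tag::Emphasis) => out.push_str("<em>"),
            Event::End(TagEnd::Emphasis) => out.push_str("</em>"),
            Event::Start(Tag::Link(href)) => {
                out.push_str("<a href=\"");
                escape_into(&mut out, &href);
                out.push_str("\">");
            }
            Event::End(TagEnd::Link) => out.push_str("</a>"),
            Event::Text(text) => escape_into(&mut out, &text),
        }
    }
    out
}

fn main() {
    let events = vec![
        Event::Start(Tag::Paragraph),
        Event::Text("Read ".into()),
        Event::Start(Tag::Link("https://lib.rs/crates/nom".into())),
        Event::Start(Tag::Emphasis),
        Event::Text("nom".into()),
        Event::End(TagEnd::Emphasis),
        Event::End(TagEnd::Link),
        Event::Text(" & more".into()),
        Event::End(TagEnd::Paragraph),
    ];
    print!("{}", collect_html(events));
}
```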

You could use discord-md as a starting point. It uses nom for parsing and, at a quick glance, seems reasonably well documented.
