[Solved] Nom: count nested brackets in Markdown link

How can I parse a Markdown link with Nom 6?
In the trivial case, abc[name](url "title")abc should result in:

link_text="name";
link_destination="url";
link_title="title";

But how do I deal with this case: abc[name1[name2]name3](url1(url2)url3 "title")abc?
I expect:

link_text="name1[name2]name3";
link_destination="url1(url2)url3";
link_title="title";

Reddit: Nom 6 question: how to count nested brackets in Markdown link? : rust

Nom makes parsing fun!

Here is the missing bit:

    use nom::error::{Error, ErrorKind, ParseError};
    use nom::{Err, IResult};

    /// This parser is designed to work inside the `nom::sequence::delimited` parser, e.g.:
    /// `nom::sequence::delimited(tag("("), take_until_unmatched('(', ')'), tag(")"))(i)`
    /// It skips nested brackets until it finds an extra closing bracket.
    /// This function is very similar to `nom::bytes::complete::take_until(")")`, except
    /// it also takes nested brackets.
    /// Escaped brackets e.g. `\(` and `\)` are not considered as brackets and are taken by
    /// default.
    pub fn take_until_unmatched(
        opening_bracket: char,
        closing_bracket: char,
    ) -> impl Fn(&str) -> IResult<&str, &str> {
        move |i: &str| {
            let mut index = 0;
            let mut bracket_counter = 0;
            while let Some(n) = i[index..].find(&[opening_bracket, closing_bracket, '\\'][..]) {
                index += n;
                let mut it = i[index..].chars();
                match it.next().unwrap_or_default() {
                    c if c == '\\' => {
                        // Skip the escape char `\`.
                        index += '\\'.len_utf8();
                        // Skip also the following char, if there is one. A trailing `\`
                        // must not push `index` past the end of the input.
                        if let Some(c) = it.next() {
                            index += c.len_utf8();
                        }
                    }
                    c if c == opening_bracket => {
                        bracket_counter += 1;
                        index += opening_bracket.len_utf8();
                    }
                    c if c == closing_bracket => {
                        // Closing bracket.
                        bracket_counter -= 1;
                        index += closing_bracket.len_utf8();
                    }
                    // Can not happen.
                    _ => unreachable!(),
                };
                // We found the unmatched closing bracket.
                if bracket_counter == -1 {
                    // We do not consume it.
                    index -= closing_bracket.len_utf8();
                    return Ok((&i[index..], &i[0..index]));
                };
            }

            if bracket_counter == 0 {
                Ok(("", i))
            } else {
                Err(Err::Error(Error::from_error_kind(i, ErrorKind::TakeUntil)))
            }
        }
    }
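
To check the counting logic against the example from the question without pulling in Nom, here is a minimal std-only sketch of the same idea. The names `split_at_unmatched` and `parse_link` are hypothetical and not part of Nom or of the function above. Unlike `take_until_unmatched`, this sketch returns `None` when no unmatched closing bracket is found, and it assumes the link always has the form `[text](destination "title")` with a single space before the quoted title.

```rust
/// Hypothetical std-only helper (not part of Nom): splits `i` at the first
/// closing bracket that has no matching opening bracket. Returns
/// `(taken, rest)`, where `rest` starts at that unmatched closing bracket.
fn split_at_unmatched(i: &str, open: char, close: char) -> Option<(&str, &str)> {
    let mut depth = 0usize;
    let mut chars = i.char_indices();
    while let Some((pos, c)) = chars.next() {
        if c == '\\' {
            // An escaped char never counts as a bracket: skip it.
            chars.next();
        } else if c == open {
            depth += 1;
        } else if c == close {
            if depth == 0 {
                // Unmatched closing bracket: do not consume it.
                return Some((&i[..pos], &i[pos..]));
            }
            depth -= 1;
        }
    }
    // Assumption of this sketch: no unmatched closing bracket means failure.
    None
}

/// Hypothetical helper: parses `[text](destination "title")` at the start
/// of `i` and returns `(text, destination, title)`.
fn parse_link(i: &str) -> Option<(&str, &str, &str)> {
    let i = i.strip_prefix('[')?;
    let (text, rest) = split_at_unmatched(i, '[', ']')?;
    let rest = rest.strip_prefix("](")?;
    let (inner, _rest) = split_at_unmatched(rest, '(', ')')?;
    // Simplification: destination and title are separated by ` "`.
    let (destination, title) = inner.split_once(" \"")?;
    let title = title.strip_suffix('"')?;
    Some((text, destination, title))
}
```

Calling `parse_link` on the input from the question, starting at the first `[`, yields `name1[name2]name3`, `url1(url2)url3` and `title`. With Nom, the same two steps would presumably be expressed by wrapping `take_until_unmatched('[', ']')` and `take_until_unmatched('(', ')')` in `delimited`, as the doc comment above suggests.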

Depending on your use case, you might be interested in one of the dedicated Markdown parser packages (e.g. pulldown_cmark) as well.

Thanks for the hint. I am actually using it already in my project: tp-note/sse_server.rs

My use case is that I want to find and analyze, as quickly as possible, the first hyperlink in Markdown or reStructuredText notation. In the end, this turned out to be much more complicated than I expected, especially when it comes to link references, because these can span multiple lines, etc. On top of that, many tokens have different escaping rules, ...

The good thing is that I am learning about parsing with Nom 6. I found it hard to get started, but it is actually a very natural approach to parsing: the more you dive into the details of the specification, the more detailed the parser combinators get.
