[Solved] Nom: count nested brackets in Markdown link

reu · November 18, 2020, 6:25am

How can I parse a Markdown link with Nom 6?
In the trivial case, abc[name](url "title")abc should result in:

link_text="name";
link_destination="url";
link_title="title";

But how do I deal with nested brackets, as in abc[name1[name2]name3](url1(url2)url3 "title")abc?
I expect:

link_text="name1[name2]name3";
link_destination="url1(url2)url3";
link_title="title";

reu · November 18, 2020, 8:54am

Reddit: Nom 6 question: how to count nested brackets in Markdown link? : rust

reu · November 23, 2020, 10:56am

Nom makes parsing fun!

Here is the missing bit:

    use nom::error::{Error, ErrorKind, ParseError};
    use nom::{Err, IResult};

    /// This parser is designed to work inside the `nom::sequence::delimited` parser, e.g.:
    /// `nom::sequence::delimited(tag("("), take_until_unmatched('(', ')'), tag(")"))(i)`
    /// It skips nested brackets until it finds an extra closing bracket.
    /// This function is very similar to `nom::bytes::complete::take_until(")")`, except
    /// it also takes nested brackets.
    /// Escaped brackets, e.g. `\(` and `\)`, are not considered brackets and are taken by
    /// default.
    pub fn take_until_unmatched(
        opening_bracket: char,
        closing_bracket: char,
    ) -> impl Fn(&str) -> IResult<&str, &str> {
        move |i: &str| {
            let mut index = 0;
            let mut bracket_counter = 0;
            while let Some(n) = i[index..].find(&[opening_bracket, closing_bracket, '\\'][..]) {
                index += n;
                let mut it = i[index..].chars();
                match it.next() {
                    Some('\\') => {
                        // Skip the escape char `\`.
                        index += '\\'.len_utf8();
                        // Also skip the escaped char, if there is one. A trailing `\`
                        // has no successor; advancing anyway would move `index` past
                        // the end of `i` and panic on the next slice.
                        if let Some(c) = it.next() {
                            index += c.len_utf8();
                        }
                    }
                    Some(c) if c == opening_bracket => {
                        bracket_counter += 1;
                        index += opening_bracket.len_utf8();
                    }
                    Some(c) if c == closing_bracket => {
                        // Closing bracket.
                        bracket_counter -= 1;
                        index += closing_bracket.len_utf8();
                    }
                    // Cannot happen: `find` only stops at one of the three chars above.
                    _ => unreachable!(),
                };
                // We found the unmatched closing bracket.
                if bracket_counter == -1 {
                    // We do not consume it.
                    index -= closing_bracket.len_utf8();
                    return Ok((&i[index..], &i[0..index]));
                };
            }

            if bracket_counter == 0 {
                // Everything is balanced: take the whole input.
                Ok(("", i))
            } else {
                Err(Err::Error(Error::from_error_kind(i, ErrorKind::TakeUntil)))
            }
        }
    }
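
For readers who want to try the bracket-counting idea without pulling in Nom, here is a minimal, dependency-free sketch applied to the link from the question. The names `split_at_unmatched` and `parse_inline_link` are hypothetical and not part of Nom or any crate. Unlike the parser above, this variant returns `None` when no unmatched closing bracket exists, and the title handling is a deliberate simplification (whatever sits between the last pair of double quotes).

```rust
/// Hypothetical std-only variant of the bracket counting shown above.
/// Returns `(taken, rest)`, where `rest` starts at the first unmatched
/// `close`, or `None` if there is no unmatched closing bracket.
fn split_at_unmatched(input: &str, open: char, close: char) -> Option<(&str, &str)> {
    let mut depth = 0usize;
    let mut chars = input.char_indices();
    while let Some((idx, c)) = chars.next() {
        if c == '\\' {
            // Skip the escaped character, if any.
            chars.next();
        } else if c == open {
            depth += 1;
        } else if c == close {
            if depth == 0 {
                // Unmatched closing bracket: split here without consuming it.
                return Some((&input[..idx], &input[idx..]));
            }
            depth -= 1;
        }
    }
    None
}

/// Hypothetical helper: parses `[text](destination "title")` at the start
/// of `input` and returns `(text, destination, title)`.
/// Assumption: the title, if present, is the last double-quoted run.
fn parse_inline_link(input: &str) -> Option<(&str, &str, &str)> {
    let after_open = input.strip_prefix('[')?;
    let (text, rest) = split_at_unmatched(after_open, '[', ']')?;
    let rest = rest.strip_prefix("](")?;
    let (inner, _rest) = split_at_unmatched(rest, '(', ')')?;
    let (dest, title) = match inner.strip_suffix('"').and_then(|s| s.rsplit_once('"')) {
        Some((d, t)) => (d.trim_end(), t),
        None => (inner, ""),
    };
    Some((text, dest, title))
}
```

Under these assumptions, `parse_inline_link("[name1[name2]name3](url1(url2)url3 \"title\")abc")` yields `Some(("name1[name2]name3", "url1(url2)url3", "title"))`, which matches the result expected in the question.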

H2CO3 · November 23, 2020, 11:23am

Depending on your use case, you might be interested in one of the dedicated Markdown parser packages (e.g. pulldown_cmark) as well.

reu · November 23, 2020, 11:18pm

Thanks for the hint. I am actually using it already in my project: tp-note/sse_server.rs

My use case is that I want to find and analyze, as quickly as possible, the first hyperlink in Markdown or reStructuredText notation. This turned out to be much more complicated than I expected, especially when it comes to link references, because these can span multiple lines. On top of that, many tokens have different escaping rules.
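
To make the "first hyperlink" part concrete, here is a rough std-only sketch that scans running text for the first Markdown inline link. It is not how tp-note or any crate does it: `unmatched_close`, `try_link` and `first_inline_link` are hypothetical names, only the inline form `[text](destination)` is handled (no link references, no reStructuredText), and a title inside the parentheses is left as part of the destination.

```rust
/// Hypothetical helper: byte offset of the first unmatched `close` in `s`,
/// skipping backslash-escaped characters.
fn unmatched_close(s: &str, open: char, close: char) -> Option<usize> {
    let mut depth = 0usize;
    let mut it = s.char_indices();
    while let Some((i, c)) = it.next() {
        match c {
            '\\' => {
                // Skip the escaped character, if any.
                it.next();
            }
            c if c == open => depth += 1,
            c if c == close => {
                if depth == 0 {
                    return Some(i);
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    None
}

/// Tries to read `text](destination)` right after an opening `[`.
fn try_link(after_bracket: &str) -> Option<(&str, &str)> {
    let end = unmatched_close(after_bracket, '[', ']')?;
    let text = &after_bracket[..end];
    let rest = after_bracket[end..].strip_prefix("](")?;
    let end = unmatched_close(rest, '(', ')')?;
    Some((text, &rest[..end]))
}

/// Returns `(text, destination)` of the first inline link in `input`.
fn first_inline_link(input: &str) -> Option<(&str, &str)> {
    let mut it = input.char_indices();
    while let Some((i, c)) = it.next() {
        match c {
            '\\' => {
                // An escaped `[` does not start a link.
                it.next();
            }
            '[' => {
                if let Some(link) = try_link(&input[i + 1..]) {
                    return Some(link);
                }
                // Not a link: keep scanning after this `[`.
            }
            _ => {}
        }
    }
    None
}
```

The scan simply retries at every unescaped `[`, so a bracketed span that is not followed by `(` is skipped and the search continues behind it.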

The good thing is that I am learning about parsing with Nom 6. I found it hard to get started, but it is actually a very natural approach to parsing: the deeper you dive into the details of the specification, the more detailed the parser combinators become.

reu · November 27, 2020, 10:34pm

I published the solution as its own crate:

parse_hyperlinks::take_until_unbalanced - Rust

    use nom::bytes::complete::tag;
    use nom::sequence::delimited;
    use parse_hyperlinks::take_until_unbalanced;

    let mut parser = delimited(tag("<"), take_until_unbalanced('<', '>'), tag(">"));
    assert_eq!(parser("<<inside>inside>abc"), Ok(("abc", "<inside>inside")));

parse-hyperlinks - crates.io: Rust Package Registry

