How to refer to 'group' in `nom`?

Hi all,

I'm new to Rust and trying to write a parser for practice. I'd like to use nom to parse some markup like

== Section name ==, === Section name ===, and so on, to get the name and level of each section at level 2 or deeper.

I think something like

    let (remain, (marker1, _, section_name, _, marker2)) =
        tuple((section_marker, space1, section_name, space1, section_marker))(i).unwrap();

    if marker1.len() != marker2.len() {
        return Err(...);
    }

could work, but I'm looking for a way to check the markers without writing the if condition by hand, i.e. something like the regex (?P<markup>=+)\s+.+\s+(?P=markup), where the closing marker refers back to a named group.

I searched the nom docs but I'm not sure which parser can achieve this. Is it possible? Thank you!

It doesn't look possible with a regex because nom uses the regex crate, which doesn't support backreferences (regex docs).

Hi, thanks for the reply. I'm not looking for a regex solution, but for a way to constrain the parser's behavior (to match the same number of = before and after the text), if one exists.

I should have been more explicit, but I meant one of the nom parsers from nom::regexp::str. They use the regex crate internally so they won't work.

I think you'd ultimately have to do something like nom::combinator::flat_map where you take the output of one parser to define a second parser.
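To make the "output of one parser defines the second" idea concrete, here is a minimal std-only sketch. It is not nom code: `section_heading` and `is_blank` are hypothetical names, and the rules it enforces (level 2 or deeper, at least one space or tab on each side of the title, closing marker exactly at the end of the line, no line ending in the input) are assumptions rather than anything nom or MediaWiki prescribes.

```rust
/// Hypothetical helper: the blank characters allowed around the title.
fn is_blank(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Hypothetical, std-only sketch (not nom API): parses a single line
/// such as `== Title ==` into `(level, title)`.
fn section_heading(line: &str) -> Option<(usize, &str)> {
    // Stage 1: capture the run of '=' that opens the heading.
    let after_marker = line.trim_start_matches('=');
    let level = line.len() - after_marker.len();
    if level < 2 {
        return None; // assumption: only level 2 or deeper counts
    }
    let marker = &line[..level];

    // Stage 2: the closing check is built from the captured marker,
    // which is the role a backreference plays in a regex.
    let padded_title = after_marker.strip_suffix(marker)?;

    // Assumption: at least one blank on each side of the title. This
    // also rejects a longer closing run such as `== abc ===`.
    if !padded_title.starts_with(is_blank) || !padded_title.ends_with(is_blank) {
        return None;
    }
    let title = padded_title.trim_matches(is_blank);
    if title.is_empty() {
        return None;
    }
    Some((level, title))
}
```

Under these assumptions, `section_heading("== Section name ==")` gives `Some((2, "Section name"))`, while mismatched markers such as `== abc ===` give `None`. A flat_map-based nom parser would have the same two-stage shape: the first parser yields the marker, and a closure turns that marker into the parser for the rest of the line.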

How do you wish to handle:

=== abc === def ===

?

Is the heading "abc === def"? Are you requiring spaces after the initial === and before the final ===? Are you disallowing trailing spaces (i.e. requiring the final marker to be precisely at the end of the line)?

Assuming yes to all, given your constraints to avoid your own ifs, this felt surprisingly messy and took me, a nom newbie, way too long to write. That being said, I learned a lot here, especially with how to use not() effectively. I've compiled and tested this code. I wonder whether anyone has a simpler solution.

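// Imports for both functions in this post.
use nom::{
    branch::alt,
    bytes::complete::{tag, take_till, take_while, take_while1},
    character::complete::{line_ending, space0, space1},
    combinator::{eof, not, recognize},
    error::{Error, ErrorKind},
    multi::many0_count,
    sequence::{terminated, tuple},
    IResult,
};

fn parse_heading_and_level(s: &str) -> IResult<&str, (&str, usize)> {
    // Require at least one '=' so an empty marker is never accepted.
    let (rest, marker) = terminated(take_while1(|c| c == '='), space1)(s)?;
    let (rest, heading) = recognize(many0_count(tuple((
        // This is a zero-width matcher that _fails_
        // if the passed-in combinator succeeds. So
        // it serves as a negative lookahead.
        not(tuple((
            space1,                  // trailing space
            tag(marker),             // trailing marker
            alt((line_ending, eof)), // end of line or input
        ))),
        // Otherwise, consume spaces and match contiguous
        // non-space/tab/ending characters before we need
        // to do the check for the tag again.
        tuple((space0, take_till(|c| " \t\r\n".contains(c)))),
    ))))(rest)?;
    // Move past the trailing spaces, marker, and line ending. Using
    // `space1` here (instead of skipping a single character) keeps this
    // in step with the lookahead when several spaces precede the marker.
    let (rest, _) = tuple((space1, tag(marker), alt((line_ending, eof))))(rest)?;
    // Return not only the heading, but also its level.
    Ok((rest, (heading, marker.len())))
}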

Honestly, I'd rather write something more straightforward with an if, pulling out a line up front and checking just its beginning and end. It's probably more efficient in the worst case as well, since the closing marker is only ever checked at the end of the line. It's also shorter and was easier for me to write:

fn parse_heading_and_level2(s: &str) -> IResult<&str, (&str, usize)> {
    // Pull out a line up front.
    let (rest, line) =
        terminated(take_till(|c| "\r\n".contains(c)), alt((line_ending, eof)))(s)?;
    // Match the start.
    let (heading, marker) =
        terminated(take_while(|c| c == '='), space1)(line)?;
    // Match the end.
    // (Note that `split_ascii_whitespace()` ignores trailing spaces.)
    if heading.ends_with('=') &&
        heading.split_ascii_whitespace().next_back() == Some(marker) {
        let heading = heading[..heading.len() - marker.len()].trim_end();
        Ok((rest, (heading, marker.len())))
    } else {
        Err(nom::Err::Error(Error::new(heading, ErrorKind::Tag)))
    }
}

Hi, thanks for putting so much effort into this! There's a lot for me to learn!

I should have been clearer about my problem. I'm trying to parse the basic MediaWiki format. The original post is about section formatting, and I'm just looking for a nom way to easily check that the = signs form valid section markup.

I will take some time to understand all your code!