Simple Nom Question: Read to End of Line (or is using nom a bad idea?)

I'm starting to get the hang of nom, at least enough that I have code that works, but it looks more cumbersome than seems plausible. What is the right way to do this:

use nom::{
    bytes::complete::{tag, take_until},
    character::complete::char,
    sequence::tuple,
    IResult,
};

fn parser_comment(input: &str) -> IResult<&str, ParsedObj> {
    let (input, (_, comment, _)) = tuple((
        tag("#"),
        take_until("\n"),
        char('\n'),
    ))(input)?;

    Ok(
        (input, ParsedObj::Comment(String::from(comment)))
    )
}

Thanks,

-kb

Or maybe a better question: What is a good approach to parsing data? I thought nom looked respected, but the crickets I am hearing make me wonder.

-kb

IMNSHO, nom is pretty unidiomatic to use. But it is widely used. It's also a pain for me to figure out what's gone wrong whenever I try to debug it 🙂. But to be fair, I rarely use it; perhaps if I used it more I'd be enlightened or something.

Anyway, let me refresh my memory a bit... I think this does the same thing:

fn parser_comment(input: &str) -> IResult<&str, ParsedObj> {
    let mut parser = preceded(
        tag("#"), 
        terminated(
            take_until("\n"),
            char('\n'),
        )
    );

    let (rest, comment) = parser(input)?;
    let comment = ParsedObj::Comment(comment.into());
    Ok((rest, comment))
}

pest is another popular alternative. I've also seen chumsky and pom mentioned. I don't have enough experience to advise on what's best, so these are just possibilities.


Here's another option that should work:

fn parser_comment(input: &str) -> IResult<&str, ParsedObj> {
    let (input, comment) = delimited(
        tag("#"),
        take_until("\n"),
        tag("\n"),
    )(input)?;

    Ok((input, ParsedObj::Comment(comment.into())))
}

So, it looks like I wasn't crazy off-course, but I should go understand terminated() and preceded() and delimited(). That's exactly the kind of help I was hoping for.

Also I should probably look at pest, chumsky, and pom, if only to appreciate other approaches while the problem space is fresh in my mind.

Um, I suppose regex stuff, too. (I have always hated regex…)

Oooow. While I have you: say I were looking for more interesting stuff in a line (a non-comment line) and wanted to be liberal about whitespace, and also about exactly how a line ends (linefeed vs. carriage return + linefeed), does it get as ugly as I fear?

Thanks @quinedot and @mbrubeck ! You continue to give the Rust community a good name.

-kb

There's space0 and space1 (or if it can cross newlines, multispace0 and multispace1), and there's line_ending. Then you build it up using delimited and terminated and the like.
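In case it helps to see the shape without the combinators, here is a dependency-free sketch of roughly what space0 and line_ending do (the helper names here are made up, and the real nom parsers return IResult and carry error details):

```rust
// Roughly space0: skip zero or more blanks (spaces/tabs), never fails.
fn strip_spaces(input: &str) -> &str {
    input.trim_start_matches(|c| c == ' ' || c == '\t')
}

// Roughly line_ending: accept "\r\n" or "\n", fail on anything else.
// Trying CRLF first matters, or the '\r' would be left behind.
fn strip_line_ending(input: &str) -> Option<&str> {
    input
        .strip_prefix("\r\n")
        .or_else(|| input.strip_prefix('\n'))
}
```

Building these into delimited/terminated chains is then just a matter of threading the remaining input from one helper to the next.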


I'm getting better at this! (But find myself starting to miss regex, if that is possible.)

Here is my improved comment parser:

fn parser_comment(input: &str) -> IResult<&str, ParsedObj> {
    let (rest, comment) = delimited(
        tuple((space0, tag("#"), space1)),
        take_until("\n"),  // Works, but not as robust as I would like.
        // take_until(line_ending) would be better, but does not compile:
        // take_until wants a literal &str, not a parser. Grrr.
        line_ending,
    )(input)?;

    Ok(
        (rest, ParsedObj::Comment(String::from(comment)))
    )
}

I might live with that.

-kb

P.S. If the examples in the nom documentation were a bit more complete (not pure unit tests) and a bit more motivated than abc and def, it would be easier to get the hang of how to compose this stuff.

Nom makes a lot more sense when you realize that ideally defining a parser would look like:

const COMMENT = ...;
...

const FILE = alt((COMMENT, ...));

but that is currently blocked by Rust not supporting several things in const, like inference, generic values, lambda types, etc.

You could probably macro up something to auto wrap these in fn, but it would make the usage a lot less clear.


In my experience pest is very fast to get a parser going, but the effective defaults you get with a PEG are extremely unintuitive for recursive parsing, and you can't really nicely emulate the lexer/parser split that you need for standard languages (not that bad in practice: you end up with an id rule that explicitly excludes every keyword). The result is quick starts with a long tail, though that probably evens out with more experience.

It also generally needs a doubling of the parse where the grammar recognizes the "pairs" (matched rule and source span) that you then map into an AST with very manual parser-like code that I feel could be better automated, and dealing with expression trees is a mess without using eg. Pratt parsing (which it provides as a library).

Good, but room for improvement.


At this point if I need a really solid text parser, I start with the dumbest thing that works:

fn parse_foo(source: &str) -> Result<(Foo, &str), Error> {
  // standard Rust code with source.strip_prefix(),
  // let mut it = source.chars(), etc....
}

and add things as needed (so starting with type aliases for the input and result types is a good idea). Getting some guidance from libraries like nom on how you can write these is a good idea (error handling in particular!); combinators are a great way to avoid repetitive code, used in moderation, and it's quite simple to move back and forth to using nom, given how flexible the parser signature is.
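A minimal sketch of that shape, with a tiny hand-rolled error type (ParseError, PResult, and the comment rule here are just illustrative names, not anything from nom):

```rust
// Illustrative error type: each variant names what the parser expected.
#[derive(Debug, PartialEq)]
enum ParseError {
    ExpectedHash,
    UnterminatedLine,
}

// Type alias for the result shape: parsed value plus the unconsumed rest.
type PResult<'a, T> = Result<(T, &'a str), ParseError>;

// "Dumbest thing that works": plain std methods, explicit errors.
fn parse_comment(source: &str) -> PResult<'_, String> {
    let rest = source.strip_prefix('#').ok_or(ParseError::ExpectedHash)?;
    match rest.split_once('\n') {
        Some((line, rest)) => Ok((line.trim().to_string(), rest)),
        None => Err(ParseError::UnterminatedLine),
    }
}
```

Because the signature is just `&str -> Result<(T, &str), E>`, each rule composes with `?` the same way nom parsers do, which is what makes migrating between the two styles painless.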

For binary formats, nom provides really nice combinators that don't need as much tweaking, so it's easier to just use it directly there.

In reality, in my experience most of the effort of writing a text parser is writing all the test cases! Getting sensible errors in particular is a fundamentally tricky issue: it involves knowing in your grammar where you've definitely got a syntax error, as early as possible.
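For example, a table-driven test sketch for a tiny comment rule (the parser here is a std-only stand-in; run_cases and the cases themselves are made up for illustration):

```rust
// A std-only stand-in for a comment rule, just to have something to test.
fn parse_comment(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('#')?;
    let (line, rest) = rest.split_once('\n')?;
    Some((line.trim(), rest))
}

// Table-driven testing: each case pairs an input with the expected
// (comment, rest) result, or None for an expected parse failure.
fn run_cases() -> bool {
    let cases: &[(&str, Option<(&str, &str)>)] = &[
        ("# hi\nrest", Some(("hi", "rest"))),
        ("#\nrest", Some(("", "rest"))),
        ("no hash\n", None),
        ("# unterminated", None),
    ];
    cases
        .iter()
        .all(|(input, expected)| parse_comment(input) == *expected)
}
```

Growing a table like this as you discover edge cases (empty comments, missing line endings, CRLF input) tends to be where the real time goes.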


This has been a very useful thread for me!

  • It is good to know about the parsers pest, chumsky, and pom. I made some notes about them; I might have occasion to use one or more of them in the future, but it looks like I won't for the moment.
  • I detoured to go through all of those interesting methods on String that I had not carefully read previously. (The Pattern trait is just flexible enough to make me want it to be more flexible, but then I realize my biggest wish would be rather expensive: let me pass a closure that takes a string instead of a char. Not going to happen in Rust.)
  • It looks like I will stick with nom on this project; I understand how to use it better than I did, and it is nearly doing all I need. The idea that I don't need to use nom features for everything I do in nom, that I am allowed to use Rust's String methods, is, um, liberating!

Thank you to all who have helped here.

-kb

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.