Writing a great parser library (elm-lang/parser)


#1

I just spotted that Elm (which is a great way to understand functional programming syntax and semantics, if you’re familiar with web tech) has a parser library, and I thought some of its ideas were interesting. I’m going to use the names that Evan uses for them.

  1. Parser pipeline - this works in a similar way to nom’s do_parse! macro. It’s a really nice way of parsing a more complex structure, and can be thought of as a combinator, as it applies a list of other parsers in order. In do_parse!, you bind each value you want to keep to a name (name: parser), and then build your complex type at the end, whereas in the Elm library a record is returned (works like a tuple in this case).

  2. Context stack - for me, this is a great idea and I haven’t seen it before in a parser builder library. When combining your parsers, you can optionally provide a context, a string that will be used in error messages to tell the user what was being parsed when the error occurred (e.g. a Point, a List, a PDF string literal, …), which makes error messages much easier to understand. Here is the relevant section of the Elm docs, verbatim:

    Most parsers tell you the row and column of the problem:

    Something went wrong at (4:17)

    That may be true, but it is not how humans think. It is how text editors think! It would be better to say:

    I found a problem with this list:
    
       [ 1, 23zm5, 3 ]
            ^
    I wanted an integer, like 6 or 90219.
    

    Notice that the error messages says this list. That is context! That is the language my brain speaks, not rows and columns.

    This parser package lets you annotate context with the inContext function. You can let the parser know “I am trying to parse a “list” right now” so if an error happens anywhere in that context, you get the hand annotation!

    Note: This technique is used by the parser in the Elm compiler to give more helpful error messages.

  3. Delayed commit - In most parsing, you need some kind of backtracking: when the first part of a parser has matched but a later part fails, you can rewind and try an alternative parser rather than failing outright. This comes up in nom’s alt and many* variants (and probably others too). I found the way the Elm parser library handles this easy to understand.
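
To make the three ideas above a bit more concrete, here is a rough Rust sketch. It is not the Elm library’s API and not nom’s: State, Problem, in_context, attempt and the rest are names I made up for illustration, and the backtracking shown is a plain rewind-on-failure, which is simpler than what Elm’s delayed commit actually does.

```rust
// Illustrative sketch only: every name here is hypothetical, not elm's or nom's API.

#[derive(Debug, PartialEq)]
struct Problem {
    pos: usize,                 // byte offset where parsing failed
    context: Vec<&'static str>, // context stack at the time of failure, outermost first
    expected: &'static str,     // what the parser wanted to see
}

struct State<'a> {
    src: &'a str,
    pos: usize,
    context: Vec<&'static str>,
}

impl<'a> State<'a> {
    fn new(src: &'a str) -> Self {
        State { src, pos: 0, context: Vec::new() }
    }

    // Build an error that remembers the current position and context stack.
    fn fail<T>(&self, expected: &'static str) -> Result<T, Problem> {
        Err(Problem { pos: self.pos, context: self.context.clone(), expected })
    }

    // Consume an exact token.
    fn symbol(&mut self, tok: &'static str) -> Result<(), Problem> {
        if self.src[self.pos..].starts_with(tok) {
            self.pos += tok.len();
            Ok(())
        } else {
            self.fail(tok)
        }
    }

    // Skip any run of ASCII spaces.
    fn spaces(&mut self) {
        while self.src[self.pos..].starts_with(' ') {
            self.pos += 1;
        }
    }

    // Consume a run of ASCII digits as an i64.
    fn int(&mut self) -> Result<i64, Problem> {
        let rest = &self.src[self.pos..];
        let len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        match rest[..len].parse::<i64>() {
            Ok(n) => {
                self.pos += len;
                Ok(n)
            }
            Err(_) => self.fail("an integer"),
        }
    }

    // Context stack: errors raised while `parser` runs carry `name`.
    fn in_context<T>(
        &mut self,
        name: &'static str,
        parser: impl FnOnce(&mut Self) -> Result<T, Problem>,
    ) -> Result<T, Problem> {
        self.context.push(name);
        let result = parser(self);
        self.context.pop();
        result
    }

    // Backtracking: on failure, rewind so an alternative starts from the same place.
    fn attempt<T>(&mut self, parser: impl FnOnce(&mut Self) -> Result<T, Problem>) -> Option<T> {
        let start = self.pos;
        match parser(self) {
            Ok(value) => Some(value),
            Err(_) => {
                self.pos = start;
                None
            }
        }
    }
}

#[derive(Debug, PartialEq)]
struct Point { x: i64, y: i64 }

#[derive(Debug, PartialEq)]
enum Value { Point(Point), Int(i64) }

// Pipeline: run the parsers in order, keep the named values, build the result at the end.
fn point(s: &mut State) -> Result<Point, Problem> {
    s.in_context("point", |s| {
        s.symbol("(")?;
        let x = s.int()?;
        s.symbol(",")?;
        s.spaces();
        let y = s.int()?;
        s.symbol(")")?;
        Ok(Point { x, y })
    })
}

fn paren_int(s: &mut State) -> Result<i64, Problem> {
    s.symbol("(")?;
    let n = s.int()?;
    s.symbol(")")?;
    Ok(n)
}

// "(3, 4)" and "(3)" share the prefix "(3", so the first alternative must be undone.
fn value(s: &mut State) -> Result<Value, Problem> {
    match s.attempt(point) {
        Some(p) => Ok(Value::Point(p)),
        None => paren_int(s).map(Value::Int),
    }
}
```

With this, value(&mut State::new("(7)")) first tries point, gets as far as the missing comma, rewinds, and then succeeds with paren_int. A failing point parse returns a Problem whose context field contains "point", which is the information you would use to print an "I found a problem with this point" style message.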

I hope that despite this not being strictly Rust, it is of interest to people. :slight_smile:


#2

Interesting! nom is a pretty complex beast, and a really easy-to-use parser lib with a similar feature set would surely be appreciated.

(And I think you meant to link to http://package.elm-lang.org/packages/elm-tools/parser/latest?)


#3

I changed the link :slight_smile:

Also, I know nom has some functionality that a normal parser would not need: there is sometimes more than one parser combinator for the same thing, each with different performance characteristics, and it supports distinguishing between incomplete, where the parser could succeed with more data, and error, where there is no way the parser could match.
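
As a rough illustration of the incomplete/error distinction, here is a tiny self-contained Rust sketch. Outcome and tag are hypothetical names for this post only and are not nom’s actual types or signatures.

```rust
// Illustrative sketch only: `Outcome` and `tag` are made-up names, not nom's types.

#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Done(&'a str),     // matched; holds the remaining input
    Incomplete(usize), // could still match if this many more bytes arrive
    Error,             // no amount of extra input can make this match
}

// Match a fixed string at the start of `input`.
fn tag<'a>(expected: &str, input: &'a str) -> Outcome<'a> {
    if input.starts_with(expected) {
        Outcome::Done(&input[expected.len()..])
    } else if expected.starts_with(input) {
        // Everything seen so far is a prefix of what we want: ask for more data.
        Outcome::Incomplete(expected.len() - input.len())
    } else {
        Outcome::Error
    }
}
```

So tag("hello", "hel") asks for 2 more bytes, while tag("hello", "help") is a hard error. A parser that only ever sees complete input can drop the Incomplete case, which is the simplification I was thinking of.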

There’s a variant of nom that doesn’t have incomplete, and so is simpler, but I can’t remember the name of it.

EDIT: it’s synom