I just spotted that Elm (which is a great way to learn functional programming syntax and semantics, if you’re familiar with web tech) has a parser library, and I thought some of its ideas were interesting. I’m going to use the same names for the ideas that Evan does.
Parser pipeline - this works in a similar way to nom’s do_parse! macro. It’s a really nice way of parsing a more complex structure, and can be thought of as a combinator, since it applies a list of other parsers in order. In do_parse!, you prefix a parser with a name to capture its value, and then build your complex type at the end, whereas in the Elm library a record is returned (it works like a tuple in this case).
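To make the pipeline idea concrete, here is a minimal hand-rolled sketch in plain Rust. It uses neither nom nor the Elm library, and the names `ParseResult`, `symbol`, `int`, `point` and `Point` are all made up for illustration. It only shows the shape of the technique: run small parsers in order, keep the values you care about, and build the final type at the end.

```rust
/// On success, a parser returns the parsed value plus the remaining input.
type ParseResult<'a, T> = Result<(T, &'a str), String>;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Consume an exact token, skipping leading whitespace.
fn symbol<'a>(input: &'a str, token: &str) -> ParseResult<'a, ()> {
    let input = input.trim_start();
    match input.strip_prefix(token) {
        Some(rest) => Ok(((), rest)),
        None => Err(format!("expected '{}'", token)),
    }
}

/// Parse a run of ASCII digits as an i32, skipping leading whitespace.
fn int(input: &str) -> ParseResult<'_, i32> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected an integer".to_string());
    }
    let (digits, rest) = input.split_at(end);
    let value = digits.parse::<i32>().map_err(|e| e.to_string())?;
    Ok((value, rest))
}

/// The "pipeline": each step runs on what the previous step left over.
/// Results bound to `_` are discarded; `x` and `y` are captured by name
/// and used to build the final value.
fn point(input: &str) -> ParseResult<'_, Point> {
    let (_, rest) = symbol(input, "(")?;
    let (x, rest) = int(rest)?;
    let (_, rest) = symbol(rest, ",")?;
    let (y, rest) = int(rest)?;
    let (_, rest) = symbol(rest, ")")?;
    Ok((Point { x, y }, rest))
}
```

Calling `point("( 3 , 4 )")` yields a `Point` with `x: 3` and `y: 4` plus the unparsed remainder, while an input such as `"(3 4)"` fails at the step that expects the comma.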
Context stack - for me, this is a great idea, and I haven’t seen it before in a parser builder library. When combining your parsers, you can optionally provide a context, a string that will be used in error messages to tell the user what is being parsed (e.g. a List, a PDF string literal, …), which makes error messages much easier to understand. Here is the relevant section of the Elm docs, verbatim:
Most parsers tell you the row and column of the problem:
Something went wrong at (4:17)
That may be true, but it is not how humans think. It is how text editors think! It would be better to say:
I found a problem with this list:

    [ 1, 23zm5, 3 ]
         ^
I wanted an integer, like 6 or 90219.
Notice that the error messages says this list. That is context! That is the language my brain speaks, not rows and columns.
This parser package lets you annotate context with the inContext function. You can let the parser know “I am trying to parse a “list” right now” so if an error happens anywhere in that context, you get the hand annotation!
Note: This technique is used by the parser in the Elm compiler to give more helpful error messages.
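Here is a rough sketch of how a context stack could look in Rust. This is not how the Elm library or the Elm compiler implements it, and `ParseError`, `in_context`, `expect`, `list` and `describe` are hypothetical names. The assumption is simply that an error carries a list of context labels, and that a wrapper adds its label whenever the parser inside it fails.

```rust
/// Hypothetical error type: a message plus the contexts that were active
/// when the failure happened (innermost first).
#[derive(Debug, PartialEq)]
struct ParseError {
    message: String,
    context: Vec<String>,
}

type ParseResult<'a, T> = Result<(T, &'a str), ParseError>;

/// Run `parser`; if it fails, record that the failure happened inside `label`.
fn in_context<'a, T>(
    label: &str,
    parser: impl Fn(&'a str) -> ParseResult<'a, T>,
    input: &'a str,
) -> ParseResult<'a, T> {
    parser(input).map_err(|mut err| {
        err.context.push(label.to_string());
        err
    })
}

/// Consume an exact token, skipping leading whitespace.
fn expect<'a>(input: &'a str, token: &str) -> Result<&'a str, ParseError> {
    input.trim_start().strip_prefix(token).ok_or_else(|| ParseError {
        message: format!("I wanted '{}'.", token),
        context: Vec::new(),
    })
}

/// Take the next alphanumeric token and require it to be an integer,
/// so that something like "23zm5" is rejected as a whole.
fn int(input: &str) -> ParseResult<'_, i64> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(input.len());
    let (token, rest) = input.split_at(end);
    match token.parse::<i64>() {
        Ok(value) => Ok((value, rest)),
        Err(_) => Err(ParseError {
            message: "I wanted an integer, like 6 or 90219.".to_string(),
            context: Vec::new(),
        }),
    }
}

/// Parse "[ 1, 2, 3 ]" without any context annotation.
fn list_body(input: &str) -> ParseResult<'_, Vec<i64>> {
    let mut rest = expect(input, "[")?;
    let mut items = Vec::new();
    loop {
        let (value, after) = int(rest)?;
        items.push(value);
        rest = after.trim_start();
        match rest.strip_prefix(',') {
            Some(after_comma) => rest = after_comma,
            None => break,
        }
    }
    let rest = expect(rest, "]")?;
    Ok((items, rest))
}

/// The same parser, annotated: any failure inside it is tagged with "list".
fn list(input: &str) -> ParseResult<'_, Vec<i64>> {
    in_context("list", list_body, input)
}

/// Turn an error into a message that names the innermost context.
fn describe(err: &ParseError) -> String {
    match err.context.first() {
        Some(ctx) => format!("I found a problem with this {}: {}", ctx, err.message),
        None => format!("I found a problem: {}", err.message),
    }
}
```

Because nested `in_context` calls each push their label as the error travels outwards, the error ends up holding the whole stack, and a reporter can choose how much of it to show.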
Delayed commit - Most parsing needs some kind of backtracking: when the first part of a parser matches but a later part fails, you can still back up and try an alternative parser rather than failing outright. This comes up in nom’s many* variants (and probably others too). I found the way the Elm parser library handles this easy to understand.
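As a sketch of the general commit-versus-backtrack idea (a simplified model of my own, not the Elm library's implementation or nom's API), a failure can be either recoverable or committed, and alternatives are only tried after a recoverable one. The names `Failure`, `either`, `commit`, `list_value`, `int_value` and `value` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Failure {
    /// Failed before committing: the caller may rewind and try an alternative.
    Recoverable(String),
    /// Failed after committing: alternatives are skipped and the error is reported.
    Committed(String),
}

type ParseResult<'a, T> = Result<(T, &'a str), Failure>;

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<i64>),
}

/// Try `first`; fall back to `second` on the original input only if
/// `first` failed without committing.
fn either<'a, T>(
    first: impl Fn(&'a str) -> ParseResult<'a, T>,
    second: impl Fn(&'a str) -> ParseResult<'a, T>,
    input: &'a str,
) -> ParseResult<'a, T> {
    match first(input) {
        Err(Failure::Recoverable(_)) => second(input),
        other => other,
    }
}

/// Upgrade a recoverable failure to a committed one.
fn commit(failure: Failure) -> Failure {
    match failure {
        Failure::Recoverable(msg) => Failure::Committed(msg),
        committed => committed,
    }
}

/// Parse a run of ASCII digits; failing here is always recoverable.
fn int(input: &str) -> ParseResult<'_, i64> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    digits
        .parse::<i64>()
        .map(|n| (n, rest))
        .map_err(|_| Failure::Recoverable("expected an integer".to_string()))
}

fn int_value(input: &str) -> ParseResult<'_, Value> {
    int(input).map(|(n, rest)| (Value::Int(n), rest))
}

fn list_value(input: &str) -> ParseResult<'_, Value> {
    // Skipping whitespace does not commit us; only seeing '[' does.
    let mut rest = input
        .trim_start()
        .strip_prefix('[')
        .ok_or_else(|| Failure::Recoverable("expected '['".to_string()))?;
    // From here on this must be a list, so every failure is committed.
    let mut items = Vec::new();
    loop {
        let (n, after) = int(rest).map_err(commit)?;
        items.push(n);
        rest = after.trim_start();
        match rest.strip_prefix(',') {
            Some(after_comma) => rest = after_comma,
            None => break,
        }
    }
    let rest = rest
        .strip_prefix(']')
        .ok_or_else(|| Failure::Committed("expected ']' to close the list".to_string()))?;
    Ok((Value::List(items), rest))
}

/// A value is a list or, failing that without commitment, an integer.
fn value(input: &str) -> ParseResult<'_, Value> {
    either(list_value, int_value, input)
}
```

With this split, `value("  42")` falls through to the integer alternative even though the list parser looked past the leading whitespace, whereas `value("[1, x]")` reports the committed error from inside the list instead of retrying the input as an integer.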
I hope that despite this not being strictly Rust, it is of interest to people.