Nom makes a lot more sense once you realize that, ideally, defining a parser would look like this:
const COMMENT = ...;
...
const FILE = alt((COMMENT, ...));
but that is currently blocked by Rust not supporting several things in const contexts, like type inference, generic values, closure types, etc.
You could probably macro up something to auto-wrap these in fns, but it would make the usage a lot less clear.
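For contrast, here's a minimal sketch of what the fn-based style looks like in nom today (v7-style API; the comment rules are made up just to show the shape):

use nom::{
    branch::alt,
    bytes::complete::{tag, take_until},
    character::complete::not_line_ending,
    sequence::{delimited, preceded},
    IResult,
};

// Each "rule" has to be a fn, because const items can't hold the
// unnameable closure types the combinators return.
fn block_comment(input: &str) -> IResult<&str, &str> {
    delimited(tag("/*"), take_until("*/"), tag("*/"))(input)
}

fn line_comment(input: &str) -> IResult<&str, &str> {
    preceded(tag("//"), not_line_ending)(input)
}

fn comment(input: &str) -> IResult<&str, &str> {
    alt((block_comment, line_comment))(input)
}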
In my experience pest is very fast to get a parser going, but the effective defaults you get with a PEG are extremely unintuitive for recursive parsing, and you can't really emulate the lexeme/parse split that you need for standard languages (not that bad in practice: you end up with an id rule that excludes every keyword, as in the sketch below). The result is a quick start with a long tail, though that probably evens out with more experience.
It also generally requires a second pass over the parse: the grammar recognizes "pairs" (a matched rule plus its source span), which you then map into an AST with very manual, parser-like code that I feel could be better automated. And dealing with expression trees is a mess without e.g. Pratt parsing (which pest provides as a library).
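To make both complaints concrete, here's a rough sketch with a made-up inline grammar (note the ident rule excluding keywords, and the manual pairs-to-AST pass):

use pest::iterators::Pair;
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar_inline = r#"
keyword = @{ ("if" | "else" | "while") ~ !ASCII_ALPHANUMERIC }
ident   = @{ !keyword ~ ASCII_ALPHA+ }
number  = @{ ASCII_DIGIT+ }
item    = { number | ident }
"#]
struct MyParser;

#[derive(Debug)]
enum Ast {
    Number(i64),
    Ident(String),
}

// The second pass: walk the recognized pairs and build the AST by hand.
fn to_ast(pair: Pair<Rule>) -> Ast {
    match pair.as_rule() {
        Rule::item => to_ast(pair.into_inner().next().unwrap()),
        Rule::number => Ast::Number(pair.as_str().parse().unwrap()),
        Rule::ident => Ast::Ident(pair.as_str().to_owned()),
        _ => unreachable!(),
    }
}

fn main() {
    let pair = MyParser::parse(Rule::item, "42").unwrap().next().unwrap();
    println!("{:?}", to_ast(pair)); // Number(42)
}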
Good, but room for improvement.
At this point if I need a really solid text parser, I start with the dumbest thing that works:
fn parse_foo(source: &str) -> Result<(Foo, &str), Error> {
    // standard Rust code with source.strip_prefix(),
    // let mut it = source.chars(), etc....
}
and add things as needed (starting with type aliases for the input and result types is worthwhile). Taking some guidance from libraries like nom on how to write these is a good idea (error handling in particular!). Combinators, used in moderation, are a great way to avoid repetitive code, and the parser signature is flexible enough that it's quite simple to move back and forth to nom later.
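For example, here's a sketch of where that tends to end up (the aliases, Foo, the error type, and the "foo " syntax are all placeholders):

type Input<'a> = &'a str;
type PResult<'a, T> = Result<(T, Input<'a>), ParseError>;

#[derive(Debug)]
struct ParseError {
    message: String,
}

#[derive(Debug)]
struct Foo {
    name: String,
}

fn parse_foo(source: Input<'_>) -> PResult<'_, Foo> {
    // Require a literal prefix, then take an alphanumeric run as the name.
    let rest = source
        .strip_prefix("foo ")
        .ok_or_else(|| ParseError { message: "expected `foo `".into() })?;
    let end = rest
        .find(|c: char| !c.is_alphanumeric())
        .unwrap_or(rest.len());
    Ok((Foo { name: rest[..end].to_string() }, &rest[end..]))
}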
For binary formats, nom provides really nice combinators that need much less tweaking, so it's easier to just use it directly there.
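A sketch with a made-up wire format, to show how little tweaking those need:

use nom::{
    bytes::complete::{tag, take},
    number::complete::{be_u16, be_u32},
    IResult,
};

// Hypothetical fixed layout: 2-byte magic, big-endian u16 version,
// big-endian u32 payload length, then the payload bytes.
struct Header<'a> {
    version: u16,
    payload: &'a [u8],
}

fn header(input: &[u8]) -> IResult<&[u8], Header<'_>> {
    let (input, _) = tag(&b"MZ"[..])(input)?;
    let (input, version) = be_u16(input)?;
    let (input, len) = be_u32(input)?;
    let (input, payload) = take(len)(input)?;
    Ok((input, Header { version, payload }))
}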
In reality, most of the effort of writing a text parser, in my experience, goes into writing all the test cases! Getting sensible errors in particular is a fundamentally tricky problem: it means knowing, as early as possible in your grammar, the point where you've definitely hit a syntax error.
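As a flavor of that long tail, building on the parse_foo sketch above:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_simple_foo() {
        let (foo, rest) = parse_foo("foo bar!").unwrap();
        assert_eq!(foo.name, "bar");
        assert_eq!(rest, "!");
    }

    // The tricky half: assert that malformed input fails with a message
    // that points at what actually went wrong.
    #[test]
    fn rejects_missing_keyword() {
        let err = parse_foo("bar").unwrap_err();
        assert!(err.message.contains("expected `foo `"));
    }
}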