Today I released Parsel, a "zero-code" parser generator.
Parsel is a set of `#[derive]` macros and helper types for generating ready-to-use parsers quickly and easily.
The Rust ecosystem already offers a wide variety of parsing crates, so why would you need yet another one? There are two main approaches to building parsers in Rust:
- Parser combinators. These implement the classic design from functional programming languages and let programmers express the parsing logic itself seamlessly. However, one still needs to define strongly typed AST nodes and then manually map the low-level output of the combinators onto them. In addition, the grammar is only implicit in the procedural part of the code (i.e., the implementation of the parsers themselves).
- Traditional and macro-based parser generators (such as Pest), which make the grammar explicit. However, they still suffer from the need to map parsed values to domain-specific AST node types after the fact. (This is especially apparent in the case of Pest, which explicitly encourages calling `unwrap()` on the nodes of the generated raw parse tree.)
Parsel solves all of these problems by letting your custom AST node types be the ground truth for the grammar, from which it automatically derives implementations of the `Parse` and `ToTokens` traits.
It works as follows:
- `struct` types implement both traits by parsing and printing each field in sequence, in declaration order. This corresponds to simple sequencing (concatenation) in a grammar.
- `enum` types implement `Parse` by attempting to parse each variant in order and returning the result of the first one that succeeds. `ToTokens` is implemented on enums in the obvious way, by forwarding to the fields of the currently active variant. This corresponds to alternation of productions in a grammar.
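To make the two rules concrete, here is a std-only sketch of roughly what derived parsers do for a tiny, hypothetical grammar `Assign = Ident "=" Value` and `Value = Number | Ident`. The `Parse` trait, the `Input` cursor, and the node types below are simplified stand-ins written for illustration; Parsel's derives target the real `Parse` and `ToTokens` traits instead. The struct parses its fields one after another, while the enum tries each variant from the same input position and keeps the first success. As a small design choice, when every variant fails, the sketch reports the error that got furthest.

```rust
// Hypothetical, std-only sketch; these are NOT Parsel's or syn's types.

/// A copyable cursor over pre-tokenized input; copying it is how we backtrack.
#[derive(Clone, Copy, Debug)]
struct Input<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

impl<'a> Input<'a> {
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }
    fn advance(self) -> Input<'a> {
        Input { pos: self.pos + 1, ..self }
    }
    fn error(&self, expected: &str) -> Error {
        Error { pos: self.pos, expected: expected.to_string() }
    }
}

/// A parse error that records how far into the token stream parsing got.
#[derive(Debug, PartialEq)]
struct Error {
    pos: usize,
    expected: String,
}

/// Simplified parsing trait: on success, returns the node and the advanced input.
trait Parse: Sized {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error>;
}

#[derive(Debug, PartialEq)]
struct Ident(String);
impl Parse for Ident {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error> {
        match input.peek() {
            Some(t) if t.starts_with(|c: char| c.is_alphabetic()) => Ok((Ident(t.to_string()), input.advance())),
            _ => Err(input.error("identifier")),
        }
    }
}

#[derive(Debug, PartialEq)]
struct Number(u64);
impl Parse for Number {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error> {
        match input.peek().and_then(|t| t.parse::<u64>().ok()) {
            Some(n) => Ok((Number(n), input.advance())),
            None => Err(input.error("number")),
        }
    }
}

/// The `=` token, modeled as its own field type.
#[derive(Debug, PartialEq)]
struct EqSign;
impl Parse for EqSign {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error> {
        match input.peek() {
            Some("=") => Ok((EqSign, input.advance())),
            _ => Err(input.error("`=`")),
        }
    }
}

/// Struct = sequencing: parse every field in declaration order.
#[derive(Debug, PartialEq)]
struct Assign { name: Ident, eq: EqSign, value: Value }
impl Parse for Assign {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error> {
        let (name, input) = Ident::parse(input)?;
        let (eq, input) = EqSign::parse(input)?;
        let (value, input) = Value::parse(input)?;
        Ok((Assign { name, eq, value }, input))
    }
}

/// Enum = alternation: try each variant in order; the first success wins.
#[derive(Debug, PartialEq)]
enum Value { Number(Number), Ident(Ident) }
impl Parse for Value {
    fn parse(input: Input<'_>) -> Result<(Self, Input<'_>), Error> {
        let e1 = match Number::parse(input) {
            Ok((n, rest)) => return Ok((Value::Number(n), rest)),
            Err(e) => e,
        };
        // `input` is Copy, so this retries from the same position.
        let e2 = match Ident::parse(input) {
            Ok((i, rest)) => return Ok((Value::Ident(i), rest)),
            Err(e) => e,
        };
        // All variants failed: keep the error that got furthest.
        Err(if e2.pos > e1.pos { e2 } else { e1 })
    }
}

fn main() {
    let tokens = ["x", "=", "42"];
    let (ast, rest) = Assign::parse(Input { tokens: &tokens, pos: 0 }).unwrap();
    assert_eq!(ast.value, Value::Number(Number(42)));
    assert_eq!(rest.pos, 3);
    println!("{:?}", ast);
}
```

Because `Input` is `Copy`, backtracking amounts to reusing the old cursor, so the alternation code needs no explicit rewind step.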
What about parsing more complicated productions, such as arbitrary repetition, parentheses/grouping, and optional sub-productions? Parsel provides a rich set of generic helper types for each such common task. You can parameterize these helper types on the sub-productions of your grammar, and then directly embed them into higher-level AST node types in order to support deriving all kinds of complicated parsers.
See the JSON parser example in the official docs.
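As a rough illustration of how such generic helpers compose, the std-only sketch below defines three hypothetical helpers, `Many<T>` for repetition, `Maybe<T>` for optional productions, and `Bracketed<T>` for grouping, over a simplified `Parse` trait that works on a pre-tokenized `&[&str]`. None of these names are Parsel's actual types; the sketch only shows the idea of parameterizing a helper on a sub-production and embedding it in a larger AST node.

```rust
// Hypothetical, std-only sketch; NOT Parsel's actual helper types.

/// Simplified trait: on success, returns the node and how many tokens it consumed.
trait Parse: Sized {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)>;
}

/// Zero or more repetitions of `T` (like `T*` in a grammar).
#[derive(Debug, PartialEq)]
struct Many<T>(Vec<T>);
impl<T: Parse> Parse for Many<T> {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)> {
        let mut items = Vec::new();
        let mut used = 0;
        // Keep parsing `T` until it fails or stops consuming input.
        while let Some((item, n)) = T::parse(&tokens[used..]) {
            if n == 0 {
                break;
            }
            items.push(item);
            used += n;
        }
        Some((Many(items), used))
    }
}

/// An optional `T` (like `T?`); it never fails.
#[derive(Debug, PartialEq)]
struct Maybe<T>(Option<T>);
impl<T: Parse> Parse for Maybe<T> {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)> {
        match T::parse(tokens) {
            Some((item, n)) => Some((Maybe(Some(item)), n)),
            None => Some((Maybe(None), 0)),
        }
    }
}

/// `T` surrounded by square brackets (grouping).
#[derive(Debug, PartialEq)]
struct Bracketed<T>(T);
impl<T: Parse> Parse for Bracketed<T> {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)> {
        if tokens.first() != Some(&"[") {
            return None;
        }
        let (inner, n) = T::parse(&tokens[1..])?;
        if tokens.get(1 + n) != Some(&"]") {
            return None;
        }
        Some((Bracketed(inner), n + 2))
    }
}

/// A terminal: one decimal number token.
#[derive(Debug, PartialEq)]
struct Num(u64);
impl Parse for Num {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)> {
        let n = tokens.first()?.parse::<u64>().ok()?;
        Some((Num(n), 1))
    }
}

/// A hypothetical AST node for `"[" Num* "]" Num?`, built from the helpers.
#[derive(Debug, PartialEq)]
struct Node {
    list: Bracketed<Many<Num>>,
    tail: Maybe<Num>,
}
impl Parse for Node {
    fn parse(tokens: &[&str]) -> Option<(Self, usize)> {
        // What a derive would generate: parse each field in order.
        let (list, a) = Bracketed::<Many<Num>>::parse(tokens)?;
        let (tail, b) = Maybe::<Num>::parse(&tokens[a..])?;
        Some((Node { list, tail }, a + b))
    }
}

fn main() {
    let (node, used) = Node::parse(&["[", "1", "2", "3", "]", "7"]).unwrap();
    assert_eq!(node.list, Bracketed(Many(vec![Num(1), Num(2), Num(3)])));
    assert_eq!(node.tail, Maybe(Some(Num(7))));
    assert_eq!(used, 6);
}
```

The `n == 0` check in `Many` guards against looping forever when the repeated production can succeed without consuming input (for example `Many<Maybe<Num>>`).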
`Display` implementations by forwarding to
- Built-in heuristics for improving the error messages of (potentially speculative) parsing of alternations. Currently, when parsing fails for all variants of an `enum`, Parsel returns the error corresponding to the production that got furthest in the underlying token stream, since that production is most likely the one the author of the input intended.
- AST node helper types for parsing…:
- AST node helper types (… `RightAssoc`) for avoiding infinite left recursion and deep right recursion when naïvely parsing binary operators. These helpers parse iteratively and internally transform the parsed structure into a proper left-leaning or right-leaning tree.
- Macros for defining your own keywords in the grammar being parsed. The provided `CustomIdent` type can also be parameterized over the set of keywords, and it will reject such keywords when parsing, regardless of the behavior of
- Handling of redundant trait bounds generated by recursive (or mutually recursive) productions, by means of a `#[parsel(recursive)]` attribute. Such bounds must be omitted because, even though they are technically correct, they currently cause rustc's trait solver to fall into infinite recursion. Hopefully, this attribute will become unnecessary once Chalk lands in stable Rust.
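The idea behind the left- and right-associative helpers listed above can be sketched without recursion in the parser: read `operand (operator operand)*` in a loop, then fold the flat sequence into a left-leaning or right-leaning tree. This is a std-only sketch under simplifying assumptions (pre-tokenized `&[&str]` input, integer operands, single-character operators); `Expr`, `parse_flat`, `left_assoc`, `right_assoc`, and `eval` are hypothetical names, not Parsel's types.

```rust
// Hypothetical, std-only sketch of iterative binary-operator parsing.

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Bin(Box<Expr>, char, Box<Expr>),
}

/// Parse `num (op num)*` with a plain loop (no recursion), returning the
/// operands and operators in source order.
fn parse_flat(tokens: &[&str], ops: &[char]) -> Option<(Vec<Expr>, Vec<char>)> {
    let mut operands = Vec::new();
    let mut operators = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        if i % 2 == 0 {
            operands.push(Expr::Num(tok.parse::<i64>().ok()?));
        } else {
            // Operators must be exactly one allowed character.
            let mut chars = tok.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) if ops.contains(&c) => operators.push(c),
                _ => return None,
            }
        }
    }
    // The sequence must start and end with an operand.
    if operands.len() != operators.len() + 1 {
        return None;
    }
    Some((operands, operators))
}

/// `a op b op c` becomes `((a op b) op c)`.
fn left_assoc(tokens: &[&str], ops: &[char]) -> Option<Expr> {
    let (operands, operators) = parse_flat(tokens, ops)?;
    let mut operands = operands.into_iter();
    let mut acc = operands.next()?;
    for (op, rhs) in operators.into_iter().zip(operands) {
        acc = Expr::Bin(Box::new(acc), op, Box::new(rhs));
    }
    Some(acc)
}

/// `a op b op c` becomes `(a op (b op c))`.
fn right_assoc(tokens: &[&str], ops: &[char]) -> Option<Expr> {
    let (mut operands, mut operators) = parse_flat(tokens, ops)?;
    let mut acc = operands.pop()?;
    // Walk backwards, wrapping the accumulated tree as the right child.
    while let (Some(op), Some(lhs)) = (operators.pop(), operands.pop()) {
        acc = Expr::Bin(Box::new(lhs), op, Box::new(acc));
    }
    Some(acc)
}

/// Evaluate a tree (recursively, just to check its shape).
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Bin(l, '-', r) => eval(l) - eval(r),
        Expr::Bin(l, '^', r) => eval(l).pow(eval(r) as u32),
        Expr::Bin(_, op, _) => panic!("unsupported operator {op}"),
    }
}

fn main() {
    let sub = ["10", "-", "3", "-", "2"];
    let left = left_assoc(&sub, &['-']).unwrap();
    assert_eq!(eval(&left), 5); // (10 - 3) - 2
    let pow = ["2", "^", "3", "^", "2"];
    let right = right_assoc(&pow, &['^']).unwrap();
    assert_eq!(eval(&right), 512); // 2 ^ (3 ^ 2)
}
```

Only `eval` recurses here; the parsing step itself runs in a loop, so a long chain of operators does not deepen the parser's call stack.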
- Parsel builds upon … `quote`, so their assumptions must also be satisfied. For example, it is not possible to parse whitespace-sensitive grammars.
- Performance is likely worse (by a constant factor) than that of an equivalent hand-written recursive descent parser.