Today I released Parsel, a "zero-code" parser generator.
What?
Parsel is a set of #[derive] macros and helper types for generating ready-to-use parsers very easily and quickly.
Why?
The Rust ecosystem already offers a wide variety of parsing crates, so why would you need yet another one? There are basically two main approaches to parser generators in Rust:
- Parser combinators. These are implementations of the classic design from functional programming languages. They allow programmers to write the parsing logic itself seamlessly. However, one would still need to define strongly typed AST nodes, and then map the low-level output of the combinators manually. In addition, the grammar is implicit in the procedural part of the code (i.e., the implementation of the parsers itself).
- Traditional and macro-based parser generators (such as Pest), which make the grammar explicit. However, they still suffer from the need to map parsed values to domain-specific AST node types after the fact. (This is especially apparent in the case of Pest, which explicitly encourages the use of unwrap() on the nodes of the generated raw parse tree.)
Parsel solves all of these problems by letting your custom AST node types be the ground truth for the grammar, from which it automatically derives implementations of the syn::parse::Parse and quote::ToTokens traits.
How?
The way this works is as follows:
- struct types will implement both traits by parsing and printing each field in sequence, in the order of declaration. This behavior corresponds to simple sequencing/concatenation in a grammar.
- enum types will implement Parse by attempting to parse each variant in order, and returning the result of the first one that succeeds. ToTokens will be implemented on enums in the obvious way, by forwarding to the fields of the currently active variant. This behavior corresponds to alternation of productions in a grammar.
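To make the struct/enum correspondence concrete, here is a hand-written, standard-library-only sketch of the kind of logic such a derive would produce. It is not Parsel's generated code: the `Cursor`, `ParseError`, and `Parse` items below are hypothetical stand-ins for syn's parsing machinery, and `Pair`, `Item`, and `parse_tokens` are made up for illustration. When every variant fails, the sketch keeps the error that got furthest into the input, which is the policy this post describes for enums.

```rust
// Illustration only: hypothetical stand-ins, not Parsel or syn APIs.

/// Hypothetical cursor over a slice of pre-split tokens.
#[derive(Clone, Copy)]
struct Cursor<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

/// Error that records the token index at which parsing failed.
#[derive(Debug, PartialEq)]
struct ParseError {
    pos: usize,
    msg: String,
}

trait Parse: Sized {
    fn parse(cur: &mut Cursor<'_>) -> Result<Self, ParseError>;
}

impl<'a> Cursor<'a> {
    /// Consume one token, or fail at the current position.
    fn next(&mut self) -> Result<&'a str, ParseError> {
        let pos = self.pos;
        let msg = || "unexpected end of input".to_string();
        let tok = *self.tokens.get(pos).ok_or_else(|| ParseError { pos, msg: msg() })?;
        self.pos += 1;
        Ok(tok)
    }
}

#[derive(Debug, PartialEq)]
struct Ident(String);

impl Parse for Ident {
    fn parse(cur: &mut Cursor<'_>) -> Result<Self, ParseError> {
        let pos = cur.pos;
        let tok = cur.next()?;
        if !tok.is_empty() && tok.chars().all(char::is_alphabetic) {
            Ok(Ident(tok.to_string()))
        } else {
            Err(ParseError { pos, msg: format!("expected identifier, found `{tok}`") })
        }
    }
}

#[derive(Debug, PartialEq)]
struct Number(i64);

impl Parse for Number {
    fn parse(cur: &mut Cursor<'_>) -> Result<Self, ParseError> {
        let pos = cur.pos;
        let tok = cur.next()?;
        let err = |_| ParseError { pos, msg: format!("expected number, found `{tok}`") };
        tok.parse::<i64>().map(Number).map_err(err)
    }
}

/// A struct is a sequence: fields are parsed in declaration order.
#[derive(Debug, PartialEq)]
struct Pair {
    key: Ident,
    value: Number,
}

impl Parse for Pair {
    fn parse(cur: &mut Cursor<'_>) -> Result<Self, ParseError> {
        Ok(Pair { key: Ident::parse(cur)?, value: Number::parse(cur)? })
    }
}

/// An enum is an ordered choice between its variants.
#[derive(Debug, PartialEq)]
enum Item {
    Pair(Pair),
    Number(Number),
}

impl Parse for Item {
    fn parse(cur: &mut Cursor<'_>) -> Result<Self, ParseError> {
        // Each attempt runs on a copy of the cursor, so a failed variant consumes nothing.
        let mut fork = *cur;
        let pair_err = match Pair::parse(&mut fork) {
            Ok(pair) => { *cur = fork; return Ok(Item::Pair(pair)); }
            Err(err) => err,
        };
        let mut fork = *cur;
        let number_err = match Number::parse(&mut fork) {
            Ok(number) => { *cur = fork; return Ok(Item::Number(number)); }
            Err(err) => err,
        };
        // Every variant failed: report the error that got furthest into the input.
        Err(if number_err.pos > pair_err.pos { number_err } else { pair_err })
    }
}

/// Convenience entry point for trying the sketch out.
fn parse_tokens<T: Parse>(tokens: &[&str]) -> Result<T, ParseError> {
    T::parse(&mut Cursor { tokens, pos: 0 })
}
```

For instance, `parse_tokens::<Item>(&["x", "1"])` goes through the `Pair` variant, while `&["42"]` falls through to `Number`. With Parsel, the two `impl Parse` blocks for `Pair` and `Item` are the part you would not have to write by hand.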
What about parsing more complicated productions, such as arbitrary repetition, parentheses/grouping, and optional sub-productions? Parsel provides a rich set of generic helper types for each such common task. You can parameterize these helper types on the sub-productions of your grammar, and then directly embed them into higher-level AST node types in order to support deriving all kinds of complicated parsers.
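The idea of generic helper types parameterized on sub-productions can be sketched without any dependencies. Everything below is hypothetical: `Optional<T>`, `Repeated<T>`, `Minus`, `SignedList`, and this slice-based `Parse` trait are illustrative stand-ins and are not the names or signatures of Parsel's actual helpers, which operate on syn token streams.

```rust
// Illustration only: hypothetical stand-ins, not Parsel's helper types or syn's API.

/// Minimal parsing trait over a slice of pre-split tokens.
/// On failure, implementations must leave `input` untouched.
trait Parse: Sized {
    fn parse(input: &mut &[&str]) -> Option<Self>;
}

/// Leaf production: one integer token.
#[derive(Debug, PartialEq)]
struct Number(i64);

impl Parse for Number {
    fn parse(input: &mut &[&str]) -> Option<Self> {
        let (first, rest) = input.split_first()?;
        let value = first.parse::<i64>().ok()?;
        *input = rest; // commit only after the token has been accepted
        Some(Number(value))
    }
}

/// Leaf production: the literal token `-`.
#[derive(Debug, PartialEq)]
struct Minus;

impl Parse for Minus {
    fn parse(input: &mut &[&str]) -> Option<Self> {
        let (first, rest) = input.split_first()?;
        if *first != "-" {
            return None;
        }
        *input = rest;
        Some(Minus)
    }
}

/// Optional sub-production (`T?` in grammar notation).
#[derive(Debug, PartialEq)]
struct Optional<T>(Option<T>);

impl<T: Parse> Parse for Optional<T> {
    fn parse(input: &mut &[&str]) -> Option<Self> {
        // Never fails: a missing `T` simply yields `None` inside the wrapper.
        Some(Optional(T::parse(input)))
    }
}

/// Repetition (`T*`): keep parsing `T` for as long as it matches.
#[derive(Debug, PartialEq)]
struct Repeated<T>(Vec<T>);

impl<T: Parse> Parse for Repeated<T> {
    fn parse(input: &mut &[&str]) -> Option<Self> {
        let mut items = Vec::new();
        loop {
            let before = input.len();
            match T::parse(input) {
                // Require progress so that a `T` matching nothing cannot loop forever.
                Some(item) if input.len() < before => items.push(item),
                _ => break,
            }
        }
        Some(Repeated(items))
    }
}

/// Higher-level node that embeds the helpers directly as field types.
#[derive(Debug, PartialEq)]
struct SignedList {
    sign: Optional<Minus>,
    numbers: Repeated<Number>,
}

impl Parse for SignedList {
    fn parse(input: &mut &[&str]) -> Option<Self> {
        // With a derive macro, this field-by-field sequencing would be generated.
        let mut fork = *input;
        let node = SignedList {
            sign: Optional::parse(&mut fork)?,
            numbers: Repeated::parse(&mut fork)?,
        };
        *input = fork;
        Some(node)
    }
}
```

The point of the design is that `SignedList` describes its grammar purely through its field types; the optionality and repetition logic lives once, in the generic helpers.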
Usage Examples
See the JSON parser example in the official docs.
Other Features, Highlights
- Deriving FromStr and Display implementations by forwarding to Parse and ToTokens, respectively.
- Built-in heuristics for improving error messages of (potentially speculative) parsing of alternation. Currently, when the parsing of all variants of an enum fails, Parsel returns the error message corresponding to the production that got furthest in the underlying token stream. It is likely that this production/enum is the one that was intended by the author of the input being parsed.
- AST node helper types for parsing…:
  - Parentheses, square brackets and curly braces
  - Optional productions
  - Repetitions, optionally separated by punctuation
  - Literals in a strongly-typed manner, without having to re-parse them, as is the case with syn::Lit.
  - End-of-file and not end-of-file
- AST node helper types (LeftAssoc and RightAssoc) for avoiding infinite left recursion and deep right recursion when trying to naïvely parse binary operators. These helpers use iterative parsing and transform the parsed structure into a proper left-leaning or right-leaning tree internally.
- Macros for defining your own keywords in the grammar being parsed. The provided CustomIdent type can also be parameterized over the set of keywords, and it will reject such keywords when parsing, regardless of the behavior of proc-macro2's Ident.
- Handling redundant trait bounds generated by recursive (or mutually recursive) productions, by means of a #[parsel(recursive)] attribute. Such bounds should be omitted because they currently cause rustc's trait solver to fall into infinite recursion, even though they are technically correct. Thus, the need for this attribute will hopefully be obviated once Chalk lands in stable Rust.
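The LeftAssoc/RightAssoc item above describes parsing operands iteratively and only then building a left-leaning or right-leaning tree. A minimal, dependency-free sketch of that general technique follows. It is not Parsel's implementation: `Expr`, `operands`, `left_assoc`, and `right_assoc` are hypothetical names, and the input is assumed to be a slice of pre-split tokens with a single operator.

```rust
// Illustration only: a hypothetical sketch of the technique, not Parsel's code.

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Bin(Box<Expr>, Box<Expr>),
}

/// Parse the flat sequence `num (op num)*` in a loop, with no recursion at all.
/// Returns `None` on empty input, a bad token, or a trailing operator.
fn operands(tokens: &[&str], op: &str) -> Option<Vec<i64>> {
    let mut nums = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        if i % 2 == 0 {
            nums.push(tok.parse::<i64>().ok()?);
        } else if *tok != op {
            return None;
        }
    }
    // An odd token count means the sequence ends on an operand.
    if tokens.len() % 2 == 1 { Some(nums) } else { None }
}

/// Fold from the left: `a op b op c` becomes `((a op b) op c)`.
/// The naive rule `expr := expr op num` would recurse before consuming a token.
fn left_assoc(tokens: &[&str], op: &str) -> Option<Expr> {
    let mut nums = operands(tokens, op)?.into_iter();
    let first = Expr::Num(nums.next()?);
    Some(nums.fold(first, |lhs, n| Expr::Bin(Box::new(lhs), Box::new(Expr::Num(n)))))
}

/// Fold from the right: `a op b op c` becomes `(a op (b op c))`.
/// The tree is still built in a loop, so parsing depth does not grow with input length.
fn right_assoc(tokens: &[&str], op: &str) -> Option<Expr> {
    let mut nums = operands(tokens, op)?.into_iter().rev();
    let last = Expr::Num(nums.next()?);
    Some(nums.fold(last, |rhs, n| Expr::Bin(Box::new(Expr::Num(n)), Box::new(rhs))))
}
```

Both functions share the same flat parsing step and differ only in the direction of the fold, which is why one pair of helpers can cover both associativities.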
Limitations
- Parsel builds upon syn and quote, so their assumptions need to be satisfied. For example, it is not possible to parse whitespace-sensitive grammars.
- Performance is likely worse (by a constant factor) than that of an equivalent hand-written recursive descent parser.