We announced the Paguroidea Parser Generator several days ago. As promised, we have been working on improving its user-friendliness and performance. Here are some updates:
PR #38 (by @QuarticCat and @CyanPineapple) brings lalrpop+logos implementations of the JSON parser, which do work similar to our benchmark sample. One can now compare the performance of our parser against these implementations.
PR #39 (by @SchrodingerZhu) reworks the lexer implementation. The DFA now carries the last successful match in its state. By doing so, the "message passing" between lexer and parser is completely eliminated. As claimed in the flap paper:
Fusion acts on a lexer and a normalized parser, connected via tokens, and produces a grammar that is entirely token-free, in which the only branches involve inspecting individual characters.
On AArch64, this PR raises our parser's throughput from 500+ MB/s to 700+ MB/s on the CSV workload (random seed 0). x86-64 also gets a stable 4-6% improvement.
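To illustrate the idea of a DFA that carries its last successful match, here is a minimal sketch of longest-match lexing. It is not Paguroidea's generated code: the `Kind` and `State` types, the toy token set (integers and decimal floats), and the `longest_match` function are all hypothetical names chosen for this example.

```rust
/// Hypothetical token kinds for this sketch (not Paguroidea's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Int,   // [0-9]+
    Float, // [0-9]+\.[0-9]+
}

/// DFA states of the toy lexer. `Dot` is the only non-accepting
/// state besides `Start`.
#[derive(Clone, Copy)]
enum State {
    Start,
    Int,
    Dot,
    Frac,
}

/// Runs the DFA from the beginning of `input` and returns the longest
/// match as (kind, length in bytes). The most recent accepting position
/// travels with the automaton state, so when the DFA gets stuck we can
/// fall back to it directly instead of rescanning.
fn longest_match(input: &[u8]) -> Option<(Kind, usize)> {
    let mut state = State::Start;
    let mut last = None; // last successful match seen so far
    for (i, &b) in input.iter().enumerate() {
        state = match (state, b) {
            (State::Start | State::Int, b'0'..=b'9') => State::Int,
            (State::Int, b'.') => State::Dot,
            (State::Dot | State::Frac, b'0'..=b'9') => State::Frac,
            _ => break, // no transition: stop and use `last`
        };
        // Record a match only when the new state is accepting.
        match state {
            State::Int => last = Some((Kind::Int, i + 1)),
            State::Frac => last = Some((Kind::Float, i + 1)),
            _ => {}
        }
    }
    last
}
```

On the input `12.x`, the DFA passes through the non-accepting `Dot` state and then gets stuck, so the function falls back to the remembered `Int` match of length 2. In a fused lexer-parser, the caller would branch on that remembered result directly rather than receive a token object.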
PR #47 and PR #48 (by @QuarticCat) simplify the grammar of pag files. Users no longer need to specify fixpoint themselves: the parser generator now detects strongly connected components (SCCs) in the parser rules and inserts fixpoint automatically. These changes also fix a bug where transitive references between two fixpoint rules could lead to nonterminating normalization.
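As a rough illustration of SCC-based detection, the sketch below runs Tarjan's algorithm over a rule-reference graph and marks the rules that take part in recursion. The post does not say which SCC algorithm Paguroidea uses, so Tarjan's is an assumption here, as are the graph representation (`refs[r]` lists the rules that rule `r` mentions), the `needs_fixpoint` name, and the criterion that a rule needs a fixpoint when its SCC has more than one rule or the rule references itself.

```rust
/// Tarjan's SCC algorithm over a hypothetical rule-reference graph.
struct Tarjan<'a> {
    refs: &'a [Vec<usize>],
    index: Vec<Option<usize>>, // discovery order, None if unvisited
    low: Vec<usize>,           // lowest index reachable from the rule
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    sccs: Vec<Vec<usize>>,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next);
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        let refs = self.refs; // copy the shared reference out of `self`
        for &w in &refs[v] {
            let idx_w = self.index[w];
            match idx_w {
                None => {
                    self.visit(w);
                    self.low[v] = self.low[v].min(self.low[w]);
                }
                Some(iw) if self.on_stack[w] => {
                    self.low[v] = self.low[v].min(iw);
                }
                _ => {} // `w` belongs to an SCC that is already complete
            }
        }

        // `v` is the root of an SCC: pop its members off the stack.
        if Some(self.low[v]) == self.index[v] {
            let mut scc = Vec::new();
            loop {
                let w = self.stack.pop().unwrap();
                self.on_stack[w] = false;
                scc.push(w);
                if w == v {
                    break;
                }
            }
            self.sccs.push(scc);
        }
    }
}

/// Returns, for each rule, whether it is recursive (directly or
/// through other rules) and would therefore need a fixpoint.
fn needs_fixpoint(refs: &[Vec<usize>]) -> Vec<bool> {
    let n = refs.len();
    let mut t = Tarjan {
        refs,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    let mut out = vec![false; n];
    for scc in &t.sccs {
        // A single-rule SCC is recursive only if the rule mentions itself.
        let recursive = scc.len() > 1 || refs[scc[0]].contains(&scc[0]);
        for &r in scc {
            out[r] = recursive;
        }
    }
    out
}
```

For example, with rules `json -> value`, `value -> array | number`, `array -> value`, `number` (no references), and a self-referential `list -> list`, encoded as `[[1], [2, 3], [1], [], [4]]`, only `value`, `array`, and `list` are marked as needing a fixpoint.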
Raw performance measurements (by cargo bench) are accessible at Generated Parser Benchmark.
We are currently focusing on refactoring the syntax for pag files. We may be able to support arbitrary actions as part of this work. We sincerely look forward to receiving suggestions and feedback from you all!