Paguroidea Parser Generator Development Update

We announced the Paguroidea Parser Generator several days ago. As promised, we have been working on improving its user-friendliness and performance. Here are some updates:

PR #36 (by @QuarticCat) merges identical accepting states in the lexer. This removes redundant branches from the lexer and thus improves performance.
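As a rough illustration of what "merging identical states" means, here is a minimal sketch over a hypothetical toy DFA representation (the `State` type and `merge_identical_states` function are made up for this example and are not Paguroidea's internals). It repeatedly collapses states that accept the same token and have identical outgoing edges, redirecting edges to a single representative. This is simple duplicate merging, not full Hopcroft-style minimization.

```rust
use std::collections::HashMap;

/// Hypothetical toy DFA state: an optional accepted token id plus
/// byte transitions kept sorted by byte, so equal states compare equal.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct State {
    accept: Option<u32>,
    edges: Vec<(u8, usize)>,
}

/// Repeatedly merges states that accept the same token and have identical
/// outgoing edges. Returns the compacted DFA and the new start index.
fn merge_identical_states(mut states: Vec<State>, mut start: usize) -> (Vec<State>, usize) {
    loop {
        // Map each distinct state to the index of its first occurrence.
        let mut seen: HashMap<State, usize> = HashMap::new();
        let mut kept: Vec<State> = Vec::new();
        let mut remap: Vec<usize> = Vec::with_capacity(states.len());
        for s in &states {
            let id = *seen.entry(s.clone()).or_insert_with(|| {
                kept.push(s.clone());
                kept.len() - 1
            });
            remap.push(id);
        }
        // Redirect every edge to the surviving representative.
        for s in &mut kept {
            for edge in &mut s.edges {
                edge.1 = remap[edge.1];
            }
        }
        start = remap[start];
        let changed = kept.len() != states.len();
        states = kept;
        // Redirecting edges can make more states identical, so iterate
        // until nothing shrinks.
        if !changed {
            return (states, start);
        }
    }
}
```

For a lexer accepting `a` or `b` as the same token, the two accepting states collapse into one, so the start state's two edges lead to the same target and the generated code needs one fewer branch.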

PR #37 (by @SchrodingerZhu) compacts the lexer states used for trailing literal lookup. This brings a 6-7% speedup when the target language has a relatively large set of keywords.
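The PR description above does not spell out the mechanism, so the following is only a sketch of one plausible reading, under the assumption that "compaction" means replacing a chain of non-accepting states, each with a single outgoing edge (for example, the `hile` tail of the keyword `while`), with one slice comparison. The `State` type and both functions are hypothetical.

```rust
/// Hypothetical toy DFA state (not Paguroidea's representation).
struct State {
    accept: Option<u32>,
    edges: Vec<(u8, usize)>,
}

/// Follows a chain of non-accepting states that each have exactly one
/// outgoing edge, starting at `from`. Returns the bytes spelled out along
/// the chain and the state where the chain ends. The step bound guards
/// against looping forever on a single-edge cycle.
fn collect_literal_chain(states: &[State], from: usize) -> (Vec<u8>, usize) {
    let mut literal = Vec::new();
    let mut cur = from;
    for _ in 0..states.len() {
        let s = &states[cur];
        if s.accept.is_some() || s.edges.len() != 1 {
            break;
        }
        let (byte, next) = s.edges[0];
        literal.push(byte);
        cur = next;
    }
    (literal, cur)
}

/// Replaces per-byte stepping through the chain with one slice comparison.
/// Returns the end state and the number of bytes consumed on success.
fn step_compacted(states: &[State], from: usize, input: &[u8]) -> Option<(usize, usize)> {
    let (literal, end) = collect_literal_chain(states, from);
    if input.starts_with(&literal) {
        Some((end, literal.len()))
    } else {
        None
    }
}
```

In a real generator the chain would be computed once at generation time and emitted as a constant literal, rather than recomputed on every call as in this sketch.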

PR #38 (by @QuarticCat and @CyanPineapple) adds lalrpop and lalrpop+logos implementations of a JSON parser that do the same work as our benchmark sample. You can now compare performance among our parser, serde-json, pest, lalrpop, and lalrpop+logos.

PR #39 (by @SchrodingerZhu) reworks the lexer implementation. The DFA now carries the last successful match in its state. As a result, the "message passing" between the lexer and the parser is completely eliminated. As stated in the flap paper:

Fusion acts on a lexer and a normalized parser, connected via tokens, and produces a grammar that is entirely token-free, in which the only branches involve inspecting individual characters.
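To illustrate what it means for a DFA to remember its last successful match, here is the standard maximal-munch loop over a hypothetical table-driven `Dfa` type. It is a sketch of the general technique only: the names are made up, and Paguroidea's generated code fuses this logic into the parser rather than exposing a standalone function like this.

```rust
/// Hypothetical table-driven DFA: `trans[state][byte]` is the next state
/// (or `None` for a dead transition); `accept[state]` is the token that
/// state accepts, if any. State 0 is the start state.
struct Dfa {
    trans: Vec<[Option<usize>; 256]>,
    accept: Vec<Option<u32>>,
}

impl Dfa {
    /// Maximal-munch scan from `start_pos`: run the DFA as far as it goes,
    /// remembering the most recent accepting position along the way.
    /// Returns (token, end offset) of the longest match, if any.
    fn longest_match(&self, input: &[u8], start_pos: usize) -> Option<(u32, usize)> {
        let mut state = 0;
        // An accepting start state would match the empty string.
        let mut last: Option<(u32, usize)> = self.accept[0].map(|t| (t, start_pos));
        let mut pos = start_pos;
        while pos < input.len() {
            match self.trans[state][input[pos] as usize] {
                Some(next) => {
                    state = next;
                    pos += 1;
                    // Record the latest successful match instead of
                    // handing each candidate token to a separate consumer.
                    if let Some(tok) = self.accept[state] {
                        last = Some((tok, pos));
                    }
                }
                None => break,
            }
        }
        last
    }
}
```

For tokens `=` and `==`, scanning `==x` walks two accepting states and reports the longer match `==`, while `=x` falls back to the last accepting position after one byte.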

With this PR, on AArch64, our parser's throughput increases from over 500 MB/s to over 700 MB/s on the CSV workload (random seed 0). x86-64 also sees a consistent 4-6% improvement.

PR #40 (by @SchrodingerZhu) lets you set the random seed used in benchmarks through the PAG_RANDOM_SEED environment variable. This makes it easy to compare performance results consistently across runs.
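The idea behind a seed environment variable can be sketched as follows. Only the variable name `PAG_RANDOM_SEED` comes from the PR; the fallback value of 0, the SplitMix64 generator, and the `generate_csv` helper are assumptions made for this illustration and are not taken from the Paguroidea benchmark code.

```rust
use std::env;

/// Reads the benchmark seed from PAG_RANDOM_SEED. Falling back to 0 when
/// the variable is unset or not a valid u64 is an assumption of this sketch.
fn benchmark_seed() -> u64 {
    env::var("PAG_RANDOM_SEED")
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

/// SplitMix64: a tiny deterministic generator, so the same seed always
/// yields the same sequence of numbers.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical workload generator: `rows` lines of two random integers.
fn generate_csv(seed: u64, rows: usize) -> String {
    let mut state = seed;
    let mut out = String::new();
    for _ in 0..rows {
        let a = splitmix64(&mut state) % 1000;
        let b = splitmix64(&mut state) % 1000;
        out.push_str(&format!("{a},{b}\n"));
    }
    out
}
```

Running the benchmark as, say, `PAG_RANDOM_SEED=42 cargo bench` on two machines would then feed both the same input, so throughput numbers are comparable.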

PR #47 and PR #48 (by @QuarticCat) simplify the grammar of pag files. Users no longer need to specify definition or fixpoint themselves: these PRs detect strongly connected components (SCCs) among parser rules and insert fixpoints automatically. They also fix a bug where transitive references between two fixpoint rules could lead to nonterminating normalization.
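A minimal sketch of the detection step, assuming rules are numbered and `refs[r]` lists the rules that rule `r` refers to (a hypothetical representation). It uses Tarjan's SCC algorithm and flags every rule that sits on a cycle, either in an SCC with more than one rule or with a direct self-reference. Paguroidea's actual implementation may use a different algorithm or data layout.

```rust
/// Tarjan's algorithm over a rule-reference graph.
fn tarjan_scc(refs: &[Vec<usize>]) -> Vec<Vec<usize>> {
    struct Tarjan<'a> {
        refs: &'a [Vec<usize>],
        index: Vec<Option<usize>>, // DFS discovery order
        low: Vec<usize>,           // lowest index reachable on the stack
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next: usize,
        sccs: Vec<Vec<usize>>,
    }

    fn visit(st: &mut Tarjan<'_>, v: usize) {
        st.index[v] = Some(st.next);
        st.low[v] = st.next;
        st.next += 1;
        st.stack.push(v);
        st.on_stack[v] = true;
        let refs = st.refs;
        for &w in &refs[v] {
            match st.index[w] {
                None => {
                    visit(st, w);
                    st.low[v] = st.low[v].min(st.low[w]);
                }
                Some(iw) if st.on_stack[w] => st.low[v] = st.low[v].min(iw),
                _ => {}
            }
        }
        // `v` is the root of an SCC: pop the whole component.
        if Some(st.low[v]) == st.index[v] {
            let mut comp = Vec::new();
            loop {
                let w = st.stack.pop().unwrap();
                st.on_stack[w] = false;
                comp.push(w);
                if w == v {
                    break;
                }
            }
            st.sccs.push(comp);
        }
    }

    let n = refs.len();
    let mut st = Tarjan {
        refs,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if st.index[v].is_none() {
            visit(&mut st, v);
        }
    }
    st.sccs
}

/// A rule needs a fixpoint if it is recursive: its SCC has several rules,
/// or it refers to itself directly.
fn rules_needing_fixpoint(refs: &[Vec<usize>]) -> Vec<bool> {
    let mut needs = vec![false; refs.len()];
    for comp in tarjan_scc(refs) {
        if comp.len() > 1 || refs[comp[0]].contains(&comp[0]) {
            for &r in &comp {
                needs[r] = true;
            }
        }
    }
    needs
}
```

For a JSON-like grammar where `value` refers to `object` and `array`, and both refer back to `value`, all three rules land in one SCC and get a fixpoint, while a non-recursive rule such as whitespace does not.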

Raw performance measurements (from cargo bench) are available at Generated Parser Benchmark.

We are currently focusing on refactoring the syntax of pag files, and we may be able to support arbitrary actions. We look forward to your suggestions and feedback!
