Migrating to Rust a Flex + GNU Bison parser

Hi there :wave:

I have a lexer and parser that was originally written using Flex and GNU Bison (which generate C++ of a parser). I was able to create a bridge between the C++-generated code and Rust by using bindgen and cxx (see this), however, I'm not quite happy with this solution because it contains C++, a ffi, and all the pain that comes with it, I would like my parser to be full Rust

So far, I've tried many approaches, and I haven't been able to find rust crates that allow me to port this lexer and parser to Rust while providing some degree of certainty that the end result is going to be as correct as the original flex+bison generated one, particularly, there are no rust crates that can run the GLR parsing algorithm, and no Rust lexer that follows the same semantics as Flex

Do you have any thoughts on this? Can someone point me in the right direction?

Thank you!

Lex/flex and yacc/bison are archaic tools, and even in their hayday, they weren't very elegant or even easy to use. I don't think you are going to find a lot of effort towards supporting and re-implementing their specific algorithms, certainly not in Rust.

If you were to migrate a parser to Rust, you better go back to the drawing board and write an idiomatic Rust parser. There are a couple of options:

  1. Re-write the parser by hand. This should be easy using a simple recursive descent approach, given that you already seem to have a grammar, so you won't need to rewrite it all the time.
  2. Use parser combinator libraries such as nom. This still counts as writing a parser by hand, but somewhat more declarative.
  3. Declare strongly-typed AST nodes and use parsel to automatically generate a full parser from them.
2 Likes

My first thought for getting some degree of certainty on this is to use QuickCheck to test that the two produce equivalent results for random input strings. One could then strengthen this certainty by using a full (more expensive) fuzz-testing harness:

I think one can use Bolero to run QuickCheck and a non-Quick fuzzer with the same code.

3 Likes

If your grammar is simple enough to be expressed as a PEG then you can use pest to generate a parser for you. There are also plenty of other parser libraries out there besides the ones that have already been mentioned, spending some time and picking a favorite or two definitely doesn't hurt.

2 Likes

As for writing an idiomatic parser in Rust, in my (possibly outdated) view, the main Rust crates for implementing parsers are Nom, Combine, and LALRPOP. I use Combine and am fine with it; I used to use LALRPOP but it has changed too much since then for me to say how it works now. Nom is the old standard of Rust parsing crates.

Nom and Combine are parser combinator libraries, with which one declares one's parsing rules using a Rust API, whereas LALRPOP reads separate grammar files in its own grammar syntax.

1 Like

While grmtools I don't believe has a GLR parser is true,
It seems like it might be possible to port the lex file in the repository you linked, to the next version (current master).

It will be a little bit different, grmtools lrlex doesn't support lexer actions, so the action code would need to move to the parser. The master branch is currently required for the use of the %x and %s directives.
but because the lexer doesn't support actions it uses a <Foo>regex <StackOperator>Token,

Where a stack operator is one of

  • == clear the stack and push 'State'.
  • <+State> == push State onto the stack
  • <-...> == pop the stack.

Alas though, the state stack operations in your action code specifically where you do multiple stack operations in one action aren't something which it supports:

  POP_STATE();
  PUSH_STATE(INPATH);
  PUSH_STATE(DEFAULT);

But it is the closest thing I know of to what you'd like.

1 Like