RFC: port of the LEMON Parser Generator to Rust

Hello, fellow Rusters.

I'm proud to present my port of the LEMON Parser Generator, written by D. Richard Hipp (of SQLite fame), modified to generate Rust code instead of C code.

You can find it here: lemon_rust. The documentation is still a bit of a WIP, but it will come out, eventually...

The generator itself is still written in C. I'd like to port it to Rust as well, but for that I'll need to understand it better first. A port might also complicate integrating future upstream changes, so I'm leaving it as it is for now.

I'd appreciate any feedback you may have. And please consider it for your future Rust parsing needs :wink:

In particular, I would like to know whether Rust has any feature similar to the C #line directive, to redirect errors from generated code back to the original source.

Thank you in advance.


I'm kind of lost on how to use this. Would you mind adding compilation and usage instructions?

As a side note: you can use Cargo to build C programs. Maybe that would work as a first step to make it easier to use from other Rust projects' Cargo builds.

The generator itself is still written in C.

Just out of curiosity: is the parser for LEMON code generated by the parser generator itself?

Thanks for your interest! Detailed instructions are on the way. For now you will have to build the lemon_rust program yourself:

$ gcc lemon_rust.c -o lemon_rust

Then use it to process your whatever.y file and get the corresponding whatever.rs.

I tried to use Cargo to automate the y-to-rs conversion, not yet the compilation of lemon_rust itself, and it kind of works... You can find it at examples/example1.

Anyway, you are right, building the lemon_rust program as part of cargo build would be the right thing to do. However, from the cargo documentation, I'm not sure if cargo supports building and running compilation tools.
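For what it's worth, Cargo build scripts (a build.rs file at the package root) run before the crate is compiled and can spawn other programs, so both steps could be driven from there. A minimal sketch, assuming gcc is on the PATH, that lemon_rust.c sits at the package root, and that lemon_rust is invoked as "lemon_rust whatever.y" and writes whatever.rs next to the grammar. The helper names and the src/whatever.y location are hypothetical, not taken from the repository:

```rust
// Sketch of a build.rs body. Paths, helper names and the generator's
// command-line behaviour are assumptions, not taken from lemon_rust.
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Where the compiled generator binary is placed inside Cargo's OUT_DIR.
fn generator_path(out_dir: &Path) -> PathBuf {
    out_dir.join("lemon_rust")
}

/// The file the generator is ASSUMED to write: same path, `.rs` extension.
fn generated_source(grammar: &Path) -> PathBuf {
    grammar.with_extension("rs")
}

/// Run a command and abort the build with a readable message if it fails.
fn run(cmd: &mut Command) {
    let status = cmd.status().expect("failed to start command");
    assert!(status.success(), "{:?} exited with {}", cmd, status);
}

/// Compile the generator, then run it on the grammar file.
fn build_parser() {
    // Cargo sets OUT_DIR for build scripts and runs them from the package root.
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR is set by Cargo"));
    let generator = generator_path(&out_dir);
    let grammar = Path::new("src/whatever.y");

    // Same command as the manual step: gcc lemon_rust.c -o lemon_rust
    run(Command::new("gcc").arg("lemon_rust.c").arg("-o").arg(&generator));
    // Assumed invocation: the grammar file is the only argument.
    run(Command::new(&generator).arg(grammar));

    let output = generated_source(grammar);
    assert!(output.exists(), "expected {} to be generated", output.display());

    // Re-run this script only when the generator or the grammar changes.
    println!("cargo:rerun-if-changed=lemon_rust.c");
    println!("cargo:rerun-if-changed={}", grammar.display());
}
```

In build.rs, fn main() would simply call build_parser(). If the generator can be told where to write its output, sending the .rs file to OUT_DIR instead of src/ would be the more conventional choice.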

Just out of curiosity: is the parser for LEMON code generated by the parser generator itself?

No. Lemon_C uses a manually written parser, and Lemon_Rust shares the code. Maybe if/when it is translated to Rust it will bootstrap. That would be fun!

Having taken the time to (try to) understand Lemon's idiosyncratic flavor of C, I highly recommend you avoid repeating my mistake.

The Lemon source code is not something to learn from unless your goal is simply recreating the Lemon parser-generator. It exhibits typical C-isms, including (but certainly not limited to!):

  • indecipherable abbreviations in place of proper variable and member-data names;
  • minimal or no comments on certain pieces of (rather obtuse) code;
  • poor use of abstractions (e.g., delegation to named subroutines) to reduce mental load for the reader ‒ every routine seems to access the data directly, resulting in for loops up the gazoo and a large amount of visual noise;
  • several "roll-your-own" data structure implementations (there's at least one hash table in there);

and more. Please understand that I'm not saying that Lemon itself is a bad program ‒ I actually rather like it ‒ but that it's a poor teacher for someone interested in learning about parsing techniques or good programming style.

For the former, I'd recommend picking up a textbook on compiler design/implementation (the Webernets can recommend one for you). Before you protest that textbooks are too expensive (they are), you should know that there are a number available for free from the authors' websites.

I know what you are talking about. However, my goal of generating Rust code is not so hard to achieve: just change a printf() here and there...

I'm just trying to learn some Rust here... If I happen to learn about LALR parsers, all the better.

However, the basic premises of Lemon and its theory of operation are sound, and IMO worth reusing, despite the weird C style. And I've seen much worse C than that!

BTW, I've added a DOCUMENTATION.md file to the repository, with usage instructions, just in case anyone is interested.

I'm thrilled to see more people interested in Lemon; honestly, it's a much better way to implement a parser generator than Yacc/Bison. The named (instead of indexed) terminals and non-terminals, its general amenability to threading (as opposed to being vigorously anti-threading) and its clean API make it a very compelling tool.

I've been working on my own port for a few weeks now (off and on). I've broken up the lemon.c file back into its original constituents, modernized/cleaned up some of the syntax and have been working to build an idiomatic parser template for Rust. I'd love your thoughts!

This is very much still a work in progress, but you can check out the repo on my GitHub (martinmroz/lemon: Lemon Parser Generator); the Rust branch is mm/experimental-rust. Check out templates/skeleton.rs on that branch.

I'm planning on adding a whole lot of tests as I go, along with a bunch of examples and documentation that I've dug up myself.

I'd love to see us work together to make Lemon better!