Help wanted: parser-c (and parser-haskell, corollary, and rust-corrode)



A few months ago I started investigating whether it was possible to port Corrode (the C to Rust transpiler project) to Rust. In particular, Rust lacks a native C parser library, but one exists in Haskell in the form of language-c. My wild idea to accomplish this was to partially transpile Haskell to Rust and manually fix up the remaining issues.

I’m as surprised as anyone to announce that this mostly worked: parser-c is a Rust library that can parse simple C programs (and some complex ones!). There’s a lot of work left in porting language-c’s test bench over and hunting down bugs, so I’m asking for help from the community. If you want to help maintain the project and get this to full parity with Haskell’s parser, please reply here or on the issue tracker!

What work is left: issues that prevent parser-c from parsing code are mostly errors introduced by the port, not logical errors (since a large amount of the porting was automated). While a large portion of its source code is actually generated Haskell code (from Haskell’s lexer and parser libraries, Happy and Alex), there is a path forward for natively outputting this generated code as Rust. So, armed with a debugger and a good set of test cases, I’m confident parser-c can get to the point where it can be used as a C parsing backend with tools built on top of it.

Is this a good idea? Maybe! Other C parsing libraries exist and expose C bindings that Rust can consume, but this one, uniquely, is written in Rust.

Over the course of the project, I’ve also written a handful of additional crates you might be interested in:

parser-haskell is a parser for Haskell code written in Rust, using the LALRPOP parser library. It may serve as a good start for a fully-featured parser.

corollary is a very experimental program for transpiling Haskell programs to Rust. The README should be clear about the gulf of issues that separates this from being a true transpiler; but it was useful enough to get parser-c to the stage it is now. Future work could involve expanding the Haskell stdlib implementation, hooking into the GHC API for type information (critical for supporting Haskell’s currying), or optimizations like using references instead of aggressively cloning all data values.
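To make the currying and cloning points concrete, here is a minimal hand-written sketch. It is not corollary's actual output, and the function names (add, total_len_owned, total_len_borrowed) are hypothetical, chosen only for illustration. It shows one way a curried Haskell function could be rendered in Rust, and the difference between a translation that takes ownership (pushing callers to clone) and one that borrows.

```rust
// Haskell: add :: Int -> Int -> Int
// One possible Rust rendering of the curried form: a function that
// takes the first argument and returns a closure awaiting the second.
// Knowing the full type is what lets a translator pick this shape.
fn add(x: i32) -> impl Fn(i32) -> i32 {
    move |y| x + y
}

// Ownership-taking translation: the caller must clone `items`
// if it wants to keep using the vector afterwards.
fn total_len_owned(items: Vec<String>) -> usize {
    items.iter().map(|s| s.len()).sum()
}

// Borrowing translation: same result, no clone required.
fn total_len_borrowed(items: &[String]) -> usize {
    items.iter().map(|s| s.len()).sum()
}
```

With these definitions, add(2)(3) applies the arguments one at a time, total_len_owned(words.clone()) needs the clone to leave words usable, and total_len_borrowed(&words) does not.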

And lastly: rust-corrode is a (you guessed it) very experimental port of Corrode, cross-compiled into Rust using the above tools. It isn’t functional yet. But just as Corrode uses Haskell’s language-c library under the hood, with a lot of manual fixes rust-corrode could use parser-c to transpile C code into Rust. This might enable interesting compilation pipelines, like #including C code directly into Rust. And a native Rust library would be easy for the Rust community to modify to suit their needs.

I’m looking for help maintaining and improving these crates. It’s been a fun project, but now that the proof of concept has been demonstrated, I’ll be focusing on just improving parser-c. If you have ideas about how to improve these crates or want to become a maintainer on any of them, please reach out!