Which parser combinator crate for parsing rustc output?

TLDR

Which parser combinator crate would be most suitable for parsing error messages generated by `rustc`?

Background

Code which engages in type-level programming tends to produce compiler error messages unfit for human consumption.

For example, uom produces compiler error messages containing (many occurrences of) beauties such as this:

Quantity<(dyn Dimension<L = PInt<UInt<UTerm, B1>>, J = Z0, Kind = (dyn Kind +
'static), N = Z0, T = NInt<UInt<UInt<UTerm, B1>, B0>>, M = Z0, Th = Z0, I = Z0>
+ 'static), (dyn uom::si::Units<f32, luminous_intensity =
uom::si::luminous_intensity::candela, mass = uom::si::mass::kilogram, length =
uom::si::length::meter, time = uom::si::time::second, thermodynamic_temperature
= uom::si::thermodynamic_temperature::kelvin, amount_of_substance =
uom::si::amount_of_substance::mole, electric_current =
uom::si::electric_current::ampere> + 'static), f32>

To make this digestible by humans, the tiny portion of signal among all that noise should be extracted and presented as something like this:

Quantity<m^1 s^-2, Kind, f32>

or, in the relatively rare cases where more detail is needed, maybe something like this:

Quantity<m^1 s^-2, Kind, f32; L=meter M=kilogram T=second I=ampere Th=kelvin N=mole J=candela>
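To make the target format concrete, here is a minimal sketch of the last step: turning already-extracted exponents into the compact `m^1 s^-2` form. The function name `format_dimension`, the input shape (a slice of unit/exponent pairs) and the unit symbols are assumptions made for illustration; nothing here comes from uom itself.

```rust
/// Render the non-zero exponents of a dimension as `unit^exponent` pairs
/// separated by spaces. Hypothetical helper, not part of uom.
fn format_dimension(exponents: &[(&str, i32)]) -> String {
    exponents
        .iter()
        // Dimensions with exponent zero (Z0 in typenum terms) are noise.
        .filter(|(_, exp)| *exp != 0)
        .map(|(unit, exp)| format!("{}^{}", unit, exp))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Called with the pairs `("m", 1)`, `("kg", 0)` and `("s", -2)`, this would yield the dimension part of the short form shown above.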

uom itself depends on typenum for which tnfilt is able to perform transformations such as these:

  • PInt<UInt<UTerm, B1>> -> PInt<U1>
  • NInt<UInt<UInt<UTerm, B1>, B0>> -> NInt<U2>
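For illustration, the binary encoding behind these types can be decoded with a few lines of plain Rust. This is a std-only sketch and not tnfilt's code (which uses nom); the function names `parse_unsigned` and `parse_integer` are made up here. It assumes typenum's encoding, where `UTerm` is zero and `UInt<U, B>` is twice `U` plus the bit `B`, and it goes one step further than tnfilt by producing a plain integer.

```rust
/// Parse an unsigned typenum type (`UTerm` or `UInt<U, B>`) from the front
/// of `input`, returning its value and the unconsumed remainder.
fn parse_unsigned(input: &str) -> Option<(u32, &str)> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix("UTerm") {
        return Some((0, rest));
    }
    let rest = input.strip_prefix("UInt<")?;
    // The first argument holds the higher-order bits.
    let (high, rest) = parse_unsigned(rest)?;
    let rest = rest.trim_start().strip_prefix(',')?.trim_start();
    // The second argument is the lowest bit: B0 or B1.
    let (bit, rest) = if let Some(r) = rest.strip_prefix("B0") {
        (0, r)
    } else {
        (1, rest.strip_prefix("B1")?)
    };
    let rest = rest.trim_start().strip_prefix('>')?;
    Some((high.checked_mul(2)?.checked_add(bit)?, rest))
}

/// Parse a signed typenum type: `Z0`, `PInt<U>` or `NInt<U>`.
fn parse_integer(input: &str) -> Option<(i64, &str)> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix("Z0") {
        return Some((0, rest));
    }
    let (sign, rest) = if let Some(r) = input.strip_prefix("PInt<") {
        (1, r)
    } else {
        (-1, input.strip_prefix("NInt<")?)
    };
    let (magnitude, rest) = parse_unsigned(rest)?;
    let rest = rest.trim_start().strip_prefix('>')?;
    Some((sign * i64::from(magnitude), rest))
}
```

Each function returns the remaining input, so the pieces can be chained or tested in isolation.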

tnfilt is simply a parser written in nom into which you pipe the output of rustc.

I would like to write a similar tool for uom, but have no idea which of the many parser combinator crates available in Rust I should use for this task (which seems much more involved than the trivial transformation performed by tnfilt). Can you provide any information that might help me choose one?

Is it possible to get higher-level information from rustc?

I would prefer to avoid parsing text altogether, by getting some higher-level information from rustc. Is this possible, somehow?

You can parse fairly arbitrary Rust grammar targets with the crates syn and proc_macro2.

I've not tried it with error outputs though!

A nice generic parser combinator library, if you're looking, is nom, but there are plenty of other good options. I think you should be able to wrap up the above in a combinator: sounds like it could be interesting!

The top-level rustc library crate might be hackable into something, but that seems like a last resort.

Hmm, hadn't thought of trying syn or proc_macro2 on this, at all.

nom was the first one of these I ever looked at in Rust, but with all my previous parser combinator use having been in Haskell, the increase in noise was rather off-putting, so I avoided them for a while.

For this project, I looked at chumsky, which seemed quite appealing in a number of ways. But when I actually tried to kick off the project, and wanted to start off in a test-driven way, thoroughly testing each small element, it seemed (at least from what I gleaned from the tutorial) that chumsky parsers would be quite monolithic (it looked like nested bracketing in the grammar leads to one huge closure wrapped in recursive, making fine-grained unit testing impossible). Additionally, a major selling-point of chumsky would be the error messages; but for input that isn't generated by humans (but spat out by rustc), this probably isn't all that useful.

Which is why I'm looking for something whose strengths might be better-adapted to this problem.

I may well be being naive, but it strikes me that the difference between Rust input syntax, and the rustc error messages, would make syn and proc_macro2 not especially well-suited.

Generally agreed. I've not done any exhaustive review, of course, but it seems like it's just hard to make a really clean parser library in Rust at the moment. Perhaps it's simplest to just write a bunch of recursive descent functions (&str) -> ParseResult<T>.
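As a rough idea of what such hand-written functions could look like, here is a std-only sketch that parses a nested generic type such as `PInt<UInt<UTerm, B1>>` into a tree. The names `TypeNode`, `parse_name` and `parse_type`, and the exact shape of `ParseResult`, are assumptions for illustration. The sketch deliberately ignores much of what appears in real rustc output (`dyn`, parentheses, `+ 'static`, and associated type bindings such as `L = ...`).

```rust
/// A type name with its generic arguments, e.g. `UInt<UTerm, B1>`.
#[derive(Debug, PartialEq)]
struct TypeNode {
    name: String,
    args: Vec<TypeNode>,
}

/// Parsed value plus the unconsumed rest of the input; `None` on failure.
type ParseResult<'a, T> = Option<(T, &'a str)>;

/// Parse a (possibly `::`-qualified) name such as `uom::si::mass::kilogram`.
fn parse_name(input: &str) -> ParseResult<'_, String> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_' || c == ':'))
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((input[..end].to_string(), &input[end..]))
}

/// Parse `Name` or `Name<Arg, Arg, ...>`, recursing into each argument.
fn parse_type(input: &str) -> ParseResult<'_, TypeNode> {
    let (name, mut rest) = parse_name(input)?;
    let mut args = Vec::new();
    if let Some(after_open) = rest.trim_start().strip_prefix('<') {
        rest = after_open;
        loop {
            let (arg, after_arg) = parse_type(rest)?;
            args.push(arg);
            let after_arg = after_arg.trim_start();
            if let Some(r) = after_arg.strip_prefix(',') {
                rest = r; // another argument follows
            } else {
                rest = after_arg.strip_prefix('>')?; // end of argument list
                break;
            }
        }
    }
    Some((TypeNode { name, args }, rest))
}
```

Because every function has the same `&str` in, value-plus-rest out shape, each one can be unit tested on its own.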

I would guess those big fat expressions would always parse as Rust types, at least. The most likely reason for them not to would be rustc starting to elide the irrelevant parts, in which case you wouldn't have anything left to do! But yeah, it could break pretty easily.

Wherever a closure is used, a function can be used instead, returning the appropriate impl Trait. That's how you can test smaller parts and build a parser from smaller parts.
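A std-only sketch of that idea follows. It does not use the chumsky or nom APIs; the hand-rolled parser shape `Fn(&str) -> Option<(T, &str)>` and the names `number`, `delimited` and `positive_int` are assumptions made for illustration. The point is only that each piece of grammar is a named function returning `impl Fn`, so it can be tested separately and then reused in larger parsers.

```rust
/// One or more ASCII digits, parsed as a number.
fn number<'a>() -> impl Fn(&'a str) -> Option<(u32, &'a str)> {
    |input: &'a str| {
        let end = input
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(input.len());
        // An empty digit run fails to parse, so the parser returns None.
        let value: u32 = input[..end].parse().ok()?;
        Some((value, &input[end..]))
    }
}

/// Run `inner` between the literal strings `open` and `close`.
fn delimited<'a, T>(
    open: &'static str,
    inner: impl Fn(&'a str) -> Option<(T, &'a str)>,
    close: &'static str,
) -> impl Fn(&'a str) -> Option<(T, &'a str)> {
    move |input: &'a str| {
        let rest = input.strip_prefix(open)?;
        let (value, rest) = inner(rest)?;
        let rest = rest.strip_prefix(close)?;
        Some((value, rest))
    }
}

/// A larger piece of grammar built from the smaller ones: `PInt<U<digits>>`,
/// the form that tnfilt emits (e.g. `PInt<U1>`).
fn positive_int<'a>() -> impl Fn(&'a str) -> Option<(u32, &'a str)> {
    delimited("PInt<U", number(), ">")
}
```

Here `number()` and `positive_int()` can each get their own unit tests, even though one is built from the other.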

That said, the types rustc emits as error messages should be valid types to parse with syn.
