Best way to implement a repetitive parser

elliottslaughter · September 8, 2022, 8:54pm

I've got a data structure that basically looks like:

pub enum Record {
    AVariant,
    BVariant { x: i32, y: u64 },
    CVariant { ... },
}

And I want to be able to parse into this data structure from text-based records that look like:

A Variant
B Variant 1234 12abc
C Variant ...

(Note some of the integer fields are decimal and others are hex. Yes, I know. I didn't pick this format.)

I started writing this in nom, which was fine, except that the code is very repetitive: the enum has about 100 variants, each with 2-5 fields on average. But at least the end result would (probably) be pretty fast, and the code is straightforward, if repetitive.
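To make the repetition concrete, here is a minimal sketch of what one hand-written arm per variant looks like. It uses only the standard library rather than nom, and it rests on assumptions not stated in the post: that the record tag is the two tokens shown in the sample lines, and that for BVariant the first field is decimal and the second is hex. The `ParseError` type and the `next_field` helper are hypothetical names introduced for the example.

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
pub enum Record {
    AVariant,
    BVariant { x: i32, y: u64 },
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnknownRecord,
    MissingField,
    TrailingInput,
    BadInt(ParseIntError),
}

// Lets `?` convert integer-parsing failures into ParseError.
impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::BadInt(e)
    }
}

// Pulls the next whitespace-separated field, or reports that it is missing.
fn next_field<'a>(
    fields: &mut impl Iterator<Item = &'a str>,
) -> Result<&'a str, ParseError> {
    fields.next().ok_or(ParseError::MissingField)
}

pub fn parse_record(line: &str) -> Result<Record, ParseError> {
    let mut fields = line.split_whitespace();
    // Assumption: the tag is the first two tokens, e.g. "B Variant".
    let record = match (fields.next(), fields.next()) {
        (Some("A"), Some("Variant")) => Record::AVariant,
        (Some("B"), Some("Variant")) => {
            // Assumption: x is decimal, y is hex without a "0x" prefix.
            let x = next_field(&mut fields)?.parse::<i32>()?;
            let y = u64::from_str_radix(next_field(&mut fields)?, 16)?;
            Record::BVariant { x, y }
        }
        _ => return Err(ParseError::UnknownRecord),
    };
    // Reject leftover tokens after the expected fields.
    if fields.next().is_some() {
        return Err(ParseError::TrailingInput);
    }
    Ok(record)
}
```

Under these assumptions, `parse_record("B Variant 1234 12abc")` yields `BVariant` with `x` equal to 1234 and `y` equal to 0x12abc. Every further variant needs another arm of the same shape, which is where the repetition across roughly 100 variants comes from.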

Then I thought: hey, serde could be really good at automating the boring parts of this. Minus a few ugly details, this code follows a rigid structure and I could basically generate this code with a bit of glue.

I headed over to the "Writing a data format" page in the Serde documentation, copied the example, and started playing with it. But as clearly outlined in the text, this example is not based on a proper parser of any sort. My inclination was to just drag nom back in for that part, but I'm struggling to make the two talk to each other.

What do people do in practical serde format backends? Is there a go-to parser library that people use? And would there happen to be any simple examples sitting around that would show me how to get started with that?

Thanks in advance.

H2CO3 · September 8, 2022, 9:23pm

They usually roll their own by hand.

I know there is Pest. It might be just enough to remove some of the boilerplate.

I also happen to have written a parser library that derives parsers from the AST type itself. You might give it a try too.

elliottslaughter · September 12, 2022, 5:07pm

Thanks for this.

I ended up going back to serde and nom and managed to make them work together. It was mostly an issue of needing to understand the concrete types involved; when everything is so generic it can be a bit hard to tell what you're dealing with. That, and I had to write the glue code to make the Error types play nicely.
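The actual glue code was not posted. As a rough illustration of the general pattern, the sketch below gives the format one error type and a `From` conversion from the parser's error, so that `?` works across the boundary. It uses only the standard library: `LowLevelError` is a hypothetical stand-in for the parser library's error type, and `FormatError`, `low_level_digits`, and `read_i32` are likewise invented for the example. In a real Serde backend the format's error type would also have to implement `serde::de::Error`, whose required method `custom` builds an error from any `Display` message.

```rust
use std::fmt;

// Hypothetical stand-in for a parser library's error type.
#[derive(Debug, PartialEq)]
pub struct LowLevelError {
    pub offset: usize,
}

// Hypothetical error type for the data format.
#[derive(Debug, PartialEq)]
pub enum FormatError {
    Message(String),
    Syntax { offset: usize },
}

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FormatError::Message(msg) => write!(f, "{}", msg),
            FormatError::Syntax { offset } => {
                write!(f, "syntax error at byte {}", offset)
            }
        }
    }
}

impl std::error::Error for FormatError {}

// The glue: parser errors convert into format errors, so `?` just works.
impl From<LowLevelError> for FormatError {
    fn from(e: LowLevelError) -> Self {
        FormatError::Syntax { offset: e.offset }
    }
}

// Hypothetical low-level parser: splits off the leading ASCII digits,
// returning (remaining input, digits).
fn low_level_digits(input: &str) -> Result<(&str, &str), LowLevelError> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        Err(LowLevelError { offset: 0 })
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

// Format-level function that calls the low-level parser.
pub fn read_i32(input: &str) -> Result<(&str, i32), FormatError> {
    let (rest, digits) = low_level_digits(input)?; // uses the From impl
    let value = digits
        .parse::<i32>()
        .map_err(|e| FormatError::Message(e.to_string()))?;
    Ok((rest, value))
}
```

Keeping the conversion in a single `From` impl means the individual deserializer methods do not each need their own `map_err` call for parser failures.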
