Symmetrical parser for binary data

What approaches can I use to implement a declarative and symmetrical parser and builder for binary data comparable with:

What I want is a tool like lalrpop that lets me declare a binary grammar for data formats, including bit fields and multiple byte orders, and then generates code both for parsing binary slices in memory (embedded-ready) and for building them from structured data (dynamic maps/vectors, arrays, structs, etc.).

nom is very popular, though I have no experience using it.

If you want control of the mapping at the primitive level, you basically want to implement a serde format. This isn't particularly easy and could be documented better, but it's probably the simplest option I've seen for this.

The more flexible option is to skip what serde would be doing for you and implement a proc macro that parses a structure definition and generates the exact code you want. This is generally in the "less difficult than you expect, more difficult than you want" category, so check other options first.
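To give a feel for the "one definition, two directions" idea, here is a minimal sketch that uses a declarative `macro_rules!` macro as a stand-in for a real proc macro. Everything in it is an assumption made for illustration: the `binary_struct!` name, the `be`/`le` field markers, and the `parse`/`build` method names are hypothetical, and it only handles fixed-width integer fields (no bit fields).

```rust
// Hypothetical macro: a single field list generates the struct itself,
// a parser from a byte slice, and a builder into a byte buffer.
macro_rules! binary_struct {
    (struct $name:ident { $($field:ident : $ty:ident $order:ident),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        struct $name { $($field: $ty),* }

        impl $name {
            /// Reads the fields in declaration order; returns the value and the unread rest.
            fn parse(mut input: &[u8]) -> Option<(Self, &[u8])> {
                $(
                    let $field = {
                        const N: usize = std::mem::size_of::<$ty>();
                        if input.len() < N {
                            return None; // not enough bytes left for this field
                        }
                        let (head, tail) = input.split_at(N);
                        input = tail;
                        let mut buf = [0u8; N];
                        buf.copy_from_slice(head);
                        binary_struct!(@from $ty, $order, buf)
                    };
                )*
                Some(($name { $($field),* }, input))
            }

            /// Writes the fields in the same order, with the same byte order per field.
            fn build(&self, out: &mut Vec<u8>) {
                $( out.extend_from_slice(&binary_struct!(@to $order, self.$field)); )*
            }
        }
    };
    // Internal rules: pick the conversion that matches the byte-order marker.
    (@from $ty:ident, be, $bytes:expr) => { <$ty>::from_be_bytes($bytes) };
    (@from $ty:ident, le, $bytes:expr) => { <$ty>::from_le_bytes($bytes) };
    (@to be, $v:expr) => { $v.to_be_bytes() };
    (@to le, $v:expr) => { $v.to_le_bytes() };
}

binary_struct! {
    struct Header {
        magic: u16 be,
        length: u32 le,
        flags: u8 be,
    }
}

fn main() {
    let bytes = [0xCA, 0xFE, 0x10, 0x00, 0x00, 0x00, 0x01, 0xFF];
    let (header, rest) = Header::parse(&bytes).expect("enough bytes for a header");
    assert_eq!(header, Header { magic: 0xCAFE, length: 16, flags: 1 });
    assert_eq!(rest, [0xFF]);

    // Building from the parsed value reproduces the bytes that were consumed.
    let mut out = Vec::new();
    header.build(&mut out);
    assert_eq!(out.as_slice(), &bytes[..7]);
}
```

A proc macro would let you accept a richer grammar (attributes, bit widths, length-prefixed fields), but the generated code would have roughly this shape: the parser and the builder walk the same field list, so they cannot drift apart.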

I have used nom before. Functional parsing can be very powerful, but also fairly confusing to newcomers, especially considering the pages-long error messages you get when you accidentally forget to call a function (parsers are functions passed as arguments, so every type mismatch is actually a rather complex trait resolution failure). The resulting code is very readable and precise, but it takes some time to understand how to write it.

More relevantly, it's parsing only. OP wants a single definition that can emit both serialization and deserialization code.

It is definitely possible to create matching parsers and printers mechanistically.

I have somewhat recently released a parser-and-printer crate called Parsel. Similar to Serde, it uses derive proc-macros to automatically generate parsers and printers in a purely declarative manner, directly from the structure of the AST node types. The key observation is that structs and tuples correspond to sequencing and enums correspond to alternation.

It also separately provides a set of generic helper types for accomplishing common parsing tasks, such as separated repetition, parenthesized subexpressions, and optional productions.

Unfortunately, Parsel was specifically designed to parse human-readable languages. (I'm using it myself for parsing an ORM DSL that looks like a mix of SQL and Rust.) It only works over strings parseable into TokenStreams, not over arbitrary binary data.

You could, however, certainly copy some of its techniques and insights to produce a parsing library that works with all kinds of raw bitstreams just as well.
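As a rough illustration of how the "structs are sequencing, enums are alternation" observation might carry over to bytes, here is a hand-written sketch of the kind of code a derive macro could generate. This is not Parsel's API: the `Codec` trait, the `U16Be` wrapper, the example types, and the choice of a leading tag byte to select the enum variant are all assumptions made for the example.

```rust
/// Hypothetical trait pairing a parser with its matching printer.
trait Codec: Sized {
    fn parse(input: &[u8]) -> Option<(Self, &[u8])>;
    fn print(&self, out: &mut Vec<u8>);
}

// Primitive: a single byte.
impl Codec for u8 {
    fn parse(input: &[u8]) -> Option<(Self, &[u8])> {
        let (first, rest) = input.split_first()?;
        Some((*first, rest))
    }
    fn print(&self, out: &mut Vec<u8>) {
        out.push(*self);
    }
}

/// Primitive: a big-endian u16, with the byte order encoded in the type.
#[derive(Debug, PartialEq)]
struct U16Be(u16);

impl Codec for U16Be {
    fn parse(input: &[u8]) -> Option<(Self, &[u8])> {
        let (hi, rest) = <u8 as Codec>::parse(input)?;
        let (lo, rest) = <u8 as Codec>::parse(rest)?;
        Some((U16Be(u16::from_be_bytes([hi, lo])), rest))
    }
    fn print(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_be_bytes());
    }
}

/// Sequencing: a struct parses and prints its fields in declaration order.
#[derive(Debug, PartialEq)]
struct Point {
    x: U16Be,
    y: U16Be,
}

impl Codec for Point {
    fn parse(input: &[u8]) -> Option<(Self, &[u8])> {
        let (x, rest) = U16Be::parse(input)?;
        let (y, rest) = U16Be::parse(rest)?;
        Some((Point { x, y }, rest))
    }
    fn print(&self, out: &mut Vec<u8>) {
        self.x.print(out);
        self.y.print(out);
    }
}

/// Alternation: an enum picks one variant (here via an assumed tag byte).
#[derive(Debug, PartialEq)]
enum Shape {
    Dot(Point),
    Line(Point, Point),
}

impl Codec for Shape {
    fn parse(input: &[u8]) -> Option<(Self, &[u8])> {
        let (tag, rest) = <u8 as Codec>::parse(input)?;
        match tag {
            0 => {
                let (p, rest) = Point::parse(rest)?;
                Some((Shape::Dot(p), rest))
            }
            1 => {
                let (a, rest) = Point::parse(rest)?;
                let (b, rest) = Point::parse(rest)?;
                Some((Shape::Line(a, b), rest))
            }
            _ => None, // unknown tag: no alternative matches
        }
    }
    fn print(&self, out: &mut Vec<u8>) {
        match self {
            Shape::Dot(p) => {
                out.push(0);
                p.print(out);
            }
            Shape::Line(a, b) => {
                out.push(1);
                a.print(out);
                b.print(out);
            }
        }
    }
}

fn main() {
    let shape = Shape::Line(
        Point { x: U16Be(1), y: U16Be(2) },
        Point { x: U16Be(0x0304), y: U16Be(5) },
    );
    let mut bytes = Vec::new();
    shape.print(&mut bytes);
    assert_eq!(bytes, [1, 0, 1, 0, 2, 3, 4, 0, 5]);

    // Parsing the printed bytes gives back the original value.
    let (parsed, rest) = Shape::parse(&bytes).expect("valid encoding");
    assert_eq!(parsed, shape);
    assert!(rest.is_empty());
}
```

The `impl` blocks for `Point` and `Shape` are entirely mechanical, which is what makes them candidates for a derive macro. Bit fields would need a cursor that tracks a bit offset rather than a plain `&[u8]`, which this sketch does not attempt.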
