High-performance parsing (for what amounts to demarshalling)

Is there a well-accepted set of practices for this in Rust, short of writing the lowest-level code by hand? I would like to write a parser for a textual wire format for matrices (the "MatrixMarket" format), and would prefer not to write (or maintain) low-level code.

In the past, in OCaml, I've found that regular expressions can sometimes beat lex for this sort of thing. But I'm not going to just presume that the same holds in Rust.
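For a sense of scale, here is a minimal, dependency-free sketch of such a parser using only the standard library's `str::lines`, `split_whitespace` and `parse`. It assumes the common `coordinate real general` variant of MatrixMarket (a banner line, optional `%` comment lines, a `rows cols entries` size line, then one 1-based `row col value` triple per line). `parse_mm`, `Coo` and `Entry` are hypothetical names, and other variants (`array`, `symmetric`, `pattern`, `complex`, ...) are rejected rather than handled.

```rust
use std::str::FromStr;

/// One stored entry, with indices converted from MatrixMarket's 1-based to 0-based.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub row: usize,
    pub col: usize,
    pub val: f64,
}

/// A sparse matrix in coordinate (triplet) form.
#[derive(Debug, PartialEq)]
pub struct Coo {
    pub nrows: usize,
    pub ncols: usize,
    pub entries: Vec<Entry>,
}

/// Parse one whitespace-separated token, naming the field in the error message.
fn field<T: FromStr>(tok: Option<&str>, what: &str) -> Result<T, String> {
    let tok = tok.ok_or_else(|| format!("missing {}", what))?;
    tok.parse::<T>().map_err(|_| format!("invalid {}: {:?}", what, tok))
}

/// Parse the "coordinate real general" variant only; other variants are rejected.
pub fn parse_mm(text: &str) -> Result<Coo, String> {
    let mut lines = text.lines();

    // Banner line, compared case-insensitively in this sketch.
    let banner = lines.next().ok_or("empty input")?;
    let banner = banner
        .split_whitespace()
        .map(str::to_ascii_lowercase)
        .collect::<Vec<_>>()
        .join(" ");
    if banner != "%%matrixmarket matrix coordinate real general" {
        return Err(format!("unsupported banner: {}", banner));
    }

    // Everything after the banner: skip blank lines and '%' comment lines.
    let mut data = lines
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('%'));

    // Size line: rows, columns, number of stored entries.
    let size = data.next().ok_or("missing size line")?;
    let mut tok = size.split_whitespace();
    let nrows: usize = field(tok.next(), "row count")?;
    let ncols: usize = field(tok.next(), "column count")?;
    let nnz: usize = field(tok.next(), "entry count")?;

    // Entry lines: "row col value", 1-based indices.
    let mut entries = Vec::with_capacity(nnz);
    for line in data.take(nnz) {
        let mut tok = line.split_whitespace();
        let i: usize = field(tok.next(), "row index")?;
        let j: usize = field(tok.next(), "column index")?;
        let val: f64 = field(tok.next(), "value")?;
        if i == 0 || i > nrows || j == 0 || j > ncols {
            return Err(format!("index ({}, {}) out of range", i, j));
        }
        entries.push(Entry { row: i - 1, col: j - 1, val });
    }
    if entries.len() != nnz {
        return Err(format!("expected {} entries, found {}", nnz, entries.len()));
    }
    Ok(Coo { nrows, ncols, entries })
}
```

Nothing here is low-level: the structure mirrors the format line by line and the per-token work is done by `str::parse`. Whether it is fast enough for large files is something to measure rather than assume.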

Use serde.

As mentioned, if you are serializing/deserializing a well-known format like JSON or bincode, check out the serde framework.

Otherwise, if you need to deserialize a custom format, your choice of library will largely depend on the type of data being parsed.

I would normally use the nom crate (a parser combinator library) for parsing textual formats.
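To show the parser-combinator style that nom is built around, here is a small hand-rolled version using only the standard library: a parser is a function from input to `Option<(value, remaining input)>`, and combinators build bigger parsers out of smaller ones. This is not nom's API; the names `number`, `lexeme`, `tuple3` and `entry_line` are hypothetical, and nom's own combinators have different signatures and much richer error reporting, so check its documentation before porting.

```rust
use std::str::FromStr;

// A "parser" here is any function taking the input and returning
// Some((parsed value, remaining input)) on success, or None on failure.

/// Parse a whitespace-delimited token at the start of `input` with `FromStr`.
fn number<'a, T: FromStr>(input: &'a str) -> Option<(T, &'a str)> {
    let end = input.find(char::is_whitespace).unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let value = input[..end].parse::<T>().ok()?;
    Some((value, &input[end..]))
}

/// Combinator: wrap parser `p` so that it skips leading whitespace first.
fn lexeme<'a, T, P>(p: P) -> impl Fn(&'a str) -> Option<(T, &'a str)>
where
    P: Fn(&'a str) -> Option<(T, &'a str)>,
{
    move |input: &'a str| p(input.trim_start())
}

/// Combinator: run three parsers in sequence and collect their results.
fn tuple3<'a, A, B, C, PA, PB, PC>(
    pa: PA,
    pb: PB,
    pc: PC,
) -> impl Fn(&'a str) -> Option<((A, B, C), &'a str)>
where
    PA: Fn(&'a str) -> Option<(A, &'a str)>,
    PB: Fn(&'a str) -> Option<(B, &'a str)>,
    PC: Fn(&'a str) -> Option<(C, &'a str)>,
{
    move |input: &'a str| {
        let (a, rest) = pa(input)?;
        let (b, rest) = pb(rest)?;
        let (c, rest) = pc(rest)?;
        Some(((a, b, c), rest))
    }
}

/// Hypothetical use: one MatrixMarket coordinate line, "row col value".
pub fn entry_line(input: &str) -> Option<((usize, usize, f64), &str)> {
    let parser = tuple3(
        lexeme(number::<usize>),
        lexeme(number::<usize>),
        lexeme(number::<f64>),
    );
    parser(input)
}
```

The appeal is that `entry_line` reads like the grammar of the line it parses, while each building block stays small enough to test on its own.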

For binary formats, I would recommend the binread crate, which gives you a really nice declarative API; the author has a great blog post breaking down how it works.
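binread's derive macro exists so you don't have to write field-by-field reading code yourself. As a rough, dependency-free sketch of that idea (not binread's API or the code it generates), the hypothetical `ReadLe` trait below is implemented once for primitive numbers, and a struct's implementation then just reads its fields in order. The `Header` layout (4-byte magic `DEMO`, little-endian `u16` version, little-endian `u32` count) is invented for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical trait: decode `Self` from a little-endian byte stream.
pub trait ReadLe: Sized {
    fn read_le<R: Read>(r: &mut R) -> io::Result<Self>;
}

// Implement ReadLe once for each primitive number type.
macro_rules! impl_read_le {
    ($($t:ty),*) => {
        $(
            impl ReadLe for $t {
                fn read_le<R: Read>(r: &mut R) -> io::Result<Self> {
                    let mut buf = [0u8; std::mem::size_of::<$t>()];
                    r.read_exact(&mut buf)?;
                    Ok(<$t>::from_le_bytes(buf))
                }
            }
        )*
    };
}
impl_read_le!(u8, u16, u32, u64, i32, i64, f32, f64);

/// Hypothetical file header: magic "DEMO", then version and record count.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub magic: [u8; 4],
    pub version: u16,
    pub count: u32,
}

// Written by hand here: this "read each field in declaration order" body is
// the kind of boilerplate a declarative derive can generate for you.
impl ReadLe for Header {
    fn read_le<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut magic = [0u8; 4];
        r.read_exact(&mut magic)?;
        if magic != *b"DEMO" {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
        }
        let version = u16::read_le(r)?;
        let count = u32::read_le(r)?;
        Ok(Header { magic, version, count })
    }
}
```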

I agree with the replies above. I'll add that writing a deserializer for serde is not necessarily hard, and a lot of formats already have a serde implementation.

Adding to this reply, you can also use nom to parse binary formats: fasterthanlime has an excellent series where he uses it to parse the ELF format.
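The same input-to-`(value, rest)` shape works on byte slices. As a dependency-free sketch (hand-rolled helpers, not nom's functions), this parses the 16-byte `e_ident` prefix of an ELF file: the magic `0x7f 'E' 'L' 'F'`, then the class byte (1 = 32-bit, 2 = 64-bit), the data-encoding byte (1 = little-endian, 2 = big-endian) and the version byte. The names `tag`, `byte` and `parse_ident` are hypothetical.

```rust
// Each parser takes the input and, on success, returns what it consumed plus the rest.

/// Match an exact byte sequence at the start of `input`, returning what follows it.
fn tag<'a>(expected: &[u8], input: &'a [u8]) -> Option<&'a [u8]> {
    if input.starts_with(expected) {
        Some(&input[expected.len()..])
    } else {
        None
    }
}

/// Take a single byte.
fn byte(input: &[u8]) -> Option<(u8, &[u8])> {
    let (&b, rest) = input.split_first()?;
    Some((b, rest))
}

#[derive(Debug, PartialEq)]
pub enum Class {
    Elf32,
    Elf64,
}

#[derive(Debug, PartialEq)]
pub enum Endian {
    Little,
    Big,
}

#[derive(Debug, PartialEq)]
pub struct Ident {
    pub class: Class,
    pub endian: Endian,
    pub version: u8,
}

/// Parse the 16-byte e_ident prefix, returning the bytes that follow it.
pub fn parse_ident(input: &[u8]) -> Option<(Ident, &[u8])> {
    let input = tag(b"\x7fELF", input)?;
    let (class, input) = byte(input)?; // EI_CLASS
    let (data, input) = byte(input)?; // EI_DATA
    let (version, input) = byte(input)?; // EI_VERSION
    let class = match class {
        1 => Class::Elf32,
        2 => Class::Elf64,
        _ => return None,
    };
    let endian = match data {
        1 => Endian::Little,
        2 => Endian::Big,
        _ => return None,
    };
    // Skip the remaining 9 bytes of e_ident (OS/ABI, ABI version, padding).
    let rest = input.get(9..)?;
    Some((Ident { class, endian, version }, rest))
}
```

Once the endianness is known, later header fields would be read with `from_le_bytes` or `from_be_bytes` accordingly.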

Oh yeah, you can definitely use nom for non-text inputs. I just feel like you get much cleaner code using something declarative like binread, because binary formats are often quite trivial (people often just memcpy() a C struct or two into a byte buffer and call it a day).
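For a format that really is just a `memcpy()` of a C struct, the main traps are padding and byte order. A minimal sketch, assuming a hypothetical C struct `{ uint8_t tag; uint32_t value; }` written on a machine with the same endianness as the reader: with `#[repr(C)]`, Rust follows C's layout rules, so on common targets `value` sits at offset 4 after three padding bytes and the struct is 8 bytes long. `Record`, `to_c_bytes` and `from_c_bytes` are illustrative names.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Hypothetical record mirroring the C struct `{ uint8_t tag; uint32_t value; }`.
/// `#[repr(C)]` makes Rust use C's field order, alignment and padding rules.
#[repr(C)]
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Record {
    pub tag: u8,    // offset 0, then 3 padding bytes on common targets
    pub value: u32, // offset 4 (4-byte aligned)
}

/// Emulate the bytes a C writer would memcpy() out. A real memcpy copies
/// whatever happens to be in the padding; this sketch writes zeros there.
pub fn to_c_bytes(r: &Record) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[0] = r.tag;
    buf[4..8].copy_from_slice(&r.value.to_ne_bytes());
    buf
}

/// Decode such a buffer field by field, ignoring the padding bytes.
/// `from_ne_bytes` assumes the writer had the same byte order as this machine.
pub fn from_c_bytes(buf: &[u8]) -> Option<Record> {
    if buf.len() < size_of::<Record>() {
        return None;
    }
    let tag = buf[0];
    let value = u32::from_ne_bytes(buf[4..8].try_into().ok()?);
    Some(Record { tag, value })
}
```

Swapping `from_ne_bytes` for `from_le_bytes` or `from_be_bytes` pins the byte order when the writer's is known; a declarative reader mainly saves you from keeping offsets like `4..8` in sync by hand.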

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.