Design Guidance: Custom Binary File Reader

Hi there,

I've spent the past week or so playing with Rust and reading the book. I really like the ideals behind the language, so I want to dig deeper. To do that I think I need a real project, and a binary file parser seems to be at about the right technical level.

I have previously lived in OOP environments, with dynamic dispatch and inheritance. My struggle is with design considerations when using Rust: I can't get my head around what to do when we don't know which order things are going to happen in :smile:

This isn't really a question, I'm just stuck and looking to stimulate discussion!

Project 'Spec'

The target is STDF (structured/standard test data format). There are more details available here; however, it is essentially a record-based file structure. Each record consists of several fields, which form an ordered list drawn from a set of types. The beginning of an example record might be:

| FIELD   | TYPE | NOTE                      |
|---------|------|---------------------------|
| REC_TYP | U16  | Type of record            |
| REC_SUB | U16  | Sub-type of record        |
| REC_LEN | U32  | Number of bytes in record |
| LOT_ID  | Cn   | ...                       |

So each record starts with an indication of the type (REC_TYP + REC_SUB) and the record size (REC_LEN) - it then proceeds with the binary data laid out as per the specification.

From a parsing perspective, the records can be in any order and we do not know ahead of time how many records of each type there may be.
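To make the header layout concrete, here is a minimal sketch of reading it with only the standard library. The field widths follow the example table above; the little-endian byte order, the reading of REC_LEN as the number of bytes that follow the header, and the names `RecordHeader`, `read_header` and `read_raw_record` are all my assumptions and need checking against the spec.

```rust
use std::io::{self, Read};

/// Header fields as laid out in the example table (hypothetical struct name).
#[derive(Debug, PartialEq)]
struct RecordHeader {
    rec_typ: u16,
    rec_sub: u16,
    rec_len: u32,
}

/// Reads one 8-byte header. Little-endian byte order is an assumption.
fn read_header<R: Read>(r: &mut R) -> io::Result<RecordHeader> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(RecordHeader {
        rec_typ: u16::from_le_bytes([buf[0], buf[1]]),
        rec_sub: u16::from_le_bytes([buf[2], buf[3]]),
        rec_len: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
    })
}

/// Reads a header and then its raw payload.
/// Assumes REC_LEN counts only the bytes after the header.
fn read_raw_record<R: Read>(r: &mut R) -> io::Result<(RecordHeader, Vec<u8>)> {
    let header = read_header(r)?;
    let mut payload = vec![0u8; header.rec_len as usize];
    r.read_exact(&mut payload)?;
    Ok((header, payload))
}
```

Because the length is known up front, a reader built this way can pull each record out as raw bytes first and decide later whether and how to decode it, which also makes it easy to skip record types it doesn't know about.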

Considerations (from me)

  • I could define Structs for each type (Cn, U32, ...) which implement a Read/Write trait

  • I could define the Record Struct with a Vec of 'types'

    • To read a record I could cycle through the Vec and do the read/write
  • Alternatively, I could have a struct for each REC_TYP, REC_SUB combination (these are called MIR, PIR, PRR, etc.)

    • Each struct would implement a read/write trait which would do the read/write operation on each element of the struct
    • This seems like a lot of repetition
  • More generally, I'm trying to think of a top-level API for a library that would be useful to someone who wants to build an application that converts the data to something human-readable (CSV, ...), pulls certain aspects into a database, or wraps it for something like pandas in Python.
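One way the "struct per record" idea might look without the repetition is sketched below: a small trait for field types, a `macro_rules!` macro that generates each record struct along with its decoder, and an enum standing in for the inheritance hierarchy. Everything here is hypothetical: the `Decode` trait, the `record!` macro, the field subsets (not the real STDF layouts), the reading of `Cn` as a one-byte length followed by that many bytes, little-endian integers, and the (REC_TYP, REC_SUB) codes, which should be checked against the spec.

```rust
use std::io::{self, Read};

/// Anything that can be decoded from a byte stream (hypothetical trait).
trait Decode: Sized {
    fn decode<R: Read>(r: &mut R) -> io::Result<Self>;
}

impl Decode for u8 {
    fn decode<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        Ok(b[0])
    }
}

impl Decode for u32 {
    fn decode<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut b = [0u8; 4];
        r.read_exact(&mut b)?;
        Ok(u32::from_le_bytes(b)) // byte order is an assumption
    }
}

/// `Cn`, assumed to be a one-byte length followed by that many bytes of text.
#[derive(Debug, PartialEq)]
struct Cn(String);

impl Decode for Cn {
    fn decode<R: Read>(r: &mut R) -> io::Result<Self> {
        let len = u8::decode(r)? as usize;
        let mut buf = vec![0u8; len];
        r.read_exact(&mut buf)?;
        Ok(Cn(String::from_utf8_lossy(&buf).into_owned()))
    }
}

/// Generates a struct and a `Decode` impl that reads fields in declaration order.
macro_rules! record {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        struct $name { $($field: $ty),* }

        impl Decode for $name {
            fn decode<R: Read>(r: &mut R) -> io::Result<Self> {
                Ok($name { $($field: <$ty>::decode(r)?),* })
            }
        }
    };
}

// Illustrative field subsets only.
record! { Mir { lot_id: Cn } }
record! { Pir { head_num: u8, site_num: u8 } }
record! { Prr { head_num: u8, site_num: u8, serial: Cn } }

/// One variant per known record, plus a catch-all for everything else.
#[derive(Debug, PartialEq)]
enum Record {
    Mir(Mir),
    Pir(Pir),
    Prr(Prr),
    Unknown { rec_typ: u16, rec_sub: u16, payload: Vec<u8> },
}

/// Picks a decoder from the header codes (codes assumed, verify against the spec).
fn decode_record(rec_typ: u16, rec_sub: u16, payload: &[u8]) -> io::Result<Record> {
    let mut r = payload; // `&[u8]` implements `Read`
    Ok(match (rec_typ, rec_sub) {
        (1, 10) => Record::Mir(Mir::decode(&mut r)?),
        (5, 10) => Record::Pir(Pir::decode(&mut r)?),
        (5, 20) => Record::Prr(Prr::decode(&mut r)?),
        _ => Record::Unknown { rec_typ, rec_sub, payload: payload.to_vec() },
    })
}
```

Compared with a `Vec` of field "types" walked at runtime, this keeps every field statically typed, and adding a record type comes down to one `record!` line, one enum variant and one match arm. A derive macro could probably take that further, but I haven't tried it.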

Struggles

  • What do I store the output as without knowledge of record types ahead of time?

    • A common use case might be:
      • Parse the file
      • Get all the serial numbers from PRR records
      • Get all the data from PIR records
  • My head hurts :wink:
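For the storage question, one possibility is to keep everything in a single `Vec` of an enum and filter by variant when a particular record type is wanted. The sketch below is self-contained and purely illustrative: the `Record`, `Pir` and `Prr` types and their fields are hypothetical stand-ins, not the real STDF layouts.

```rust
/// Hypothetical stand-ins for decoded records.
#[derive(Debug, Clone, PartialEq)]
struct Pir {
    head_num: u8,
    site_num: u8,
}

#[derive(Debug, Clone, PartialEq)]
struct Prr {
    serial: String,
}

/// One variant per record type; the order in the file is preserved by the Vec.
#[derive(Debug, Clone, PartialEq)]
enum Record {
    Pir(Pir),
    Prr(Prr),
    Other,
}

/// Collects the serial numbers of all PRR records, in file order.
fn prr_serials(records: &[Record]) -> Vec<&str> {
    records
        .iter()
        .filter_map(|rec| match rec {
            Record::Prr(prr) => Some(prr.serial.as_str()),
            _ => None,
        })
        .collect()
}

/// Lazily iterates over all PIR records without copying them.
fn pirs<'a>(records: &'a [Record]) -> impl Iterator<Item = &'a Pir> + 'a {
    records.iter().filter_map(|rec| match rec {
        Record::Pir(pir) => Some(pir),
        _ => None,
    })
}
```

With this shape the parser never needs to know ahead of time which records will appear or how many; the caller decides what to pull out afterwards. If per-type access turns out to be the hot path, a struct with one `Vec` per record type might be a better fit, at the cost of losing the original ordering.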