Hello All,
Was hoping I could get some suggestions on how to approach this problem of parsing bytestring records.
My background: I've written bits of scripts here and there, but nothing major and nothing in C/C++, so please pardon my ignorance/partial knowledge. I've only just begun reading through the Rust book (currently at chapter 9).
So... I'm looking to parse a few different record structures, all fully-defined/structured.
It's best to say that they're all byte string records of varying lengths.
For example, there are these kinds of 'values' in each record: big/little-endian bytes which I can convert to a) int/uint, b) some hex, and c) strings, but in a different codepage (not ASCII).
Among the different record structures, most common ones are -
header-section, section1(x, y, z), section2(a, b, c), ... sectionN(p, q, r), (offset, length, count)*n
The record structure is fully documented, so I know exactly what & where to parse, there's no question of 'is it this kind or that kind?'.
The 'header-section' contains a few bytes, some of which identify the record type and, if applicable, the record subtype of the whole record.
x, y, z are byte strings like I've mentioned above.
(offset, length, count) - I reckon they're referred to as triplets, where
'offset' points to the location in the bytestring where a particular section starts
'length' is the length of the data at that offset
'count' is the number of such sections
Some info on the 'header' - https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieag200/smfhdr.htm
Example of (offset, length, count) - https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.halx001/tcpconn.htm
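To make the triplet idea concrete, here is a minimal sketch of how I imagine using one to slice sections out of a record. The `Triplet` struct, its field widths, and the `sections` method are my own hypothetical names, and I'm assuming 'length' is the size of one section and that the 'count' sections sit back to back starting at 'offset'.

```rust
/// One (offset, length, count) triplet.
/// Hypothetical: the field widths here are an assumption, not taken from the docs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Triplet {
    offset: u32,
    length: u16,
    count: u16,
}

impl Triplet {
    /// Return `count` sub-slices of `length` bytes each, starting at `offset`.
    /// Returns None if the triplet points outside the record.
    fn sections<'a>(&self, record: &'a [u8]) -> Option<Vec<&'a [u8]>> {
        let start = self.offset as usize;
        let len = self.length as usize;
        let total = len.checked_mul(self.count as usize)?;
        let end = start.checked_add(total)?;
        // `get` does the bounds check and returns None instead of panicking.
        let area = record.get(start..end)?;
        if len == 0 {
            // `chunks_exact(0)` would panic, so treat this as "no sections".
            return Some(Vec::new());
        }
        Some(area.chunks_exact(len).collect())
    }
}
```

Each returned slice borrows from the original record, so nothing is copied until a section is actually parsed.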
Based on what I've read so far, I've gotten this much:
- use from_be_bytes or from_le_bytes to parse bytes to numerics
- use structs to define 'section1', 'section2', etc., within which I declare each variable's type (int/float/whatever)
- use hash maps to hold multiple structs thereby making the whole record
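Here is roughly what I picture for the first two points: a minimal sketch, assuming a made-up section layout. `Section1`, its fields, and the helpers `be_u16`/`be_u32` are hypothetical names of mine; only `from_be_bytes` comes from the standard library.

```rust
use std::convert::TryInto;

/// Hypothetical section: 2-byte big-endian length, 4-byte big-endian id, 1-byte flag.
#[derive(Debug, PartialEq)]
struct Section1 {
    length: u16,
    id: u32,
    flag: u8,
}

/// Read a big-endian u16 at `pos`, or None if the buffer is too short.
fn be_u16(buf: &[u8], pos: usize) -> Option<u16> {
    let bytes = buf.get(pos..pos.checked_add(2)?)?;
    Some(u16::from_be_bytes(bytes.try_into().ok()?))
}

/// Read a big-endian u32 at `pos`, or None if the buffer is too short.
fn be_u32(buf: &[u8], pos: usize) -> Option<u32> {
    let bytes = buf.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_be_bytes(bytes.try_into().ok()?))
}

impl Section1 {
    /// Parse the section from the start of `buf` using the assumed layout above.
    fn parse(buf: &[u8]) -> Option<Section1> {
        Some(Section1 {
            length: be_u16(buf, 0)?,
            id: be_u32(buf, 2)?,
            flag: *buf.get(6)?,
        })
    }
}
```

With this shape, a short or truncated record gives `None` rather than a panic. Whether `Option` or a proper error type is the better choice is something I'm unsure about.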
My questions are -
What would be the best way to read the bytestrings and
a) parse the whole bytestring into a hashmap of structs (or whatever else is more appropriate), based on the value (record type & subtype) I find in the first few bytes (location is known). Would I have to read each record twice? First time, only a few bytes at the start and then pass the bytestring onto another function that then parses the whole thing?
b) define structs/enums/hashmaps where there is variation in the structure/repetition in some sections of the bytestring (as described above)
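To show what I mean by (a) and (b), this is the rough shape I have in mind, in case it helps anyone point out where I'm going wrong. It's only a sketch: the `Record` enum, the type codes 1 and 2, and the layouts are all made up for illustration. The idea is to look at the type byte and then hand the rest of the same slice to the matching branch, so the record is only borrowed, not read twice.

```rust
use std::convert::TryInto;

/// Hypothetical record kinds; the variants and their fields are invented.
#[derive(Debug, PartialEq)]
enum Record {
    /// Assumed layout: one big-endian u16 after the type byte.
    TypeA { value: u16 },
    /// Assumed layout: a 1-byte count followed by that many bytes.
    TypeB { items: Vec<u8> },
    /// Any type code this sketch does not know about.
    Unknown { record_type: u8 },
}

/// Dispatch on the first byte (assumed to be the record type) and parse the rest.
fn parse_record(buf: &[u8]) -> Option<Record> {
    let (&record_type, body) = buf.split_first()?;
    match record_type {
        1 => {
            let bytes: [u8; 2] = body.get(..2)?.try_into().ok()?;
            Some(Record::TypeA { value: u16::from_be_bytes(bytes) })
        }
        2 => {
            let (&count, rest) = body.split_first()?;
            // The repeated part becomes a Vec whose length comes from the data.
            let items = rest.get(..count as usize)?.to_vec();
            Some(Record::TypeB { items })
        }
        other => Some(Record::Unknown { record_type: other }),
    }
}
```

An enum with one variant per record type seemed a more natural fit to me than a hash map of structs, since the set of record types is fixed and documented, but I'd welcome corrections on that.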
Thanks for reading, and thanks for any and all suggestions.
Since I'm a beginner in hardcore programming, and at Rust, any code samples will be super useful to wrap my head around it.
Thanks again!