Suggestions on approach

Hello All,

Was hoping I could get some suggestions on how to approach this problem of parsing bytestring records.

My background -- I've written bits of scripts here and there but nothing major, and nothing in C/C++, so please pardon my ignorance/partial knowledge. I've only just begun reading through the Rust book (currently at chapter 9).

So... I'm looking to parse a few different record structures, all fully-defined/structured.
It's best to say that they're all byte string records of varying lengths.
For example, there are these kinds of 'values' in each record: big/little endian bytes which I can convert to a) int/uint, b) some hex and c) strings, but in a different codepage (not ASCII).

Among the different record structures, most common ones are -
header-section, section1(x, y, z), section2(a, b, c), ... sectionN(p, q, r), (offset, length, count)*n
The record structure is fully documented, so I know exactly what & where to parse, there's no question of 'is it this kind or that kind?'.
The 'header-section' contains a few bytes, some of which identify the record type and, if applicable, the record subtype of the whole record.
x, y, z are byte strings like the ones I've mentioned above.
(offset, length, count) - I reckon they're referred to as triplets, where
'offset' gives the location in the bytestring where a particular section starts
'length' gives the length of the data at that offset
'count' gives the number of such sections

Some info on the 'header' - https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieag200/smfhdr.htm

Example of (offset, length, count) - https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.halx001/tcpconn.htm

Based on what I've read so far, I've gotten this much:

  • use from_be_bytes or from_le_bytes to parse bytes to numerics
  • use structs to define 'section1', 'section2', etc., within which I declare each variable's type (int/float/whatever)
  • use hash maps to hold multiple structs thereby making the whole record

My questions are -
What would be the best way to read the bytestrings and
a) parse the whole bytestring into a hashmap of structs (or whatever else is more appropriate), based on the value (record type & subtype) I find in the first few bytes (location is known). Would I have to read each record twice? First time, only a few bytes at the start and then pass the bytestring onto another function that then parses the whole thing?
b) define structs/enums/hashmaps where there is variation in the structure/repetition in some sections of the bytestring (as described above)

Thanks for reading, and thanks for any and all suggestions.

Since I'm a beginner in hardcore programming, and at Rust, any code samples will be super useful to wrap my head around it.

Thanks again!

To represent the various types of records, you probably want an enum.

enum Record {
    FooRecord { /* fields... */ },
    BarRecord { /* fields... */ },
}

As for parsing each kind of record, you might want to try out the byteorder crate instead of using from_be_bytes directly, although both are good ways of doing it. Regarding doing different things depending on record type, I'd first read the identification header, and then I'd have a big match on the kind of record and have a function per record you call from the match.
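To make the "read the identification first, then match" idea concrete, here is a minimal sketch that uses only the standard library's from_be_bytes. The layout is made up for illustration (one type byte followed by the body), and the names Record, parse_record, parse_foo, parse_bar, be_u16 and be_u32 are hypothetical; the real offsets come from your record documentation. Since the whole record is already in memory as a slice, nothing is read twice: the type byte is inspected and the rest of the same slice is handed to the matching function.

```rust
/// Hypothetical record kinds; real records would carry many more fields.
#[derive(Debug, PartialEq)]
enum Record {
    Foo { value: u32 },
    Bar { value: u16 },
}

/// Read a big-endian u16 at `at`, or None if the slice is too short.
fn be_u16(bytes: &[u8], at: usize) -> Option<u16> {
    let chunk = bytes.get(at..at + 2)?;
    Some(u16::from_be_bytes([chunk[0], chunk[1]]))
}

/// Read a big-endian u32 at `at`, or None if the slice is too short.
fn be_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let chunk = bytes.get(at..at + 4)?;
    Some(u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

fn parse_foo(body: &[u8]) -> Result<Record, String> {
    let value = be_u32(body, 0).ok_or("Foo body too short")?;
    Ok(Record::Foo { value })
}

fn parse_bar(body: &[u8]) -> Result<Record, String> {
    let value = be_u16(body, 0).ok_or("Bar body too short")?;
    Ok(Record::Bar { value })
}

/// Assumed layout: byte 0 is the record type, the rest is the body.
fn parse_record(bytes: &[u8]) -> Result<Record, String> {
    let rty = *bytes.first().ok_or("empty record")?;
    let body = &bytes[1..];
    // One function per record type, selected by the type byte.
    match rty {
        0 => parse_foo(body),
        1 => parse_bar(body),
        other => Err(format!("unknown record type: {}", other)),
    }
}
```

Calling parse_record(&[1, 0x12, 0x34]) would go down the Bar branch and read 0x1234 from the two body bytes.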

Thank you Alice.
I'm literally a noob at programming, especially Rust.
I understand that we can use enums within which each variable can be different type.
If you don't mind, could you show me an example of parsing Table 1 from here, for example: https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieag200/smfhdr.htm

Are you saying that I can hold the whole record in an enum, rather than a hash map of structs?

Regarding the hash map, that depends on the details. You should use a hash map if you have a variable number of keys you don't know ahead of time, and a struct if you know all the fields when writing the code.

Here's an example using the byteorder crate. I guessed that it was big endian.

use std::io::{BufRead, Result};
use byteorder::{BigEndian, ReadBytesExt};

struct Table1Record {
    len: u16,
    seg: u16,
    flg: u8,
    rty: u8,
    tme: u32,
    dte: u32,
    sid: u32,
}

impl Table1Record {
    /// Read from the provided buffered input stream.
    pub fn read_from<R: BufRead>(reader: &mut R) -> Result<Table1Record> {
        let len = reader.read_u16::<BigEndian>()?;
        let seg = reader.read_u16::<BigEndian>()?;
        let flg = reader.read_u8()?;
        let rty = reader.read_u8()?;
        let tme = reader.read_u32::<BigEndian>()?;
        let dte = reader.read_u32::<BigEndian>()?;
        let sid = reader.read_u32::<BigEndian>()?;
        
        Ok(Table1Record {
            len, seg, flg, rty, tme, dte, sid,
        })
    }
}

Note that it often makes sense to make a struct like the above for each table, and your enum can then look like this:

enum Record {
    FooRecord(FooRecordStruct),
    BarRecord(BarRecordStruct),
}
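On the strings in a different codepage mentioned in the question: a field like sid is four bytes of character data, so after reading it as a u32 you can turn it back into bytes and map each one. The sketch below assumes the codepage is EBCDIC and only covers a small hand-picked subset (space, A-Z, 0-9); ebcdic_to_char and decode_sid are hypothetical names, and a real program would want a complete table for the exact codepage in use.

```rust
/// Decode one EBCDIC byte. Only space, uppercase letters and digits are
/// handled here; everything else returns None.
fn ebcdic_to_char(b: u8) -> Option<char> {
    match b {
        0x40 => Some(' '),
        // The EBCDIC alphabet is split into three non-contiguous runs.
        0xC1..=0xC9 => Some((b'A' + (b - 0xC1)) as char),
        0xD1..=0xD9 => Some((b'J' + (b - 0xD1)) as char),
        0xE2..=0xE9 => Some((b'S' + (b - 0xE2)) as char),
        0xF0..=0xF9 => Some((b'0' + (b - 0xF0)) as char),
        _ => None,
    }
}

/// Turn a 4-byte field that was read as a big-endian u32 back into text.
/// Bytes outside the supported subset are shown as '?'.
fn decode_sid(raw: u32) -> String {
    raw.to_be_bytes()
        .iter()
        .map(|&b| ebcdic_to_char(b).unwrap_or('?'))
        .collect()
}
```

It may be simpler to store such fields as [u8; 4] instead of u32 in the struct, since they are never used as numbers.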

Ok, so if I build out multiple structs, and put them all in an enum, to represent 1 record...
can I then use a single function to say, parse the whole record 'into' this enum?
Rather than implementing read_from for/within each struct. How would I implement that?

Re - hash map, I have a known number of keys, but some sections in the record have varying lengths (offset, length, count - and count here decides how many segments that section has).

If you have variable length data, a Vec or HashMap can indeed make sense depending on the shape of your data. I'm imagining something like this:

use std::io::{BufRead, Error, ErrorKind, Result};

struct Record {
    header: Header,
    body: RecordBody,
}
enum RecordBody {
    FooRecord(FooRecordStruct),
    BarRecord(BarRecordStruct),
}

impl Record {
    pub fn read_from<R: BufRead>(reader: &mut R) -> Result<Record> {
        let header = Header::read_from(reader)?;
        let body = match header.kind {
            0 => RecordBody::FooRecord(FooRecordStruct::read_from(reader)?),
            1 => RecordBody::BarRecord(BarRecordStruct::read_from(reader)?),
            kind => return Err(Error::new(ErrorKind::InvalidData, format!("Invalid kind: {}", kind)))
        };
        Ok(Record {
            header,
            body,
        })
    }
}

see full example
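For the (offset, length, count) triplets, a Vec is probably the natural fit, since count decides how many sections there are. Here is a rough sketch, assuming each triplet is 8 bytes (a 4-byte offset, a 2-byte length and a 2-byte count, all big endian) and that the offset is measured from the start of the slice you pass in; check both assumptions against the documentation for your record. Triplet, read_triplet and sections are hypothetical names.

```rust
/// One (offset, length, count) triplet. Field widths are an assumption.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Triplet {
    offset: u32,
    length: u16,
    count: u16,
}

/// Read the 8-byte triplet that starts at position `at` in the record.
fn read_triplet(record: &[u8], at: usize) -> Option<Triplet> {
    let b = record.get(at..at.checked_add(8)?)?;
    Some(Triplet {
        offset: u32::from_be_bytes([b[0], b[1], b[2], b[3]]),
        length: u16::from_be_bytes([b[4], b[5]]),
        count: u16::from_be_bytes([b[6], b[7]]),
    })
}

/// Split out the `count` sections a triplet describes, each `length`
/// bytes long, laid out back to back starting at `offset`.
/// Returns None if any section would run past the end of the record.
fn sections<'a>(record: &'a [u8], t: Triplet) -> Option<Vec<&'a [u8]>> {
    let len = t.length as usize;
    let mut out = Vec::with_capacity(t.count as usize);
    for i in 0..t.count as usize {
        let start = (t.offset as usize).checked_add(i.checked_mul(len)?)?;
        let end = start.checked_add(len)?;
        out.push(record.get(start..end)?);
    }
    Some(out)
}
```

Each slice in the returned Vec can then be handed to that section's own parsing function, and the results collected into a Vec of section structs stored in the record.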
