Need help or ideas on how to collect and store metadata for each binrw struct field

Hi everyone.

First of all, I am looking for ideas and inspiration. But if you can share some code or a recipe, that would be great too.

What I am working on: I want to replace my own de-/serialization proc-macro with the binrw crate, since the latter is much better. This is my project, and I am refactoring its UI part.
What I want to achieve: I need to collect metadata for each field of a binrw struct. The metadata should contain the field's offset from the beginning of the byte array (which I use for serialization) and the field's actual size (the number of bytes consumed).
Why I need this: I am working on a component that highlights the part of the packet's hex string that corresponds to a field (similar to the Wireshark dump output). So I need this data to calculate which part of the hex string to highlight.
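
To make the highlight calculation concrete, here is a minimal std-only sketch. It assumes the hex dump is rendered as two lowercase hex digits per byte with single-space separators; `to_hex` and `hex_range` are hypothetical helper names for illustration, not part of binrw.

```rust
use std::ops::Range;

/// Formats bytes as lowercase hex pairs separated by single spaces, e.g. "04 00 01".
/// (Assumed dump layout; adjust if the UI renders the bytes differently.)
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Maps a field's byte span (offset, size) to the character range it occupies
/// in the string produced by `to_hex`. Each byte takes 2 characters plus 1 separator.
fn hex_range(offset: usize, size: usize) -> Range<usize> {
    let start = offset * 3;
    if size == 0 {
        // A zero-sized field highlights nothing: return an empty range.
        return start..start;
    }
    // Drop the separator that would follow the last byte of the field.
    start..start + size * 3 - 1
}

fn main() {
    let data = [4u8, 0, 1, 0, 2, 0, 0, 42, 43, 44, 45];
    let hex = to_hex(&data);
    // Hypothetical field at byte offset 2 with size 4 (such as a `[u16; 2]`).
    let range = hex_range(2, 4);
    println!("{}", &hex[range]); // prints "01 00 02 00"
}
```

With a different dump layout (no separators, or a line break every 16 bytes) the arithmetic changes, but the (offset, size) pair per field is still all the UI needs.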
What issue I have run into: I have a macro in which I want to collect the metadata and provide some methods for each struct:

#[macro_export]
macro_rules! make_packet {
    (
        $(#[$attrs:meta])*
        struct $struct_name:ident {
            $(
                $(#[$field_attr:meta])*
                $field_name:ident: $field_type: ty,
            )*
        }
    ) => {
        mod packet {
            use std::io::Seek;
            use binrw::{binrw, BinRead, BinReaderExt, NullString};

            pub mod incoming {
                use super::*;

                $(#[$attrs])*
                pub struct $struct_name {
                    $(
                        $(#[$field_attr])*
                        pub $field_name: $field_type
                    ),*
                }

                #[derive(Debug, Default)]
                pub struct Metadata {
                    $(
                        pub $field_name: (usize, usize),
                    )*
                }

                impl $struct_name {
                    pub fn read_packet(data: &[u8]) -> anyhow::Result<(Self, Metadata)> {
                        let mut reader = std::io::Cursor::new(data);
                        let mut metadata = Metadata::default();
                        let instance = Self::read(&mut reader)?;

                        println!("R: {:?}", reader.reads);

                        // This does not work for fields that carry a #[br] attribute (e.g. count)
                        // $(
                        //     let start = reader.position();
                        //     let $field_name = <$field_type as BinRead>::read_le(&mut reader)?;
                        //     let end = reader.position();
                        //     metadata.$field_name = (start, end - start);
                        // )*

                        Ok((instance, metadata))
                    }
                }
            }
        }
    }
}

To calculate the offsets and sizes I tried to use the position() method, but when any field of my struct has a #[br] attribute, this code fails:

$(
    let start = reader.position();
    let $field_name = <$field_type as BinRead>::read_le(&mut reader)?;
    let end = reader.position();
    metadata.$field_name = (start, end - start);
)*

with error:

error[E0277]: the trait bound `VecArgs<()>: Required` is not satisfied
   --> src/main.rs:125:15
    |
125 |         text: Vec<u8>,
    |               ^^^^^^^ the trait `Default` is not implemented for `VecArgs<()>`

So it seems I should use a method like read_options or something similar, but then the question is: how can I properly parse the options from the attribute tokens?

I also tried to implement a wrapper for the Cursor (so that I can store the needed data on each read):

use std::io::{Read, Result, Seek, SeekFrom};

pub struct TrackedCursor<T> {
    inner: std::io::Cursor<T>,
    reads: Vec<(usize, usize)>, // (offset, length)
}

impl<T> TrackedCursor<T> {
    pub fn new(inner: T) -> Self
    where
        T: AsRef<[u8]>,
    {
        Self {
            inner: std::io::Cursor::new(inner),
            reads: Vec::new(),
        }
    }

    pub fn reads(&self) -> &[(usize, usize)] {
        &self.reads
    }

    pub fn into_inner(self) -> (T, Vec<(usize, usize)>) {
        (self.inner.into_inner(), self.reads)
    }
}

impl<T> Read for TrackedCursor<T>
where
    T: AsRef<[u8]>,
{
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let offset = self.inner.position() as usize;
        let bytes_read = self.inner.read(buf)?;
        if bytes_read > 0 {
            self.reads.push((offset, bytes_read));
        }
        Ok(bytes_read)
    }
}

impl<T> Seek for TrackedCursor<T>
where
    T: AsRef<[u8]>,
{
    fn seek(&mut self, pos: SeekFrom) -> Result<u64> {
        self.inner.seek(pos)
    }
}

But this does not work for me, since it reads smart pointers, arrays, and so on in several separate calls:

  • NullString is read byte by byte until the zero terminator
  • [u16; 4] is read in 4 separate calls
  • etc.

So the output of the wrapper's reads field for the struct below:

make_packet! {
    #[binrw]
    #[brw(little)]
    #[derive(Debug)]
    struct Message {
        len: u16,
        len2: [u16; 2],
        len3: NullString,
        #[br(count = len)]
        text: Vec<u8>,
    }
}

will be this:

[(0, 2), (2, 2), (4, 2), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 1)]

which is actually not what I expect.
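
The fragmentation can be reproduced with the standard library alone. The sketch below is not binrw code, and binrw's internals may differ; `LoggingReader`, `read_u16_le`, and `read_cstr` are hypothetical stand-ins that only show why a per-`read` log cannot recover field boundaries: the reader sees primitive reads, not fields.

```rust
use std::io::{Cursor, Read, Result};

/// Minimal stand-in for the `TrackedCursor` above: logs (offset, length) per `read` call.
struct LoggingReader<T> {
    inner: Cursor<T>,
    reads: Vec<(usize, usize)>,
}

impl<T: AsRef<[u8]>> Read for LoggingReader<T> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let offset = self.inner.position() as usize;
        let n = self.inner.read(buf)?;
        if n > 0 {
            self.reads.push((offset, n));
        }
        Ok(n)
    }
}

/// Reads one little-endian u16 with a single `read_exact` call.
fn read_u16_le<R: Read>(r: &mut R) -> Result<u16> {
    let mut buf = [0u8; 2];
    r.read_exact(&mut buf)?;
    Ok(u16::from_le_bytes(buf))
}

/// Reads one byte at a time until a zero byte; returns the bytes before it.
fn read_cstr<R: Read>(r: &mut R) -> Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        r.read_exact(&mut byte)?;
        if byte[0] == 0 {
            return Ok(out);
        }
        out.push(byte[0]);
    }
}

fn main() -> Result<()> {
    let data = [1u8, 0, 2, 0, b'h', b'i', 0];
    let mut reader = LoggingReader { inner: Cursor::new(data), reads: Vec::new() };
    // Two logical fields: a pair of u16 values and a null-terminated string.
    let pair = [read_u16_le(&mut reader)?, read_u16_le(&mut reader)?];
    let name = read_cstr(&mut reader)?;
    // Five log entries for two fields: [(0, 2), (2, 2), (4, 1), (5, 1), (6, 1)]
    println!("{pair:?} {name:?} {:?}", reader.reads);
    Ok(())
}
```

Nothing in the log marks where one field ends and the next begins, so the boundaries have to come from the parser side rather than from the reader.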

So my question is: what is the proper way to read and store metadata for each field? For the struct from the example I expect 4 metadata items, which I would prefer to store in the Metadata struct, if that is even possible.

Could somebody advise me on what I can do?

P.S. I am not sure whether the parse_with attribute can help me, since each field in the struct can contain multiple attributes, and I would like to avoid implementing custom parsers for every struct I have.

I haven't heard of binrw, so some of this might be inaccurate. I'm learning today!

BinRead::read_le() is not valid to call on Vec<_>. If it could be called, how would it get the value of the len field as specified by #[br(count = len)]?

You can call BinRead::read_options() instead (the derived BinRead is implemented in terms of this method). But that implies you provide the deserialized value of the len field to construct the Args associated type. And if you already have that information, there is no need to call the parser function to find out what it is.

Going over the docs, the PosValue type is almost what you want:

#[derive(BinRead)]
#[brw(little)]
#[derive(Debug)]
struct Message {
    len: PosValue<u16>,
    len2: PosValue<[u16; 2]>,
    len3: PosValue<NullString>,
    #[br(count = *len)]
    text: PosValue<Vec<u8>>,
}

There are a few downsides:

  • BinWrite is not implemented for it on the current crate release.
    • It's available in the GitHub repo with version 0.15.0-pre, you can use it with cargo patch.
  • The PosValue is unusually transparent. It derefs to T, and it won't even show itself with a Debug formatter.
    • See below for a workaround.
  • All of the field references to PosValues in attribute expressions need to use the .val field or deref operator.
  • It's really difficult to do the attribute transformation in a macro_rules macro. You might want to use a procedural macro if you would prefer a separate struct for tracking positions.

This looks like the proper way to do what you want. Wrap all the values in PosValue, and let the Deref[Mut] implementations forward to the underlying value. You can still probe the positions of individual fields.

I wrote a little wrapper to make Debug work correctly...

Maybe the maintainers will accept a patch to make this unnecessary?

use binrw::{binrw, BinRead, BinWrite, NullString, PosValue};
use std::{fmt, ops};

#[binrw] // Requires version 0.15.0-pre
#[brw(little)]
#[derive(Debug)]
struct Message {
    len: PosValueDebug<u16>,
    len2: PosValueDebug<[u16; 2]>,
    len3: PosValueDebug<NullString>,
    #[br(count = *len)]
    text: PosValueDebug<Vec<u8>>,
}

#[derive(Clone)]
struct PosValueDebug<T>(PosValue<T>);

impl<T> ops::Deref for PosValueDebug<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0.val
    }
}

impl<T> ops::DerefMut for PosValueDebug<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0.val
    }
}

impl<U, T: PartialEq<U>> PartialEq<U> for PosValueDebug<T> {
    fn eq(&self, other: &U) -> bool {
        self.0.eq(other)
    }
}

impl<T: fmt::Debug> fmt::Debug for PosValueDebug<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("PosValueDebug")
            .field("val", &self.0.val)
            .field("pos", &self.0.pos)
            .finish()
    }
}

impl<T: BinRead> BinRead for PosValueDebug<T> {
    type Args<'a> = T::Args<'a>;

    fn read_options<R: std::io::Read + std::io::Seek>(
        reader: &mut R,
        endian: binrw::Endian,
        args: Self::Args<'_>,
    ) -> binrw::BinResult<Self> {
        let val = PosValue::<T>::read_options(reader, endian, args)?;

        Ok(Self(val))
    }
}

impl<T: BinWrite> BinWrite for PosValueDebug<T> {
    type Args<'a> = T::Args<'a>;

    fn write_options<W: std::io::Write + std::io::Seek>(
        &self,
        writer: &mut W,
        endian: binrw::Endian,
        args: Self::Args<'_>,
    ) -> binrw::BinResult<()> {
        self.0.write_options(writer, endian, args)
    }
}

fn main() {
    let data = [4, 0, 1, 0, 2, 0, 0, 42, 43, 44, 45];
    let mut reader = std::io::Cursor::new(data);
    let msg = Message::read(&mut reader).unwrap();
    dbg!(&msg);
}

Prints:

[src/main.rs:78:5] &msg = Message {
    len: PosValueDebug {
        val: 4,
        pos: 0,
    },
    len2: PosValueDebug {
        val: [
            1,
            2,
        ],
        pos: 2,
    },
    len3: PosValueDebug {
        val: NullString(""),
        pos: 6,
    },
    text: PosValueDebug {
        val: [
            42,
            43,
            44,
            45,
        ],
        pos: 7,
    },
}

Nice pattern! Thank you so much for your effort!

I finally implemented a prototype of what I was looking for. Thanks again, @parasyte!
I am sharing the full example with metadata in case it is helpful to somebody:

use std::io::{Read, Seek};
use binrw::{BinRead, NamedArgs};

macro_rules! binrw_wrapper {
    (
    $(#[$struct_attrs:meta])*
    $struct_name:ident {
        $($(#[$field_attr:meta])*
        $field:ident: $ty:ty),* $(,)?
    }) => {
        mod packet {
            use binrw::{BinRead, NullString};
            use std::io::Cursor;
            use serde::Serialize;

            #[derive(BinRead, Debug, Clone)]
            #[br(little)]
            struct Temp {
                $(
                    $(#[$field_attr])*
                    pub $field: binrw::PosValue<$ty>,
                )*
            }

            #[derive(Default, Debug, Clone)]
            pub struct Metadata {
                $(
                    pub $field: (usize, usize),
                )*
            }

            $(#[$struct_attrs])*
            #[derive(Serialize)]
            pub struct $struct_name {
                $(
                    pub $field: $ty,
                )*
            }

            impl $struct_name {
                pub fn read_packet(data: &mut [u8]) -> anyhow::Result<(Self, Metadata)> {
                    let mut metadata = Metadata::default();

                    let mut reader = Cursor::new(data);
                    let temp: Temp = Temp::read(&mut reader)?;

                    let pairs: Vec<(usize, usize)> = {
                        let offsets: Vec<usize> = vec![$(temp.$field.pos as usize),*];

                        let mut sizes: Vec<usize> = offsets.windows(2)
                            .map(|pair| pair[1] - pair[0])
                            .collect();

                        sizes.push(reader.position() as usize - offsets.last().unwrap_or(&0));

                        offsets.iter().zip(sizes).map(|(o, s)| (*o, s)).collect()
                    };

                    let mut pairs = pairs.into_iter();
                    $(metadata.$field = pairs.next().unwrap();)*

                    let instance: Self = Self {
                        $(
                            $field: temp.$field.val,
                        )*
                    };

                    Ok((instance, metadata))
                }

                pub fn to_json(&self) -> anyhow::Result<String> {
                    serde_json::to_string(self).map_err(Into::into)
                }
            }
        }
    };
}

binrw_wrapper!(
    #[derive(Debug)]
    MyType {
        a: u8,
        // b: NullString,
        #[br(count = *a)]
        c: Vec<u8>,
        d: u64,
        e: u16,
        f: u64,
    }
);

fn main() -> anyhow::Result<()> {
    let mut data = vec![
        15, 2, 3, 10, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        1, 0, 0, 0, b'a', b'b', b'c', b'd', b'e', b'f', b'g', b'h', b'i', b'j',
        0, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5,
        0, 1, 2, 3, 4, 5, 1, 2, 3, 0, 2,
    ];

    let (val, meta) = packet::MyType::read_packet(&mut data)?;
    println!("VAL: {:?}", val);
    println!("MET: {:?}", meta);

    Ok(())
}

Some details: I actually need binrw only for the Temp struct, so my macro returns a plain Rust struct without binrw; put more simply, the input equals the output. However, I have not tested this with every binrw directive, so it will probably need some changes.
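
The size calculation inside read_packet can also be pulled out into a plain function, which makes it easier to test without binrw. This is only a sketch: `spans` is a hypothetical name, and it assumes the fields are laid out back to back in declaration order, so any seek, padding, or alignment directive between fields would be counted as part of the preceding field.

```rust
/// Turns per-field start offsets plus the final reader position into
/// (offset, size) pairs. Assumes contiguous fields in declaration order.
fn spans(offsets: &[u64], end: u64) -> Vec<(usize, usize)> {
    offsets
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            // A field ends where the next one begins; the last one ends at `end`.
            let stop = offsets.get(i + 1).copied().unwrap_or(end);
            // saturating_sub avoids a panic if a seek moved the reader backwards.
            (start as usize, stop.saturating_sub(start) as usize)
        })
        .collect()
}

fn main() {
    // Start offsets from the `Message` example earlier in the thread (0, 2, 6, 7),
    // with the reader ending at byte 11.
    let result = spans(&[0, 2, 6, 7], 11);
    println!("{result:?}"); // [(0, 2), (2, 4), (6, 1), (7, 4)]
}
```

Inside the macro, the offsets would come from each `temp.$field.pos` and `end` from `reader.position()`, as in the code above.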

I believe this code can be improved, so thanks in advance to everyone who can suggest improvements.