Request feedback on an implementation to store "messages" in a "byte array batch" and retrieve them back

Hi there,

as I'm still learning Rust, I'd like to get hints/feedback from the more advanced among you to help me improve my coding :slight_smile:

Problem statement

I need to communicate with an external device that accepts a batch of tags stored consecutively in memory. Each tag's size is a multiple of the size of a u32. The tag types can differ, and so can the length they occupy in the memory buffer.
I'd like to store different kinds of those tags in such a "byte" buffer, and, after the external device has processed the data, I'd also like to be able to retrieve the typed tag structures back from this batch memory buffer.

Solution proposal

With my current experience with Rust I came up with this implementation, which feels a bit clunky, and I'd like to know if I could do better with regard to compile-time checks over runtime checks.

Here is the code (the definition of the PropertyTag trait is omitted, as I did not think it relevant to this problem):

#[repr(C, align(16))]
pub struct MailboxBatch {
    pub(crate) buffer: Vec<u32>,
    pub(crate) tag_offsets: BTreeMap<TypeId, u32>,
}

impl MailboxBatch {
    pub fn empty() -> Self {
        MailboxBatch {
            // buffer always starts with 2 u32 values.
            // The first is the placeholder for the final batch message size and it starts with 12
            // containing the batch header(type+size each u32) + a closing u32
            // The second represent the message type
            buffer: vec![12, MessageState::Request as u32],
            tag_offsets: BTreeMap::new(),
        }
    }

    pub fn add_tag<T: PropertyTag + 'static>(&mut self, tag: T) -> MailboxResult<()> {
        if self.tag_offsets.contains_key(&TypeId::of::<T>()) {
            return Err("duplicate property tag in batch is not allowed");
        }
        // get the size of the tag to be added to the batch
        let tag_size = core::mem::size_of::<T>();
        // get the &[u32] representation of the property tag
        // this is safe as every property tag always has a size that is a multiple of the
        // size of a u32
        let slice =
            unsafe { core::slice::from_raw_parts(&tag as *const T as *const u32, tag_size >> 2) };
        // store the offset in the buffer this message is added to
        self.tag_offsets
            .insert(TypeId::of::<T>(), self.buffer.len() as u32);
        self.buffer.extend_from_slice(slice);
        self.buffer[0] += tag_size as u32;
        Ok(())
    }

    pub fn get_tag<T: PropertyTag + 'static>(&self) -> Option<&T> {
        // get the offset of this tag type if it is stored
        let offset = self.tag_offsets.get(&TypeId::of::<T>())?;
        // "cast" the buffer into the Tag structure
        let tag = unsafe {
            let ptr = &self.buffer[*offset as usize] as *const u32 as *const T;
            &*ptr as &T
        };

        Some(tag)
    }
}

With this I can do stuff like this:

fn main() {
  let mut batch = MailboxBatch::empty();
  let _ = batch.add_tag(FooTag::new());
  let _ = batch.add_tag(BarTag::new());
  let foo = batch.get_tag::<FooTag>().unwrap();
  let bar = batch.get_tag::<BarTag>().unwrap();
}
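
The `unsafe` cast in `add_tag` relies on every tag being a whole number of u32 words, and the cast back in `get_tag` additionally relies on the tag not needing more than u32 alignment. A minimal sketch of how both assumptions could be moved to compile time, using an associated constant that panics during const evaluation. The names `LayoutCheck` and `tag_words` as well as the `FooTag` layout are made up for illustration and are not part of the code above:

```rust
use core::marker::PhantomData;
use core::mem::{align_of, size_of};

/// Hypothetical helper: carries a compile-time layout check for `T`.
struct LayoutCheck<T>(PhantomData<T>);

impl<T> LayoutCheck<T> {
    /// Evaluated once per concrete `T`; a failed assert becomes a build error.
    const OK: () = assert!(
        size_of::<T>() % size_of::<u32>() == 0 && align_of::<T>() <= align_of::<u32>(),
        "tag must be a whole number of u32 words and need at most u32 alignment"
    );
}

/// Number of u32 words a tag occupies. Referencing `OK` here forces the
/// check for every `T` this function is instantiated with.
pub fn tag_words<T>() -> usize {
    let () = LayoutCheck::<T>::OK;
    size_of::<T>() / size_of::<u32>()
}

/// Hypothetical tag layout, used only for this example.
#[repr(C)]
pub struct FooTag {
    pub id: u32,
    pub value: u32,
}

pub fn foo_tag_words() -> usize {
    tag_words::<FooTag>()
}
```

Calling `tag_words::<T>()` inside `add_tag` instead of `tag_size >> 2` would reject a badly sized tag when the crate is built rather than at runtime. Note that this kind of check fires when the generic function is instantiated, so it may only show up in a full build and not in a plain `cargo check`.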

Question

So with this I'm wondering if I could get rid of at least one runtime check, when retrieving the PropertyTag of a specific type. Is there a way to use some sort of Type abstraction to extend the type of MailboxBatch to contain the type information from all added Tags so far? So that an abstract MailboxBatch may start as MailboxBatch<Empty> and evolves to a MailboxBatch<FooTag + BarTag>. And I can implement a get_footag and get_bartag function for the respective MailboxBatch types like:

impl MailboxBatch<FooTag> {
  pub fn get_footag(&self) -> &FooTag {
  /* implementation ... */
  }
}

As I do have a finite number of possible tags this kind of implementation could be done using macros. But I'm still not sure how the type of MailboxBatch could "grow"...

The advantage from my point of view would be that the compiler could already complain that retrieval of batch.get_baztag() does not make sense as this tag has never been added to this instance.

I hope this all makes sense and thanks for reading till this point :smiley:
Any help/hint/guidance is very much appreciated..

Thanks in advance for your time....

Ah, so you want to dip your feet into type-level metaprogramming? If so, look into frunk. The cost is that you can't use references, something like this

edit: added comments here

If you want a deeper explanation look into this blog post from the people that made frunk, they have quite a few more that dig into the details of how frunk works!
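
A minimal sketch of the general technique, assuming a hand-rolled type-level list rather than frunk's actual API. All names here (`Nil`, `Cons`, `Find`, `Here`, `Next`, and the simplified `FooTag`/`BarTag`) are made up for illustration. Each `add_tag` consumes the batch and returns one with a longer type, so asking for a tag that was never added fails to compile:

```rust
use core::marker::PhantomData;

// Simplified stand-ins for real property tags (assumption).
#[derive(Debug, PartialEq)]
pub struct FooTag(pub u32);
#[derive(Debug, PartialEq)]
pub struct BarTag(pub u32);

/// The empty tag list.
pub struct Nil;

/// All earlier tags in `rest`, followed by the most recently added `tag`.
#[repr(C)]
pub struct Cons<Rest, Tag> {
    rest: Rest,
    tag: Tag,
}

/// Type-level position: `Here` is the newest tag, `Next<I>` is one step older.
pub struct Here;
pub struct Next<I>(PhantomData<I>);

/// Lookup of a tag of type `T` at type-level position `I`.
pub trait Find<T, I> {
    fn find(&self) -> &T;
}

// The newest tag has the requested type.
impl<Rest, T> Find<T, Here> for Cons<Rest, T> {
    fn find(&self) -> &T {
        &self.tag
    }
}

// Otherwise keep searching in the older part of the list.
impl<Rest, Tag, T, I> Find<T, Next<I>> for Cons<Rest, Tag>
where
    Rest: Find<T, I>,
{
    fn find(&self) -> &T {
        self.rest.find()
    }
}

pub struct MailboxBatch<L> {
    tags: L,
}

impl MailboxBatch<Nil> {
    pub fn empty() -> Self {
        MailboxBatch { tags: Nil }
    }
}

impl<L> MailboxBatch<L> {
    /// Consumes the batch and returns one whose type also records `T`.
    pub fn add_tag<T>(self, tag: T) -> MailboxBatch<Cons<L, T>> {
        MailboxBatch { tags: Cons { rest: self.tags, tag } }
    }

    /// Only compiles if a `T` is present at position `I`.
    pub fn get_tag<T, I>(&self) -> &T
    where
        L: Find<T, I>,
    {
        self.tags.find()
    }
}

pub fn example() -> u32 {
    let batch = MailboxBatch::empty().add_tag(FooTag(1)).add_tag(BarTag(2));
    // The position can be left as `_` when the tag type occurs only once.
    let foo = batch.get_tag::<FooTag, _>();
    let bar = batch.get_tag::<BarTag, _>();
    foo.0 + bar.0
}
```

Because `add_tag` takes `self` by value, the batch has to be built up in a chain (or rebound with `let`), which is the builder-like trade-off of this approach. The sketch only needs `core`, so it should also work in a no_std setting.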

Hi,
thanks for the links... so I need to dig into reading that helpful material. As I'm trying to do this on an embedded system, there is only no_std... Hopefully frunk can still support that. At first look it's overwhelming and I really need some time to understand what is going on there :wink:

Thanks a ton for writing up the example and providing the comments to it! Much appreciated...

Since the compiler cannot know about runtime data, I don't think you can avoid validating tag presence in your struct at runtime, either in your own code or in a library that you use. BTW, what you are doing is pretty similar to the AnyMap crate; if you look at its source, maybe you will find some answers there. I would just use their data structure and add serde serializers/deserializers on top.

As an alternative to your approach, you can turn your tags into an enum, pack them into a stream-like structure, and process the data one by one with a match statement, though it might take more space because every enum value is as large as its biggest variant.
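
A rough sketch of what the enum alternative could look like, assuming two made-up tag kinds (`Tag::Foo` and `Tag::Bar` and their fields are hypothetical). It uses only `core` and slices, so it does not depend on serde or on std streams:

```rust
use core::mem::size_of;

/// Hypothetical tag kinds; the real ones would mirror the device's tags.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    Foo { value: u32 },
    Bar { a: u32, b: u32 },
}

/// Walks the batch one tag at a time and dispatches on the variant.
pub fn sum_payload(tags: &[Tag]) -> u32 {
    tags.iter()
        .map(|tag| match *tag {
            Tag::Foo { value } => value,
            Tag::Bar { a, b } => a + b,
        })
        .sum()
}

/// Every element occupies the same space, regardless of its variant.
pub fn bytes_per_tag() -> usize {
    size_of::<Tag>()
}
```

The `match` is still a runtime check per element, and a slice of `Tag` is not the packed variable-length layout the device expects, so the tags would have to be written out into the u32 buffer in a separate step.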

Yes, type-level metaprogramming looks really confusing; this is largely due to the obtuse syntax required and having to handle all edge cases up front. Fortunately, it does get better the more you work with it, although I'm not sure if I will ever completely understand the subtle transformations.

Hi @dunnock,

thanks for pointing me towards AnyMap. However, as I'm targeting an embedded system with no_std, I thought streams and serde would not be a fit here, but I might need to look into this as well. Thanks again :slight_smile:

Hi @KrishnaSannasi,

I guess I now have a glimpse of how your proposed implementation works, at least how it grows the TagList and keeps track of the types. It's using some kind of builder pattern. I was really surprised that the underlying memory layout is really "packed" and already reflects the sort of byte buffer I'd like to have. Really neat.

However, how the right tag is found based on its type is still a bit foggy. Also, I have not managed to access a tag from the list if its type appears more than once.

Please check this Playground where I tried different approaches, but none worked, I guess because of my lack of knowledge of what's going on behind the scenes..

So I'm seeking a hint again :wink:
Thanks in advance

The second argument of find should have the form Here, Next<Here>, Next<Next<Here>>, ...

It represents an integer,

  • Here is 0, so 0 steps from the end of the list.
  • Next<N> represents N + 1

So, Next<Next<Here>> is 2 away from the end of the list

In this case, you can do

let foo2 = mailbox.find::<FooTag, Next<Here>>();
let foo1 = mailbox.find::<FooTag, Next<Next<Here>>>();

Because we are counting from the end of the list

[foo1, foo2, bar]
             ^ this is 0, and going backwards is
               adding a layer of `Next<_>`
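
To make the "it represents an integer" reading concrete, here is a small sketch that converts such an index type into a number. The `Steps` trait and `steps_from_end` function are hypothetical helpers for illustration, and `Here`/`Next` are redefined locally so the snippet stands alone:

```rust
use core::marker::PhantomData;

pub struct Here;
pub struct Next<N>(PhantomData<N>);

/// Hypothetical helper: how many steps from the end an index type encodes.
pub trait Steps {
    const STEPS: usize;
}

impl Steps for Here {
    const STEPS: usize = 0;
}

impl<N: Steps> Steps for Next<N> {
    // Each layer of `Next<_>` adds one step.
    const STEPS: usize = N::STEPS + 1;
}

pub fn steps_from_end<I: Steps>() -> usize {
    I::STEPS
}
```

With this, `steps_from_end::<Next<Next<Here>>>()` evaluates the same count that the diagram above spells out by hand.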

Hey thanks, this makes total sense...
Is it fine if I use your proposal, with some enhancements tailored to my specific use case, in a crate of mine?

This isn't my own, this is from frunk, which is licensed under MIT, so it should be fine

Ok thanks, so I will at least put a reference to frunk then :slight_smile:


I'm not sure but I think I wrote something "similar": https://crates.io/crates/linebuffer although I have a ring-buffer underneath it, so it's not an endless storage.

Hi,
thanks for suggesting your crate. If I get this right, it can store a bunch of data of the same type (e.g. byte arrays) really conveniently. However, one of my main requirements is storing and retrieving data of different types in the buffer without unsafe and without type casting from/to byte buffers. But if I did not get this right and missed a feature of your crate, I might give it a second look :wink:

Otherwise I'm quite happy with the meta-type-recursion-proposal for the moment

Ah, yeah, then I got this wrong.
It just stores a tuple of bytes with a flag type you can define. So for example, with (String::from("hello").as_bytes(), 1), you could use the flag to specify the type of your data. But I think the other proposal is better suited for your case.