Before starting procedural macros, need advice

Hi,
I have been reading "Procedural Macros" in The Rust Reference.
The end goal is to serialize this struct, à la rkyv or serde:

#[repr(C)]
struct A {
    element: Vec<u32>,
    sentence: String,
    another: f64,
}

into its serialized variant:

struct ArchivedA {
    element: [u32; Y],
    sentence: [char; X],
    another: [u8; Z], // retrieve using from_bits / to_bits
}

where X, Y and Z are the dynamic sizes of the elements, including any padding that has to be calculated.

Is the syn crate the easiest route to parse struct A and transform it to ArchivedA?
The goal is to convert A into ArchivedA so that the data can be stored on disk or sent over the network.

This is an exercise to better understand the inner workings of Rust.

I use nightly.

Thanks!

Procedural derive macros are evaluated at compile time, taking the token stream of the item they are derived from as their input. The question I'd ask myself when evaluating whether a derive macro would be the right solution for my problem would be, what do I want X, Y and Z to evaluate to, if all I have is the definition of A as my input?
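
To make that question concrete, here is a small sketch (not from the original posts) of why X, Y and Z cannot come out of the definition of A alone: the compile-time size of A is the same for every value, while the payload that an archive would need depends on runtime lengths. The helper dynamic_bytes is a hypothetical name used only for illustration.

```rust
use std::mem::{size_of, size_of_val};

#[allow(dead_code)]
struct A {
    element: Vec<u32>,
    sentence: String,
    another: f64,
}

/// Bytes of variable-length payload an archive of `a` would have to hold.
/// (Hypothetical helper; padding is ignored.)
fn dynamic_bytes(a: &A) -> usize {
    a.element.len() * size_of::<u32>() + a.sentence.len()
}

fn main() {
    let small = A { element: vec![1], sentence: "hi".to_string(), another: 0.0 };
    let large = A { element: vec![0; 1000], sentence: "a longer sentence".to_string(), another: 0.0 };

    // The size known at compile time is identical for both values...
    assert_eq!(size_of_val(&small), size_of_val(&large));
    // ...but the payload is only known once the values exist.
    println!("{} vs {}", dynamic_bytes(&small), dynamic_bytes(&large));
}
```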


I would need to:

  1. Parse the A struct to get its name (and add the prefix "Archived" to it) and what it is made of: each field's name and type (if non-primitive, take its actual length in memory).
  2. Preallocate some space accordingly: a primitive is stored directly in the struct; for a non-primitive, copy the data to a buffer and record its offset from the head of the buffer plus its actual length. Loop over the fields, aligning each type where needed. Maybe an enum to help:
enum DataType {
    IsU8(u8),
    IsU16(u16),
    // ...
    IsString32_32 { start_offset: u32, length: u32 }, // relative offset from head of archive + word size, length
    IsString32_64 { start_offset: u32, length: u64 }, // relative offset from head of archive + word size, length
    // ...
    IsVecU8 { start_offset: u32, length: u64 },
}
  3. Concatenate the struct buffer with the non-primitive buffer. Now that all the data is known, update the relative offsets. Reserve one word at the beginning of the archive to indicate the size of the archive in bytes. My use case is always < 4 GB, but an alternative implementation could deal with larger archives.

  4. Store the result in an aligned Vec. Done.
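
Written by hand for the struct A from the first post, the steps above might look like the following sketch. The byte layout, HEADER_LEN and read_u32 are assumptions made up for illustration; they are not how rkyv or serde lay out data. Everything is read back through from_le_bytes, so the alignment step is left out.

```rust
// Assumed little-endian layout (illustration only):
// [0..4)   total archive size in bytes
// [4..8)   offset of `element` payload   [8..12)  element count
// [12..16) offset of `sentence` payload  [16..20) byte length
// [20..28) `another` as f64 bits
// [28..)   payloads
struct A {
    element: Vec<u32>,
    sentence: String,
    another: f64,
}

const HEADER_LEN: usize = 28;

fn archive(a: &A) -> Vec<u8> {
    // Copy the non-primitive data to a side buffer, recording offsets.
    let mut payload = Vec::new();
    let element_off = (HEADER_LEN + payload.len()) as u32;
    for v in &a.element {
        payload.extend_from_slice(&v.to_le_bytes());
    }
    let sentence_off = (HEADER_LEN + payload.len()) as u32;
    payload.extend_from_slice(a.sentence.as_bytes());

    // Write the fixed-size part, then concatenate the payload.
    let total = (HEADER_LEN + payload.len()) as u32;
    let mut out = Vec::with_capacity(total as usize);
    out.extend_from_slice(&total.to_le_bytes());
    out.extend_from_slice(&element_off.to_le_bytes());
    out.extend_from_slice(&(a.element.len() as u32).to_le_bytes());
    out.extend_from_slice(&sentence_off.to_le_bytes());
    out.extend_from_slice(&(a.sentence.len() as u32).to_le_bytes());
    out.extend_from_slice(&a.another.to_bits().to_le_bytes());
    out.extend_from_slice(&payload);
    out
}

/// Read a little-endian u32 at byte position `at`.
fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(b)
}

fn main() {
    let a = A { element: vec![1, 2], sentence: "hey".to_string(), another: 1.5 };
    let buf = archive(&a);
    let off = read_u32(&buf, 12) as usize;
    let len = read_u32(&buf, 16) as usize;
    println!("{} bytes, sentence = {:?}", buf.len(), std::str::from_utf8(&buf[off..off + len]));
}
```

Note that every length used here comes from a value of A at runtime; nothing in this function could be computed from the struct definition alone.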

So I need some sort of reflection if I want to avoid macros. In the meantime, syn gives me the AST of the structure. I am not sure how to deal with nested struct types yet, but recursion looks inevitable.

Going further in the analysis, the quote crate will be a good companion as well.

A couple of things to keep in mind with macros are that they:

  1. can't resolve and inspect types,
  2. don't have any runtime information.

You will pretty much just have a more structured version of the source text. Figuring out whether something is a "primitive", a nested struct, heap allocated or not, etc., requires information from a later stage of compilation. You can of course guess from the names.
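
A quick way to see the "structured source text" point without pulling in syn is a declarative macro, used here as a stand-in because it also works purely on tokens. In this sketch (the macro name field_types and the alias Text are made up for the example), the macro reports the field types exactly as written and has no way of knowing that Text is really String.

```rust
// An alias the macro cannot see through.
type Text = String;

macro_rules! field_types {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        struct $name { $($field: $ty),* }

        impl $name {
            /// The field types as they were spelled in the source.
            fn field_types() -> Vec<&'static str> {
                vec![$(stringify!($ty)),*]
            }
        }
    };
}

field_types! {
    struct A {
        field0: Text,
        field1: u32,
    }
}

fn main() {
    // The macro only ever saw the token `Text`, never `String`.
    println!("{:?}", A::field_types());
}
```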

You will most likely need a companion trait that does the "actual" work of defining what is what. This would be similar to the Serialize and Deserialize traits from serde.

trait Archive {
    type Archived;

    fn archive(self, dynamic_data: &mut ArchiveBuffer) -> Self::Archived;
}

struct ArchivedA {
    thing: <MyType as Archive>::Archived,
}

...or whatever works best for you. I usually find it helpful to first implement what I want the output to be, by hand, and then figure out a type agnostic pattern from there. Starting with the result and moving backwards.
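
As an illustration of "implement the output by hand first", here is one possible hand-written version for the struct A from the first post. ArchiveBuffer and Block are hypothetical types invented for this sketch, and alignment is ignored. The point is that the impl for A is purely mechanical (call archive on every field), so a derive macro could generate it without knowing what any field type is.

```rust
/// Hypothetical out-of-line storage for variable-length data.
#[derive(Default)]
struct ArchiveBuffer {
    bytes: Vec<u8>,
}

/// Where a variable-length field landed in the buffer (hypothetical).
#[derive(Debug, PartialEq)]
struct Block {
    offset: usize, // byte offset into `ArchiveBuffer::bytes`
    len: usize,    // number of elements (bytes for strings)
}

trait Archive {
    type Archived;

    fn archive(self, dynamic_data: &mut ArchiveBuffer) -> Self::Archived;
}

impl Archive for f64 {
    type Archived = f64;
    fn archive(self, _dynamic_data: &mut ArchiveBuffer) -> f64 {
        self // stored inline, nothing goes to the buffer
    }
}

impl Archive for String {
    type Archived = Block;
    fn archive(self, dynamic_data: &mut ArchiveBuffer) -> Block {
        let offset = dynamic_data.bytes.len();
        dynamic_data.bytes.extend_from_slice(self.as_bytes());
        Block { offset, len: self.len() }
    }
}

impl Archive for Vec<u32> {
    type Archived = Block;
    fn archive(self, dynamic_data: &mut ArchiveBuffer) -> Block {
        let offset = dynamic_data.bytes.len();
        for v in &self {
            dynamic_data.bytes.extend_from_slice(&v.to_le_bytes());
        }
        Block { offset, len: self.len() }
    }
}

struct A {
    element: Vec<u32>,
    sentence: String,
    another: f64,
}

struct ArchivedA {
    element: <Vec<u32> as Archive>::Archived,
    sentence: <String as Archive>::Archived,
    another: <f64 as Archive>::Archived,
}

// The part a derive macro could emit: one `archive` call per field.
impl Archive for A {
    type Archived = ArchivedA;
    fn archive(self, dynamic_data: &mut ArchiveBuffer) -> ArchivedA {
        ArchivedA {
            element: self.element.archive(dynamic_data),
            sentence: self.sentence.archive(dynamic_data),
            another: self.another.archive(dynamic_data),
        }
    }
}

fn main() {
    let mut buf = ArchiveBuffer::default();
    let a = A { element: vec![1, 2], sentence: "hey".to_string(), another: 1.5 };
    let archived = a.archive(&mut buf);
    println!("{:?} {:?} {}", archived.element, archived.sentence, archived.another);
}
```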


Yes, macros run at compile time, so obviously there is no way to know the runtime size.

I will dig into how others do it.

Thanks guys!

@jofas @ogeon

Now that I have a small working example, I understand how wrong I was.

Thank you guys for putting me on the right track 👍


Guys, there is something else:

Now that my serialization works fine, I still have an issue.

I have a Serialization trait that I implemented for many types.
It runs at runtime, gives me the proper type for every field, and lets me serialize the data accordingly into the final buffer. This works fine.

The difficulty is when reading the archive after the serialization.

Let's take this struct:

struct A{
   field0: String,
   field1: u32
}

My ideal logic would be to generate at compile time:

// Offset into the currently received slice.
// The block resolves to a &str.
struct StringBlock {
    offset: usize,
    len: usize,
}
impl StringBlock {
    ...
}
struct ArchivedA {
    field0: StringBlock,
    field1: u32,
}

But that is not possible, because I refuse to base the logic on names as they are written in the fields (what about aliases, which will often not be mine, and other funky things? I leave that to the compiler; doing it myself would be insane).

So I can't pre-generate this schema at compile time. I am not talking about writing it by hand for one specific struct;
I want to be able to generalize and later use syn and quote for this.

Creating an Archive trait lets me go from an original type to my custom struct, one type at a time.
I think I don't get how it can work.

How can I access my fields like this?

let my_a: &ArchivedA = ...
println!("{} {}", my_a.field0, my_a.field1);

field0 in ArchivedA may not occupy as much space as field0 in A, so I would have to calculate its position in the buffer at runtime. That doesn't look practical.

So I am stuck at this point.

Again, I am doing this as an exercise to increase my understanding of the language. Reading through the serde and rkyv source code doesn't help; the code is very long and kind of difficult for me to follow right now. I didn't find a really simple Rust implementation for beginners.

Thanks a lot for any help you can give me!

Properly implementing the Archive trait solved everything 🙂
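
The final code was not posted in the thread. As a guess at what the read side might look like, here is a minimal sketch using the StringBlock and ArchivedA shapes from the earlier post. The as_str method and its signature are assumptions, not the poster's implementation. ArchivedA itself has a fixed size known at compile time; only the bytes that StringBlock points into vary, and resolving them is a single slice operation.

```rust
/// Offset and length of a string inside the received slice.
struct StringBlock {
    offset: usize,
    len: usize,
}

impl StringBlock {
    /// Resolve the block against the buffer it was archived into.
    /// (Hypothetical accessor; panics on out-of-range or invalid UTF-8.)
    fn as_str<'a>(&self, buf: &'a [u8]) -> &'a str {
        std::str::from_utf8(&buf[self.offset..self.offset + self.len])
            .expect("archived string should be valid UTF-8")
    }
}

struct ArchivedA {
    field0: StringBlock,
    field1: u32,
}

fn main() {
    // Stand-in for a buffer received from disk or the network.
    let buf = b"hello world".to_vec();
    let my_a = ArchivedA {
        field0: StringBlock { offset: 6, len: 5 },
        field1: 42,
    };
    println!("{} {}", my_a.field0.as_str(&buf), my_a.field1); // prints "world 42"
}
```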

