Rust Proc Macro Question - How to parse nested struct member variables?

I am attempting to write a macro that generates a telemetry function for any struct via #[derive(Telemetry)]. This function sends a data stream to anything that implements std::io::Write. The data stream is "self-describing": the receiver doesn't need to know anything about the data other than the bytes received. This lets the receiver correctly parse a struct and print its member variables' names and values, even if variables are added, removed, reordered, or renamed. The telemetry function works for non-nested structs, and for a nested struct it prints the name and type. But I need it to recursively print the names, types, sizes, and values of the nested struct's member variables. An example and the code are shown below.

Current behavior

use derive_telemetry::Telemetry;
use std::fs::File;
use std::io::{Write};
use std::time::{Duration, Instant};
use std::marker::Send;
use std::sync::{Arc, Mutex};
use serde::{Deserialize, Serialize};

#[repr(C, packed)]
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
pub struct AnotherCustomStruct {
    pub my_var_2: f64,
    pub my_var_1: f32,
}

#[derive(Telemetry)]
#[derive(Debug, Serialize, Deserialize)]
struct TestStruct {
    pub a: u32,
    pub b: u32,
    pub my_custom_struct: AnotherCustomStruct,
    pub my_array: [u32; 10],
    pub my_vec: Vec::<u64>,
    pub another_variable : String,
}

const HEADER_FILENAME: &str = "test_file_stream.header";
const DATA_FILENAME: &str = "test_file_stream.data";

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let my_struct = TestStruct { a: 10,
                                 b: 11,
                                 my_custom_struct: AnotherCustomStruct { my_var_1: 123.456, my_var_2: 789.1023 },
                                 my_array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                                 my_vec: vec![11, 12, 13],
                                 another_variable: "Hello".to_string()
                                };
    let file_header_stream = Mutex::new(Box::new(File::create(HEADER_FILENAME)?) as Box <dyn std::io::Write + Send + Sync>);
    let file_data_stream = Mutex::new(Box::new(File::create(DATA_FILENAME)?) as Box <dyn std::io::Write + Send + Sync>);
    my_struct.telemetry(Arc::new(file_header_stream), Arc::new(file_data_stream));
    let header: TelemetryHeader = bincode::deserialize_from(&File::open(HEADER_FILENAME)?)?;
    let data: TestStruct = bincode::deserialize_from(&File::open(DATA_FILENAME)?)?;
    println!("{:#?}", header);
    println!("{:?}", data);
    Ok(())
}

produces

TelemetryHeader {
    variable_descriptions: [
        VariableDescription {
            var_name_length: 1,
            var_name: "a",
            var_type_length: 3,
            var_type: "u32",
            var_size: 4,
        },
        VariableDescription {
            var_name_length: 1,
            var_name: "b",
            var_type_length: 3,
            var_type: "u32",
            var_size: 4,
        },
        VariableDescription {
            var_name_length: 16,
            var_name: "my_custom_struct",
            var_type_length: 19,
            var_type: "AnotherCustomStruct",
            var_size: 12,
        },
        VariableDescription {
            var_name_length: 8,
            var_name: "my_array",
            var_type_length: 10,
            var_type: "[u32 ; 10]",
            var_size: 40,
        },
        VariableDescription {
            var_name_length: 6,
            var_name: "my_vec",
            var_type_length: 14,
            var_type: "Vec :: < u64 >",
            var_size: 24,
        },
        VariableDescription {
            var_name_length: 16,
            var_name: "another_variable",
            var_type_length: 6,
            var_type: "String",
            var_size: 24,
        },
    ],
}
TestStruct { a: 10, b: 11, my_custom_struct: AnotherCustomStruct { my_var_2: 789.1023, my_var_1: 123.456 }, my_array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], my_vec: [11, 12, 13], another_variable: "Hello" } 

Each header entry is laid out as: length of the variable name, variable name, length of the variable type, variable type, and the variable's size in bytes.
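
A minimal standard-library sketch of that record layout, for reference. The `describe` helper is hypothetical (it is not part of the macro in this post), and the serde/bincode derives are left out so the snippet stands alone.

```rust
use std::mem::size_of;

/// Mirrors one header record: name length, name, type length, type, size in bytes.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct VariableDescription {
    var_name_length: usize,
    var_name: String,
    var_type_length: usize,
    var_type: String,
    var_size: usize,
}

/// Hypothetical helper: builds the record for a field of type `T`.
/// The type name is passed in as text, much like `stringify!` would supply it.
fn describe<T>(name: &str, type_name: &str) -> VariableDescription {
    VariableDescription {
        var_name_length: name.len(),
        var_name: name.to_string(),
        var_type_length: type_name.len(),
        var_type: type_name.to_string(),
        var_size: size_of::<T>(),
    }
}

// Same layout as in the post: packed, so f64 + f32 occupy 12 bytes with no padding.
#[repr(C, packed)]
#[allow(dead_code)]
struct AnotherCustomStruct {
    my_var_2: f64,
    my_var_1: f32,
}

fn main() {
    println!("{:#?}", describe::<u32>("a", "u32"));
    println!(
        "{:#?}",
        describe::<AnotherCustomStruct>("my_custom_struct", "AnotherCustomStruct")
    );
}
```

The two length fields are derived from the strings themselves, so they could be dropped from the struct if the serialization format already length-prefixes strings.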

Required Behavior

TelemetryHeader {
    variable_descriptions: [
        VariableDescription {
            var_name_length: 1,
            var_name: "a",
            var_type_length: 3,
            var_type: "u32",
            var_size: 4,
        },
        VariableDescription {
            var_name_length: 1,
            var_name: "b",
            var_type_length: 3,
            var_type: "u32",
            var_size: 4,
        },
        VariableDescription {
            var_name_length: 16,
            var_name: "my_custom_struct",
            var_type_length: 19,
            var_type: "AnotherCustomStruct",
            var_size: 12,
        },
        VariableDescription {
            var_name_length: 8,
            var_name: "my_var_2",
            var_type_length: 3,
            var_type: "f64",
            var_size: 8,
        },
        VariableDescription {
            var_name_length: 8,
            var_name: "my_var_1",
            var_type_length: 3,
            var_type: "f32",
            var_size: 4,
        },
        VariableDescription {
            var_name_length: 8,
            var_name: "my_array",
            var_type_length: 10,
            var_type: "[u32 ; 10]",
            var_size: 40,
        },
        VariableDescription {
            var_name_length: 6,
            var_name: "my_vec",
            var_type_length: 14,
            var_type: "Vec :: < u64 >",
            var_size: 24,
        },
        VariableDescription {
            var_name_length: 16,
            var_name: "another_variable",
            var_type_length: 6,
            var_type: "String",
            var_size: 24,
        },
    ],
}
TestStruct { a: 10, b: 11, my_custom_struct: AnotherCustomStruct { my_var_2: 789.1023, my_var_1: 123.456 }, my_array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], my_vec: [11, 12, 13], another_variable: "Hello" } 

To reiterate, the current behavior correctly prints the variable name, type, size, and value for a struct that derives the Telemetry trait. It also prints the name, type, and size of a nested struct correctly, but it does not then print the names and values of the nested struct's members, which is what I need. The code is below. Apologies for such a long post; I hope it is clear and well formatted. Thank you in advance.

Directory Structure

src
 -main.rs
telemetry
 -Cargo.toml
 -src
   --lib.rs
Cargo.toml

main.rs

use derive_telemetry::Telemetry;
use std::fs::File;
use std::io::{Write};
use std::time::{Duration, Instant};
use std::marker::Send;
use std::sync::{Arc, Mutex};
use serde::{Deserialize, Serialize};

const HEADER_FILENAME: &str = "test_file_stream.header";
const DATA_FILENAME: &str = "test_file_stream.data";

#[repr(C, packed)]
#[derive(Debug, Serialize, Deserialize, Copy, Clone)]
pub struct AnotherCustomStruct {
    pub my_var_2: f64,
    pub my_var_1: f32,
}

#[derive(Telemetry)]
#[derive(Debug, Serialize, Deserialize)]
struct TestStruct {
    pub a: u32,
    pub b: u32,
    pub my_custom_struct: AnotherCustomStruct,
    pub my_array: [u32; 10],
    pub my_vec: Vec::<u64>,
    pub another_variable : String,
}
fn main() -> Result<(), Box<dyn std::error::Error>>{
    let my_struct = TestStruct { a: 10,
                                 b: 11,
                                 my_custom_struct: AnotherCustomStruct { my_var_1: 123.456, my_var_2: 789.1023 },
                                 my_array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                                 my_vec: vec![11, 12, 13],
                                 another_variable: "Hello".to_string()
                                };
    let file_header_stream = Mutex::new(Box::new(File::create(HEADER_FILENAME)?) as Box <dyn std::io::Write + Send + Sync>);
    let file_data_stream = Mutex::new(Box::new(File::create(DATA_FILENAME)?) as Box <dyn std::io::Write + Send + Sync>);

    //let stdout_header_stream = Mutex::new(Box::new(io::stdout()) as Box <dyn std::io::Write + Send + Sync>);
    //let stdout_data_stream = Mutex::new(Box::new(io::stdout()) as Box <dyn std::io::Write + Send + Sync>);

    //let tcp_header_stream = Mutex::new(Box::new(TCPStream::connect(127.0.0.1)?) as Box <dyn std::io::Write + Send + Sync>);
    //let tcp_data_stream = Mutex::new(Box::new(TCPStream::connect(127.0.0.1)?) as Box <dyn std::io::Write + Send + Sync>);

    //let test_traits = Mutex::new(Box::new(io::stdout()) as Box <dyn std::io::Write + Send + Sync>);
    let start = Instant::now();
    my_struct.telemetry(Arc::new(file_header_stream), Arc::new(file_data_stream));
    let duration = start.elapsed();
    println!("Telemetry took: {:?}", duration);
    thread::sleep(Duration::from_secs(1));
    let header: TelemetryHeader = bincode::deserialize_from(&File::open(HEADER_FILENAME)?)?;
    let data: TestStruct = bincode::deserialize_from(&File::open(DATA_FILENAME)?)?;
    println!("{:#?}", header);
    println!("{:?}", data);
    Ok(())
}

main Cargo.toml

[package]
name = "proc_macro_test"
version = "0.1.0"
edition = "2018"

[workspace]
members = [
    "telemetry",
]

[dependencies]
derive_telemetry = { path = "telemetry" }
ndarray = "0.15.3"
crossbeam = "*"
serde = { version = "*", features=["derive"]}
bincode = "*"

[profile.dev]
opt-level = 0

[profile.release]
opt-level = 3

telemetry lib.rs

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, parse_quote, DeriveInput};

#[proc_macro_derive(Telemetry)]
pub fn derive(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let output = parse_derive_input(&input);
    match output {
        syn::Result::Ok(tt) => tt,
        syn::Result::Err(err) => err.to_compile_error(),
    }
    .into()
}

fn parse_derive_input(input: &syn::DeriveInput) -> syn::Result<proc_macro2::TokenStream> {

    let struct_ident = &input.ident;
    let struct_data = parse_data(&input.data)?;
    let struct_fields = &struct_data.fields;
    let generics = add_debug_bound(struct_fields, input.generics.clone());
    let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();

    let _struct_ident_str = format!("{}", struct_ident);
    let tele_body = match struct_fields {
            syn::Fields::Named(fields_named) => handle_named_fields(fields_named)?,
            syn::Fields::Unnamed(fields_unnamed) => {
                let field_indexes = (0..fields_unnamed.unnamed.len()).map(syn::Index::from);
                let field_indexes_str = (0..fields_unnamed.unnamed.len()).map(|idx| format!("{}", idx));
                quote!(#( .field(#field_indexes_str, &self.#field_indexes) )*)
            }
            syn::Fields::Unit => quote!(),
        };

    let telemetry_declaration = quote!(
        trait Telemetry {
            fn telemetry(self, header_stream: Arc<Mutex<Box<dyn std::io::Write + std::marker::Send + Sync>>>, data_stream: Arc<Mutex<Box<dyn std::io::Write + std::marker::Send + Sync>>>);
        }
    );



    syn::Result::Ok(quote!(
        use std::thread;
        use std::collections::VecDeque;

        #[derive(Serialize, Deserialize, Default, Debug)]
        pub struct VariableDescription {
            pub var_name_length: usize,
            pub var_name: String,
            pub var_type_length: usize,
            pub var_type: String,
            pub var_size: usize,
        }

        #[derive(Serialize, Deserialize, Default, Debug)]
        pub struct TelemetryHeader {
            pub variable_descriptions: VecDeque::<VariableDescription>,
        }

        #telemetry_declaration
        impl #impl_generics Telemetry for #struct_ident #ty_generics #where_clause {
            fn telemetry(self, header_stream: Arc<Mutex<Box<dyn std::io::Write + std::marker::Send + Sync>>>, data_stream: Arc<Mutex<Box<dyn std::io::Write + std::marker::Send + Sync>>>) {
                // Serialize on a background thread so the caller is not blocked.
                thread::spawn(move || {
                    #tele_body;
                });
            }
        }
    ))
}

fn handle_named_fields(fields: &syn::FieldsNamed) -> syn::Result<proc_macro2::TokenStream> {
    let idents = fields.named.iter().map(|f| &f.ident);
    let types = fields.named.iter().map(|f| &f.ty);
    let num_entities = fields.named.len();
    let test = quote! (
                let mut tele_header = TelemetryHeader {variable_descriptions: VecDeque::with_capacity(#num_entities)};
                #(
                    tele_header.variable_descriptions.push_back( VariableDescription {
                        var_name_length: stringify!(#idents).len(),
                        var_name: stringify!(#idents).to_string(),
                        var_type_length: stringify!(#types).len(),
                        var_type: stringify!(#types).to_string(),
                        var_size: std::mem::size_of_val(&self.#idents),
                    });

                )*
                // write_all keeps writing until the whole buffer is sent; plain write may stop short.
                header_stream.lock().unwrap().write_all(&bincode::serialize(&tele_header).unwrap()).unwrap();
                data_stream.lock().unwrap().write_all(&bincode::serialize(&self).unwrap()).unwrap();
        );
        syn::Result::Ok(test)
}

fn parse_named_field(field: &syn::Field) -> proc_macro2::TokenStream {
    let ident = field.ident.as_ref().unwrap();
    let ident_str = format!("{}", ident);
    let ident_type = &field.ty;
    if field.attrs.is_empty() {

                quote!(
                println!("Var Name Length: {}", stringify!(#ident_str).len());
                println!("Var Name: {}", #ident_str);
                println!("Var Type Length: {}", stringify!(#ident_type).len());
                println!("Var Type: {}", stringify!(#ident_type));
                println!("Var Val: {}", &self.#ident);
            )
    }
    else {
        //parse_named_field_attrs(field)
        quote!()
    }
}


fn parse_named_field_attrs(field: &syn::Field) -> syn::Result<proc_macro2::TokenStream> {
    let ident = field.ident.as_ref().unwrap();
    let ident_str = format!("{}", ident);
    let attr = field.attrs.last().unwrap();
    if !attr.path.is_ident("debug") {
        return syn::Result::Err(syn::Error::new_spanned(
            &attr.path,
            "value must be \"debug\"",
        ));
    }
    let attr_meta = &attr.parse_meta();
    match attr_meta {
        Ok(syn::Meta::NameValue(syn::MetaNameValue { lit, .. })) => {
            let debug_assign_value = lit;
            syn::Result::Ok(quote!(
                .field(#ident_str, &std::format_args!(#debug_assign_value, &self.#ident))
            ))
        }
        Ok(meta) => syn::Result::Err(syn::Error::new_spanned(meta, "expected meta name value")),
        Err(err) => syn::Result::Err(err.clone()),
    }
}

fn parse_data(data: &syn::Data) -> syn::Result<&syn::DataStruct> {
    match data {
        syn::Data::Struct(data_struct) => syn::Result::Ok(data_struct),
        syn::Data::Enum(syn::DataEnum { enum_token, .. }) => syn::Result::Err(
            syn::Error::new_spanned(enum_token, "Telemetry is not implemented for enums"),
        ),
        syn::Data::Union(syn::DataUnion { union_token, .. }) => syn::Result::Err(
            syn::Error::new_spanned(union_token, "Telemetry is not implemented for unions"),
        ),
    }
}

fn add_debug_bound(fields: &syn::Fields, mut generics: syn::Generics) -> syn::Generics {
    let mut phantom_ty_idents = std::collections::HashSet::new();
    let mut non_phantom_ty_idents = std::collections::HashSet::new();
    let g = generics.clone();
    for (ident, opt_iter) in fields
        .iter()
        .flat_map(extract_ty_path)
        .map(|path| extract_ty_idents(path, g.params.iter().flat_map(|p| {
            if let syn::GenericParam::Type(ty) = p {
                std::option::Option::Some(&ty.ident)
            } else {
                std::option::Option::None
            }
        } ).collect()))
    {
        if ident == "PhantomData" {
            // If the field type ident is `PhantomData`
            // add the generic parameters into the phantom idents collection
            if let std::option::Option::Some(args) = opt_iter {
                for arg in args {
                    phantom_ty_idents.insert(arg);
                }
            }
        } else {
            // Else, add the type and existing generic parameters into the non-phantom idents collection
            non_phantom_ty_idents.insert(ident);
            if let std::option::Option::Some(args) = opt_iter {
                for arg in args {
                    non_phantom_ty_idents.insert(arg);
                }
            }
        }
    }
    // Find the difference between the phantom idents and non-phantom idents
    // Collect them into an hash set for O(1) lookup
    let non_debug_fields = phantom_ty_idents
        .difference(&non_phantom_ty_idents)
        .collect::<std::collections::HashSet<_>>();
    // Iterate generic params and if their ident is NOT in the phantom fields
    // do not add the generic bound
    for param in generics.type_params_mut() {
        // this is kinda shady, hoping it works
        if !non_debug_fields.contains(&&param.ident) {
            param.bounds.push(parse_quote!(std::fmt::Debug));
        }
    }
    generics
}

/// Extract the path from the type path in a field.
fn extract_ty_path(field: &syn::Field) -> std::option::Option<&syn::Path> {
    if let syn::Type::Path(syn::TypePath { path, .. }) = &field.ty {
        std::option::Option::Some(&path)
    } else {
        std::option::Option::None
    }
}

/// From a `syn::Path` extract both the type ident and an iterator over generic type arguments.
fn extract_ty_idents<'a>(
    path: &'a syn::Path,
    params: std::collections::HashSet<&'a syn::Ident>,
) -> (
    &'a syn::Ident,
    std::option::Option<impl Iterator<Item = &'a syn::Ident>>,
) {
    let ty_segment = path.segments.last().unwrap();
    let ty_ident = &ty_segment.ident;
    if let syn::PathArguments::AngleBracketed(syn::AngleBracketedGenericArguments {
        args, ..
    }) = &ty_segment.arguments
    {
        let ident_iter = args.iter().flat_map(move |gen_arg| {
            if let syn::GenericArgument::Type(syn::Type::Path(syn::TypePath { path, .. })) = gen_arg
            {
                match path.segments.len() {
                    2 => {
                        let ty = path.segments.first().unwrap();
                        let assoc_ty = path.segments.last().unwrap();
                        if params.contains(&ty.ident) {
                            std::option::Option::Some(&assoc_ty.ident)
                        } else {
                            std::option::Option::None
                        }
                    }
                    1 => path.get_ident(),
                    _ => std::unimplemented!("kinda tired of edge cases"),
                }
            } else {
                std::option::Option::None
            }
        });
        (ty_ident, std::option::Option::Some(ident_iter))
    } else {
        (ty_ident, std::option::Option::None)
    }
}

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

telemetry Cargo.toml

[package]
name = "derive_telemetry"
version = "0.0.0"
edition = "2018"
autotests = false
publish = false

[lib]
proc-macro = true

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
proc-macro2 = ">= 1.0.29"
syn = ">= 1.0.76"
quote = ">= 1.0.9"
crossbeam = "*"
serde = { version = "*", features=["derive"]}
bincode = "*"

Once again, apologies for the lengthy post. I hope this is clear. I believe this is everything needed to reproduce what I have; unfortunately, I also believe it is a minimal working example, otherwise I would have trimmed it further for ease of reading and answering.

Macros can see only the exact tokens passed to them (for derive macros, that is only the struct definition and the attributes that follow the derive). You probably want to implement Telemetry for AnotherCustomStruct as well (via the same derive, I think) and then call that implementation inside the implementation for TestStruct. That's what every serialization trait does.
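
To make the "only the exact tokens" point concrete, here is a small sketch that uses a declarative macro as a stand-in for the derive (a real derive receives a `TokenStream`, but the limitation is the same). The `field_tokens!` macro is hypothetical and exists only for this illustration.

```rust
/// Hypothetical stand-in for a derive: it is handed the tokens of one struct
/// definition and returns each field's name and type as text.
macro_rules! field_tokens {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        vec![$((stringify!($field), stringify!($ty))),*]
    };
}

fn main() {
    // `AnotherCustomStruct` is never defined in this program. The macro still
    // expands, because it only sees the type's name as a token and never
    // looks up what that type contains.
    let fields = field_tokens!(struct TestStruct {
        a: u32,
        my_custom_struct: AnotherCustomStruct,
    });
    for (name, ty) in &fields {
        println!("{}: {}", name, ty);
    }
}
```

Since the macro cannot look inside `AnotherCustomStruct`, the nested field descriptions have to come from code that runs after expansion, which is where a trait implementation on the nested type comes in.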

Thanks for replying! The issue, I believe, is that the macro implements the telemetry function automatically, so the user doesn't need to know anything about Telemetry other than to derive it on the struct they want to use it with. My next question is how to know whether a field's type is a "primitive" that I can describe directly, or a nested struct on which I need to call telemetry. How would I handle that inside handle_named_fields?

The usual way is not to guess, but to use the trait implementation for all fields. For primitive types, it'll be provided by the same crate which provides the trait itself; for structs, it'll be derived and forwarded, directly or indirectly, to the former.
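
A hand-written sketch of that arrangement, under stated assumptions: the `Describe` trait, its `TYPE_NAME` constant, and the `Entry` tuple are hypothetical names chosen for this example, and the struct impls are written out by hand where a derive would generate them. One practical note: a proc-macro crate can export only macros, so the trait and its primitive impls would need to live in a separate ordinary crate.

```rust
use std::mem::size_of;

/// One header entry: (field name, type name, size in bytes).
type Entry = (String, &'static str, usize);

/// Hypothetical trait implemented by every type that can appear as a field.
trait Describe: Sized {
    const TYPE_NAME: &'static str;

    /// Appends entries for nested fields. Leaf types keep this empty default.
    fn describe_fields(&self, _out: &mut Vec<Entry>) {}

    /// Appends this value's own entry, then the entries of its fields.
    fn describe(&self, name: &str, out: &mut Vec<Entry>) {
        out.push((name.to_string(), Self::TYPE_NAME, size_of::<Self>()));
        self.describe_fields(out);
    }
}

// Explicit impls for primitives, provided next to the trait.
impl Describe for u32 { const TYPE_NAME: &'static str = "u32"; }
impl Describe for f32 { const TYPE_NAME: &'static str = "f32"; }
impl Describe for f64 { const TYPE_NAME: &'static str = "f64"; }

#[repr(C, packed)]
struct AnotherCustomStruct {
    my_var_2: f64,
    my_var_1: f32,
}

struct TestStruct {
    a: u32,
    my_custom_struct: AnotherCustomStruct,
}

// The two impls below are what a derive would be expected to generate.
impl Describe for AnotherCustomStruct {
    const TYPE_NAME: &'static str = "AnotherCustomStruct";
    fn describe_fields(&self, out: &mut Vec<Entry>) {
        // Fields of a packed struct cannot be borrowed, so copy them out first.
        let (my_var_2, my_var_1) = (self.my_var_2, self.my_var_1);
        my_var_2.describe("my_var_2", out);
        my_var_1.describe("my_var_1", out);
    }
}

impl Describe for TestStruct {
    const TYPE_NAME: &'static str = "TestStruct";
    fn describe_fields(&self, out: &mut Vec<Entry>) {
        // No check for "primitive or struct": the trait picks the right impl.
        self.a.describe("a", out);
        self.my_custom_struct.describe("my_custom_struct", out);
    }
}

fn main() {
    let s = TestStruct {
        a: 10,
        my_custom_struct: AnotherCustomStruct { my_var_2: 789.1023, my_var_1: 123.456 },
    };
    let mut header = Vec::new();
    s.describe_fields(&mut header);
    for (name, ty, size) in &header {
        println!("{}: {} ({} bytes)", name, ty, size);
    }
}
```

The entries come out in the order `a`, `my_custom_struct`, `my_var_2`, `my_var_1`, which is the flattened shape asked for under "Required Behavior".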

So are you saying to call telemetry on every member of a struct? In the case of a u64 or Vec it would work as it does now, and in the case of a struct, the Telemetry trait is derived on it and its telemetry function is called? If so, how do I tell whether a token is a primitive type that I can parse as a named field, or a struct whose derived telemetry fn I need to call? Am I fundamentally missing something?

For u64 or Vec, you'll also be calling the implementation of Telemetry. The only difference is that this implementation is not derived, but provided explicitly.
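
As a sketch of what "provided explicitly" could look like for containers, the generic impls below cover `Vec<T>` and fixed-size arrays once, for any element type that itself implements the trait. The `Describe` trait and its `type_name` function are hypothetical names for this example, not an existing API.

```rust
use std::mem::size_of;

/// One header entry: (field name, type name, size in bytes).
type Entry = (String, String, usize);

/// Hypothetical trait; the type name is built at run time so that generic
/// containers can compose it from their element type.
trait Describe: Sized {
    fn type_name() -> String;

    fn describe(&self, name: &str, out: &mut Vec<Entry>) {
        out.push((name.to_string(), Self::type_name(), size_of::<Self>()));
    }
}

// Explicit impls for a few leaf types.
impl Describe for u32 { fn type_name() -> String { "u32".to_string() } }
impl Describe for u64 { fn type_name() -> String { "u64".to_string() } }
impl Describe for String { fn type_name() -> String { "String".to_string() } }

// One generic impl covers every Vec whose element type is describable.
impl<T: Describe> Describe for Vec<T> {
    fn type_name() -> String { format!("Vec<{}>", T::type_name()) }
}

// Likewise for arrays of any length.
impl<T: Describe, const N: usize> Describe for [T; N] {
    fn type_name() -> String { format!("[{}; {}]", T::type_name(), N) }
}

fn main() {
    let mut out = Vec::new();
    [1u32; 10].describe("my_array", &mut out);
    vec![11u64, 12, 13].describe("my_vec", &mut out);
    "Hello".to_string().describe("another_variable", &mut out);
    for (name, ty, size) in &out {
        println!("{}: {} ({} bytes)", name, ty, size);
    }
}
```

As in the original header, the size recorded for a `Vec` or `String` is the size of the handle itself, not of the heap data it points to.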

I thought I had a solution: I just need to call parse_named_fields again any time a field is itself a struct, and the behavior will be exactly what I need. Is this what you meant? The only problem is that I can't see how to tell whether a Field is a struct or not. Maybe I'm not grasping what you have in mind; could you give a small example? I'm not asking you to write the code for me, just something closer to pseudocode, or code that shows intent but doesn't necessarily compile.

I guess the question is what needs to change in the above code to get from the current behavior to the desired behavior?

You can't. However, that's not a problem, because you don't need to. You just emit recursive calls through the Telemetry trait, using nothing but the field name. Type inference will figure out the rest after macro expansion. The literal code that your derive macro should emit looks like the impl below (assuming the following declarations):

trait Telemetry {
    fn do_it(&self);
}

struct Foo {
    bar: u64, // a primitive
    qux: MyQux, // not a primitive – doesn't matter either way
}

impl Telemetry for Foo {
    fn do_it(&self) {
        Telemetry::do_it(&self.bar); // calls <u64 as Telemetry>::do_it()
        Telemetry::do_it(&self.qux); // calls <MyQux as Telemetry>::do_it()
    }
}
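
For a version of the above that can be run as a single file, here is a sketch in which a declarative macro plays the role of the derive. Everything beyond the `Telemetry`/`do_it` names is an assumption made for the example: the `telemetry_struct!` macro is hypothetical, and `do_it` takes an extra path and log argument so the recursion can be observed.

```rust
/// Same idea as the trait above, but each call records what it visited.
trait Telemetry {
    fn do_it(&self, path: &str, log: &mut Vec<String>);
}

// Explicit impls for leaf types.
impl Telemetry for u64 {
    fn do_it(&self, path: &str, log: &mut Vec<String>) {
        log.push(format!("{} = {}", path, self));
    }
}
impl Telemetry for f32 {
    fn do_it(&self, path: &str, log: &mut Vec<String>) {
        log.push(format!("{} = {}", path, self));
    }
}

/// Hypothetical stand-in for the derive: defines the struct and an impl that
/// forwards to every field, using nothing but the field names.
macro_rules! telemetry_struct {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name { $($field: $ty),* }

        impl Telemetry for $name {
            fn do_it(&self, path: &str, log: &mut Vec<String>) {
                $(
                    // The compiler picks the impl from the field's type after expansion.
                    Telemetry::do_it(
                        &self.$field,
                        &format!("{}.{}", path, stringify!($field)),
                        log,
                    );
                )*
            }
        }
    };
}

telemetry_struct!(struct MyQux { x: f32 });
telemetry_struct!(struct Foo { bar: u64, qux: MyQux });

fn main() {
    let foo = Foo { bar: 7, qux: MyQux { x: 1.5 } };
    let mut log = Vec::new();
    foo.do_it("foo", &mut log);
    for line in &log {
        println!("{}", line);
    }
}
```

The macro body never inspects `$ty` to decide what to do. In a real derive the same effect would come from emitting one `Telemetry::do_it(&self.#ident, ...)` call per field inside `quote!`. The trait and any shared header types would also have to be defined once outside the derive output, since emitting them from every derive would define them twice as soon as two structs in one module use it.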