How to parse enum macro?

This is kinda what prompted me to post "View macro output live?", but they are separate topics, so I figured it's best to post this separately...

I'd like to write something like:

bridge_events![
    InitialLoad,
    ChangeTodo(EntityId, String),
]

which will turn into both of these:

#[cfg_attr(feature = "ts_test", derive(EnumIter, AsRefStr))]
#[derive(FromPrimitive, Copy, Clone, Debug)]
#[repr(u32)]
pub enum BridgeEvent {
    InitialLoad,
    ChangeTodo,
}

pub enum Event {
    InitialLoad,
    ChangeTodo(EntityId, String),
}

So it separates the enum variant name from the params to create the BridgeEvent variants and then reconstructs it (or uses the original) to just write Event variants directly.

I'd imagine the attributes above BridgeEvent don't really matter - but figured I'd include them just in case that changes things...

You "just" have to manually parse what represents the syntax of an enum definition: a comma-separated sequence of variant names, where each may be followed by a parenthesized comma-separated sequence of types (technically braces with named fields are a possibility too, but handling both within the same macro would require making it much more complex), and both sequences accept trailing commas:

macro_rules! bridge_events {(
    $(
        $VariantName:ident $(
            ( $($T:ty),+ $(,)? )
        )?
    ),+ $(,)?
) => (
    #[cfg_attr(feature = "ts_test",
        derive(EnumIter, AsRefStr),
    )]
    #[derive(FromPrimitive, Copy, Clone, Debug)]
    #[repr(u32)]
    pub
    enum BridgeEvent {
        $(
            $VariantName ,
        )+
    }

    pub
    enum Event {
        $(
            $VariantName $(
                ( $($T ,)+ )
            )? ,
        )+
    }
)}

See The Little Book of Rust Macros for a guide to writing them.
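
For anyone who wants to try the macro above without pulling in extra crates, here is a minimal sketch of it in use. It makes a few assumptions that are not part of the answer: the `FromPrimitive`, `EnumIter` and `AsRefStr` derives are left out (they come from external crates), `EntityId` is a hypothetical stand-in type, and `PartialEq` is derived only so the values can be compared.

```rust
// Hypothetical stand-in for the `EntityId` type from the question.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EntityId(pub u32);

macro_rules! bridge_events {(
    $(
        $VariantName:ident $(
            ( $($T:ty),+ $(,)? )
        )?
    ),+ $(,)?
) => (
    // Fieldless mirror: only the variant names are kept.
    // (External derives from the answer are omitted in this sketch.)
    #[derive(Copy, Clone, Debug, PartialEq)]
    #[repr(u32)]
    pub enum BridgeEvent {
        $( $VariantName , )+
    }

    // Full enum: variant names plus their optional tuple payloads.
    #[derive(Debug, PartialEq)]
    pub enum Event {
        $( $VariantName $( ( $($T ,)+ ) )? , )+
    }
)}

// In item position, a `[...]` invocation needs a trailing semicolon.
bridge_events![
    InitialLoad,
    ChangeTodo(EntityId, String),
];

fn main() {
    let event = Event::ChangeTodo(EntityId(7), String::from("buy milk"));
    println!("{:?}", event);
    // The fieldless enum can be cast to its `u32` discriminant.
    println!("{}", BridgeEvent::ChangeTodo as u32);
}
```

Both enums share the same variant order, so the position of a variant in the invocation determines the `BridgeEvent` discriminant.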

Thanks!

Right - it should support braces too... is there not a way to capture "everything after the variant name" to cover both cases (parens and braces)?

Yes, both ( ... ) and { ... } can be captured in the same rule (with the :tt "category"), but , is also a :tt (everything kind of is), meaning that using

    $VariantName:ident $( $assoc:tt )? ,

is ambiguous, so we need to manually parse this using different rules and recursion (the "recursive muncher" pattern; cf. the aforementioned Little Book of Rust Macros):

  • You start by setting up the muncher with this kind of "entry point" rule:

    // == ENTRY POINT ==
    (
        $($input:tt)*
    ) => (bridge_events! {
        // a sequence of brace-enclosed variants
        @variants []
        // remaining tokens to parse
        @parsing
            $($input)*
    });
    
  • Hoping to end with something like:

    // Done parsing, time to generate code:
    (
        @variants [
            $(
                {
                    $VariantName:ident $( $variant_assoc:tt )?
                }  
            )*
        ]
        @parsing
            // Nothing left to parse
    ) => (
        #[cfg_attr(feature = "ts_test",
            derive(EnumIter, AsRefStr),
        )]
        #[derive(FromPrimitive, Copy, Clone, Debug)]
        #[repr(u32)]
        pub
        enum BridgeEvent {
            $(
                $VariantName ,
            )*
        }
    
        pub
        enum Event {
            $(
                $VariantName $( $variant_assoc )? ,
            )*
        }
    );
    
    • (At @variants, it is the outer braces around each variant that make it possible to unambiguously parse a sequence of items despite the optional trailing tokens)
  • And now, the recursion / stepping logic:

    // VariantName
    (
        @variants [
            $($variants:tt)*
        ]
        @parsing
            $VariantName:ident
            $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [
            $($variants)*
            {
                $VariantName
            }
        ]
        @parsing
            $( $($input)* )?
    });
    
    // VariantName(...)
    (
        @variants [
            $($variants:tt)*
        ]
        @parsing
            $VariantName:ident ( $($tt:tt)* )
            $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [
            $($variants)*
            {
                $VariantName ($($tt)*)
            }
        ]
        @parsing
            $( $($input)* )?
    });
    
    // VariantName { ... }
    (
        @variants [
            $($variants:tt)*
        ]
        @parsing
            $VariantName:ident { $($tt:tt)* }
            $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [
            $($variants)*
            {
                $VariantName { $($tt)* }
            }
        ]
        @parsing
            $( $($input)* )?
    });
    
  • If the accept-it-all entry-point rule is the first one, the macro will indefinitely recurse within it; that's why it has to be the last rule.
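
Putting the rules above together in the required order (entry point last), a condensed sketch of the whole muncher could look like the following. The same assumptions apply as for any std-only test: the external derives are dropped, `EntityId` is a hypothetical stand-in, and the `RemoveTodo { id }` variant is made up purely to exercise the brace case.

```rust
// Hypothetical stand-in for the `EntityId` type from the question.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EntityId(pub u32);

macro_rules! bridge_events {
    // Done parsing: each variant sits in its own `{ ... }` group.
    (
        @variants [ $( { $VariantName:ident $( $variant_assoc:tt )? } )* ]
        @parsing
    ) => (
        #[derive(Copy, Clone, Debug, PartialEq)]
        #[repr(u32)]
        pub enum BridgeEvent { $( $VariantName , )* }

        #[derive(Debug, PartialEq)]
        pub enum Event { $( $VariantName $( $variant_assoc )? , )* }
    );

    // `VariantName`
    (
        @variants [ $($variants:tt)* ]
        @parsing $VariantName:ident $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [ $($variants)* { $VariantName } ]
        @parsing $( $($input)* )?
    });

    // `VariantName(...)`
    (
        @variants [ $($variants:tt)* ]
        @parsing $VariantName:ident ( $($tt:tt)* ) $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [ $($variants)* { $VariantName ( $($tt)* ) } ]
        @parsing $( $($input)* )?
    });

    // `VariantName { ... }`
    (
        @variants [ $($variants:tt)* ]
        @parsing $VariantName:ident { $($tt:tt)* } $(, $($input:tt)*)?
    ) => (bridge_events! {
        @variants [ $($variants)* { $VariantName { $($tt)* } } ]
        @parsing $( $($input)* )?
    });

    // Entry point: accepts anything, so it must come last.
    ( $($input:tt)* ) => (bridge_events! {
        @variants []
        @parsing $($input)*
    });
}

bridge_events! {
    InitialLoad,
    ChangeTodo(EntityId, String),
    RemoveTodo { id: EntityId }, // hypothetical brace variant
}

fn main() {
    let event = Event::RemoveTodo { id: EntityId(3) };
    println!("{:?}", event);
    println!("{}", BridgeEvent::RemoveTodo as u32);
}
```

Note that malformed input falls through to the entry-point rule again and again, so the compiler reports a recursion-limit error rather than a targeted message.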

Playground

Also, regarding features such as accepting attributes (e.g. docstrings) on the enum, as well as the ergonomics of "decorating" an enum definition rather than using your own sequence syntax, you can make your macro be used like this:

bridge_events! {
    /// Some docstring on the `Event` enum
    pub
    enum Event {
        InitialLoad,
        ChangeTodo(EntityId, String),
    }
}

by having the following macro (I have dropped the support for variants with braces for the sake of simplicity):

macro_rules! bridge_events {(
    $( #[$meta:meta] )* // captures attributes and docstring
    $pub:vis // (optional) pub, pub(crate), etc.
    enum $EnumName:ident {
        $(
            $VariantName:ident $(
                ( $($T:ty),+ $(,)? )
            )?
        ),+ $(,)?
    }
) => (
    paste::item! {
        #[cfg_attr(feature = "ts_test",
            derive(EnumIter, AsRefStr),
        )]
        #[derive(FromPrimitive, Copy, Clone, Debug)]
        #[repr(u32)]
        $pub
        enum [< Bridge $EnumName >] {
            $(
                $VariantName ,
            )+
        }
    }

    $(#[$meta])*
    $pub
    enum $EnumName {
        $(
            $VariantName $(
                ( $($T ,)+ )
            )? ,
        )+
    }
)}
  • Where paste::item! { /* item definition here */ } is a macro that lets you use the
    [< Stuff To Concatenate >] syntax inside the item definition to concatenate identifiers. See https://docs.rs/paste

The syntax

bridge_events! {
    /// Some docstring on the `Event` enum
    pub
    enum Event {
        InitialLoad,
        ChangeTodo(EntityId, String),
    }
}

can be improved even further with the macro_rules_attribute crate, which would let you write:

#[macro_rules_derive(bridge_events!)]
/// Some docstring on the `Event` enum
pub
enum Event {
    InitialLoad,
    ChangeTodo(EntityId, String),
}

To get both the nicer call-site syntax and support for variants with braces, without the macro becoming a mess, you can use a proc_macro_derive macro rather than a macro_rules! macro.

This requires using a whole (helper) crate just for the definition of the #[derive(...)] macro, but lets you write:

#[derive(Bridge)]
/// Some docstring on the `Event` enum
pub
enum Event {
    InitialLoad,
    ChangeTodo(EntityId, String),
}

at the call site.

For that, you can:

  1. create your helper crate (named, for instance, <your_crate>-proc_macro):

    • cargo new --lib --name <your_crate>-proc_macro ./proc_macro/
      
    • Add this to your main crate's Cargo.toml:

      [dependencies.proc_macro]
      package = "<your_crate>-proc_macro"
      version = "<version of your main crate>"
      path = "./proc_macro/"
      
    • and this to your src/lib.rs:

      #[macro_use] extern crate proc_macro;
      
  2. Add the following to ./proc_macro/Cargo.toml (and set the version to match your main crate's):

    [lib]
    proc-macro = true
    
    [dependencies]
    # proc-macro2 = "1.0.*"  # needed to use TokenStream2 and/or Span 
    quote = "1.0.*"
    syn = { version = "1.0.*", features = [ <required syn features> ] }
    
  3. And then, you can have your #[derive(...)] definition in ./proc_macro/src/lib.rs:

    extern crate proc_macro;
    
    use ::proc_macro::TokenStream;
    use ::quote::quote;
    use ::syn::{
        Data,
        DeriveInput,
        Error,
        Ident,
        parse_macro_input,
        spanned::Spanned,
    };
    
    #[proc_macro_derive(Bridge)] pub
    fn bridge_events (input: TokenStream) -> TokenStream
    {
        let input = parse_macro_input!(input as DeriveInput);
        let span = input.span();
        let enum_data = if let Data::Enum(it) = input.data { it } else {
            return Error::new(
                span, "Expected an `enum`",
            ).to_compile_error().into();
        };
        let enum_name = input.ident;
        let bridge_enum_name = Ident::new(
            &format!("Bridge{}", enum_name),
            span,
        );
        let variant_names =
            enum_data
                .variants
                .into_iter()
                .map(|variant| variant.ident)
        ;
        TokenStream::from(quote! {
            #[cfg_attr(feature = "ts_test",
                derive(EnumIter, AsRefStr),
            )]
            #[derive(FromPrimitive, Copy, Clone, Debug)]
            #[repr(u32)]
            pub
            enum #bridge_enum_name {
                #( #variant_names , )*
            }
        })
    }
    

    See ::syn's documentation for more info.

A lot to chew over... not sure how deep I'm going to dive into properly understanding this at the moment - but will surely reference it (and hopefully revisit each time I come across that comment!). Thanks for the help!

In the meantime... from a very rough glance, it looks like the proc_macro version is both simpler to read (just normal Rust) and has nicer ergonomics for the consumer...

On further thought, I think what I really need is a CLI tool... so there will be one "source of truth" with regular Rust enums, and the tool will then output TypeScript and Rust code based on that. (What I need is more customization than what comes out of the box with wasm-bindgen.)

Is there a standard way to get from a blob of text (e.g. reading in enums.rs) into the tokens that syn needs to work with? Any other tips for going this route are appreciated too!

Yes, syn can work with ::proc_macro2::TokenStream, which itself implements FromStr, meaning that you can ::std::fs::read_to_string() a file and then .parse::<::proc_macro2::TokenStream>() it.

Is the playground example reading itself and displaying the TokenTree?! That's got to be some award-winning meta/inception post!
