Macro parse enum with wrapped type

Hello,

Main question

I want to wrap a type of enum if I detect a certain attribute.
For example,

second_example!{
    pub enum SecondTest {
        A,
        B,
        #[unsafe]
        C(i32, String),
        D {
            a: isize,
            b: String,
        }
    }
}
// expect to turn this into below
pub enum SecondTest {
        A,
        B,
        C Wrap<(C, i32, String), _>,
        D {
            a: isize,
            b: String,
        }
    }

I have tried several approaches:

  1. direct matching
    This isn’t working because I realised that the repetition depth of attrs differs from the rest. Additionally, if I place $variant_ty:ty directly after $variant, the parser becomes confused and cannot determine which match to apply.
  2. TT Muncher
    The solution is inspired from macro_book and this_post. However, it doesn’t work after adding attributes.

Side questions

Besides these, there are some concepts I’d like to confirm with the community to ensure I understand them correctly:

  1. In the TT Muncher case, I believe I failed because the attributes repeat themselves if I place them at the beginning. As a result, the parser cannot determine when to stop, causing the remainder of the matching to fail. Is this understanding correct?
  2. In the macro_book TT Muncher example, I thought I could define the type while parsing, but it consistently failed. Does this mean it’s impossible to output and parse in the same block? In the second example with an additional push-down, it seems achievable by placing idents into a separate @ block for collection. Is this the correct approach?

Many thanks.

it is possible with tt muncher, but trust me, it's really messy and ugly, you don't want to do this.

the trick is to delay parsing the attributes as the meta fragment type and capture them as a repetition of tt instead. once tokens have been parsed as meta, you can no longer match them against a specific attribute like #[unsafe].

also, to tt munch against nested token trees, I like to use parentheses to group tokens of different levels, which makes the code VERY hard to read.

a very messy proof of concept
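This is not the proof of concept itself, just a minimal sketch of the point about `meta` versus `tt`. The macro names `is_special`, `via_meta` and `via_tt` and the attribute name `special_attr` are made up for illustration. The same attribute is forwarded to a helper twice: once captured as `meta` (which becomes opaque) and once captured as a raw token tree (which the helper can still inspect).

```rust
// Reports whether its input is exactly the attribute `#[special_attr]`.
macro_rules! is_special {
    (#[special_attr]) => { true };
    ($($other:tt)*) => { false };
}

// Captures the attribute contents as `meta`, then forwards them.
// The forwarded `$m` is opaque, so the literal rule above cannot match it.
macro_rules! via_meta {
    (#[$m:meta]) => { is_special!(#[$m]) };
}

// Captures the bracketed part `[ ... ]` as a single token tree.
// The forwarded tokens are still plain tokens, so the literal rule matches.
macro_rules! via_tt {
    (# $attr:tt) => { is_special!(# $attr) };
}

pub fn seen_via_meta() -> bool {
    via_meta!(#[special_attr])
}

pub fn seen_via_tt() -> bool {
    via_tt!(#[special_attr])
}

fn main() {
    println!("meta: {}, tt: {}", seen_via_meta(), seen_via_tt());
}
```

With `meta` the helper falls through to its catch-all rule; with `tt` it reaches the literal `#[special_attr]` rule.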


Thank you so much :face_holding_back_tears:
I am still trying to figure out your delicate design :thinking:
If I solve my problem, I will share the result here for future reference.
Appreciate your help.

yeah, macros are fun and scary at the same time. but I think much of the complexity comes from the syntax itself.

I'll try to use some examples to show how the tokens are transformed, hopefully this will help you come up with your own solution.

note: the example is not valid rust enum syntax, but some simplified "pseudo" syntax with only the variant definitions:

the outer macro simply turns the rust syntax into structured groups of tokens
to feed the inner recursive macro (attributes are captured as a repetition of tts):

enum Name {
    #[attr_a] VariantA,
    #[attr_b1="xyz"] #[special_attr] #[attr_b2]
    VariantB (i32, u32),
    VariantC,
}
==>
 ( #[attr_a] VariantA )
 ( #[attr_b1="xyz"] #[special_attr] #[attr_b2] VariantB (i32, u32) )
 ( VariantC )

the inner macro has two (remember, this is simplified) groups of tokens, one for the "output" and one for the "input". it incrementally parses the "input" group and moves the (potentially transformed) tokens into the "output" group until all the "input" is consumed. the structure of the "input" is generated by the outer macro.

at a high level:

@output (
)
@input (
    (VariantA)
    (VariantB)
    (VariantC)
)
==>
@output (
    (transformed VariantA)
)
@input (
    (VariantB)
    (VariantC)
)
==>
@output (
    (transformed VariantA)
    (transformed VariantB)
)
@input (
    (VariantC)
)
==>
@output (
    (transformed VariantA)
    (transformed VariantB)
    (transformed VariantC)
)
@input (
)
==>
// final output
pub enum Name {
    someVariantA,
    someVariantB,
    someVariantC,
}
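A minimal sketch of the output/input scheme above, assuming only unit variants and no transformation at all (each variant is moved over unchanged). The macro name `rebuild_enum` and the `@munch` marker are hypothetical, and the derives are added only so the result can be inspected.

```rust
macro_rules! rebuild_enum {
    // Exit case: the input group is empty, so emit the final enum.
    (@munch $name:ident ($($out:tt)*) ()) => {
        #[derive(Debug, PartialEq)]
        pub enum $name { $($out)* }
    };
    // Recursive case: move the first parenthesised variant from the
    // input group to the end of the output group, then recurse.
    (@munch $name:ident ($($out:tt)*) (($($variant:tt)*) $($rest:tt)*)) => {
        rebuild_enum!(@munch $name ($($out)* $($variant)* ,) ($($rest)*));
    };
    // Entry point: wrap every variant in its own parenthesised group.
    (enum $name:ident { $($v:ident),* $(,)? }) => {
        rebuild_enum!(@munch $name () ($(($v))*));
    };
}

rebuild_enum! {
    enum Name {
        VariantA,
        VariantB,
        VariantC,
    }
}

fn main() {
    println!("{:?} {:?} {:?}", Name::VariantA, Name::VariantB, Name::VariantC);
}
```

The output group plays the role of an accumulator: nothing is emitted as real Rust code until the exit rule fires.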

and it parses the "input" tokens in different phases:

the first phase consumes one attribute at a time.

in order to support the "special" attribute in arbitrary order and position, I save each parsed attribute to a @tmp token group until either the "special attribute" is found or all the attributes of the next variant are consumed, at which point it moves the whole variant from the "input" group to the "output" group (including the attributes in @tmp) and recurses.

// only shows the input group
(
  (@tmp )
  ( #[attr_a] VariantA )
  ( #[attr_b1="xyz"] #[special_attr] #[attr_b2] VariantB (i32, u32) )
  ( VariantC )
)
==> `attr_a` doesn't match `special_attr`, save it to `@tmp`:
(
  (@tmp #[attr_a])
  ( VariantA )
  ( #[attr_b1="xyz"] #[special_attr] #[attr_b2] VariantB (i32, u32) )
  ( VariantC )
)
==> no more attributes, move whole `VariantA` to the output (not shown here):
(
  (@tmp)
  ( #[attr_b1="xyz"] #[special_attr] #[attr_b2] VariantB (i32, u32) )
  ( VariantC )
)
==> `attr_b1` doesn't match `special_attr`, save it to `@tmp`:
(
  (@tmp #[attr_b1="xyz"] )
  (#[special_attr] #[attr_b2] VariantB (i32, u32) )
  ( VariantC )
)
==> `special_attr` matches!!!
==> move the whole `VariantB` to the output,
==> but with the fields wrapped in `Wrap<>`
==> don't forget the attributes in `@tmp` too
@output (
  //...
  ( #[attr_b1="xyz"] #[attr_b2] VariantB Wrap<(i32, u32)> )
)
@input (
  (@tmp )
  ( VariantC )
)
==> repeat for the rest of the variants
//...
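The walk-through above could be sketched as the macro below. This is a reduced illustration under several assumptions, not the proof of concept from this thread: `Wrap` is a hypothetical newtype, the macro name `wrap_special` is made up, only unit and tuple variants are accepted, the special attribute is only handled on tuple variants, and the wrapped variant is emitted as `VariantB(Wrap<(i32, u32)>)` so that the result is valid Rust.

```rust
/// Hypothetical wrapper type; the thread never shows the real `Wrap`.
#[derive(Debug, PartialEq)]
pub struct Wrap<T>(pub T);

macro_rules! wrap_special {
    // Exit case: no variants left and `@tmp` is empty, so emit the enum.
    (@munch $name:ident ($($out:tt)*) (@tmp)) => {
        #[derive(Debug, PartialEq)]
        pub enum $name { $($out)* }
    };
    // `#[special_attr]` found on a tuple variant: drop it, wrap the fields,
    // and flush the attributes saved in `@tmp` together with the later ones.
    (@munch $name:ident ($($out:tt)*) (@tmp $($saved:tt)*)
        (#[special_attr] $(# $later:tt)* $variant:ident ($($fields:tt)*))
        $($rest:tt)*
    ) => {
        wrap_special!(@munch $name
            ($($out)* $($saved)* $(# $later)* $variant(Wrap<($($fields)*)>),)
            (@tmp) $($rest)*);
    };
    // Any other attribute: move it into `@tmp` and look at the next one.
    (@munch $name:ident ($($out:tt)*) (@tmp $($saved:tt)*)
        (# $attr:tt $($variant:tt)*)
        $($rest:tt)*
    ) => {
        wrap_special!(@munch $name ($($out)*) (@tmp $($saved)* # $attr)
            ($($variant)*) $($rest)*);
    };
    // No attributes left: move the variant to the output unchanged.
    (@munch $name:ident ($($out:tt)*) (@tmp $($saved:tt)*)
        ($variant:ident $($fields:tt)*)
        $($rest:tt)*
    ) => {
        wrap_special!(@munch $name ($($out)* $($saved)* $variant $($fields)*,)
            (@tmp) $($rest)*);
    };
    // Entry point: one parenthesised group per variant, attributes kept as tts.
    (pub enum $name:ident {
        $( $(# $attr:tt)* $variant:ident $(( $($field:ty),* $(,)? ))? ),* $(,)?
    }) => {
        wrap_special!(@munch $name () (@tmp)
            $( ( $(# $attr)* $variant $(( $($field),* ))? ) )*);
    };
}

wrap_special! {
    pub enum Name {
        #[doc = "plain variant"] VariantA,
        #[doc = "xyz"] #[special_attr] #[allow(dead_code)]
        VariantB(i32, u32),
        VariantC,
    }
}

fn main() {
    println!("{:?}", Name::VariantB(Wrap((1, 2))));
}
```

The rule for `#[special_attr]` has to come before the generic attribute rule, otherwise the generic rule would save the special attribute into `@tmp` like any other.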

Thank you for the detailed explanation.
I MUST say you are too nice to be true :face_holding_back_tears:
After learning this technique, macros feel like the Tower of Hanoi to me,
or like a traditional filtering method with many buckets/zones that cleans something step by step.

Hi @nerditation,

I have a further question. I ran an experiment to see whether I can further manipulate the tt with another rule, as shown below.

{
    $(#[$outer_attr:meta])*
    $vis:vis
    $E:ident
    (
        $(($(#[$attr:meta])* $variant:ident $($fields:tt)*))*
    )
    (
        (@tmp $(#[$before:meta])*)
        ($next_variant:ident $($next_fields:tt)*)
        $($rest:tt)*
    )
} => {
    zz! {
        $(#[$outer_attr])*
        $vis
        $E
        (
            $(($(#[$attr])* $variant $($fields)*))*
            ($(#[$before])* $next_variant zz!(@sol $($next_fields)*))
        )
        (
            (@tmp)
            $($rest)*
        )

    }
};

(@sol $field_ty:ty) => {$field_ty};
(@sol) => {};

I think I am missing some concepts here. Here is my logic.

  1. If all the attributes are consumed, we will reach this match
  2. After that, $($next_fields:tt)* will be either {empty} or {(...tuple)}
  3. I added the inner rule @sol to check the fields or do further operations on them
    But I failed to do anything further
    The compiler reports the error below
error: expected one of `(`, `,`, `=`, `{`, or `}`, found `zz`
   --> src/tt_muncher/mod.rs:117:47
    |
31  |                   $(#[$attr])* $variant $($fields)*
    |                                        - help: missing `,`
...
117 |                   ($(#[$before])* $next_variant zz!(@sol $($next_fields)*))
    |                                                 ^^ expected one of `(`, `,`, `=`, `{`, or `}`
...

I have also tried these, but they all failed:

(@sol $field_ty:ty) => {$field_ty};
(@sol $($tt:tt)*) => {$($tt)*}; 

Many thanks. Honestly, I keep running into this issue in different circumstances. I believe it comes down to one or two concepts about the tt fragment or the matching mechanism that I misunderstand. Thank you.

[Explanation]
After reading the book, I understood a tt (token tree) to be something like either nothing or a bracketed group. Therefore, I wanted to make the rule more general so that it can distinguish between unit variants, tuple variants, and named (struct) variants. That's why I wanted to see whether, in the last step, I could create a supplementary rule to parse the tt after the ident; that would be super helpful.
But after trying this in several ways, I failed to achieve what I wanted. Therefore, I believe that I misunderstood the mechanism of macros or missed some key concepts.

a tt is either a single token (leaf of tree), or a bunch of tokens enclosed by balanced matching parentheses (or square brackets, curly braces). it CANNOT be empty. to be able to accept empty inputs, you must use explicit repetition operators like asterisk or question mark.

because macros don't have an "alternative" or "choice" operator, in order to differentiate between tuple variants and struct variants (and also unit variants, for that matter), you must use separate rules. you can put the separate rules in the outer macro (to normalize rust syntax into "intermediate" form), but then your outer macro needs to be incremental; or you can put the rules in the inner macro (while the outer macro simply emits whatever it captures).
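To make the definition of a tt concrete, here is a small sketch with a hypothetical `count_tts` macro. It counts top-level token trees: a delimited group counts as one tt regardless of what is inside, a lone comma is a tt too, and the empty input is only accepted because it has its own rule.

```rust
macro_rules! count_tts {
    // Empty input needs its own rule: a `tt` never matches nothing.
    () => { 0usize };
    // Peel off one token tree and count the rest.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

pub fn count_empty() -> usize {
    count_tts!()
}

pub fn count_groups() -> usize {
    // `a`, `(b c)`, `[d, e]`, `{f}`: each delimited group is one tt.
    count_tts!(a (b c) [d, e] {f})
}

pub fn count_with_comma() -> usize {
    // `a`, `,`, `b`: the comma is a token tree as well.
    count_tts!(a, b)
}

fn main() {
    println!("{} {} {}", count_empty(), count_groups(), count_with_comma());
}
```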

example:

the outer rule doesn't differentiate between the different cases.
it first tries to capture an optional (question mark) parenthesis-enclosed token tree,
then an optional brace-enclosed token tree,
and then re-emits the captured (and slightly transformed) tokens.

caveat: this trick is not robust, e.g. it is not recommended for libraries, because it will accept invalid rust syntax when there are both tuple and struct fields

macro_rules! x {
    {
        enum $E:ident {
            $(
                $V:ident
               $(( $($tuple_field:ty),* $(,)? ))?
               $({ $($field_name:ident : $field_type: ty ),* $(,)? })?
            ),* $(,)?
        }
    } => {
        xx! {
            $E
            ()
            $(
                (
                    $V
                    $(@tuple $($tuple_field)*)?
                    $(@struct $($field_name $field_type)*)?
                )
            )*
        }
    }
}

the inner macro is incremental and has different rules for different variants:

macro_rules! xx {
    { /*exit case omitted*/ } => {};
    // recursive cases. order may or may not matter, depending on your "normalized" syntax
    { // unit variant case
        $E:ident
        ($($output:tt)*)
        // match only the first group; `$rest` must be the last piece
        ($V:ident)
        $($rest:tt)*
    } => {
        // do the transformation on the unit variant `$V`
        // put the result into the output group
        // then recurse on the `$rest`, ...
    };
    { // tuple fields case
        $E:ident
        ($($output:tt)*)
        // each forwarded `ty` capture arrives as a single token tree;
        // `$($field:ty)*` without a separator is rejected by the follow-set rules
        ($V:ident @tuple $($field:tt)*)
        $($rest:tt)*
    } => {
        // tuple variant `$V`, with the tuple `$field`
        // recurse on the `$rest`, ...
    };
    { // named fields case
        $E:ident
        ($($output:tt)*)
        ($V:ident @struct $($field_name:ident $field_type:tt)*)
        $($rest:tt)*
    } => {
        // struct variant `$V`, with the `$field_name` of `$field_type`
        // recurse on the `$rest`, ...
    }
}
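For reference, a complete sketch of the same two-stage idea, folded into one hypothetical macro `variant_kinds` with an internal `@go` marker. Instead of transforming the enum, it only records which rule each variant hit and generates a function `kinds()` (also a made-up name), so the classification can be observed. Note that in this sketch the inner rules match the forwarded types as `tt`.

```rust
macro_rules! variant_kinds {
    // Outer rule: normalise each variant into one parenthesised group.
    (
        enum $E:ident {
            $(
                $V:ident
                $(( $($tuple_field:ty),* $(,)? ))?
                $({ $($field_name:ident : $field_type:ty),* $(,)? })?
            ),* $(,)?
        }
    ) => {
        variant_kinds! {
            @go $E ()
            $( ( $V $(@tuple $($tuple_field)*)? $(@struct $($field_name $field_type)*)? ) )*
        }
    };
    // Exit case: nothing left to classify, emit the collected pairs.
    (@go $E:ident ($($out:tt)*)) => {
        pub fn kinds() -> Vec<(&'static str, &'static str)> {
            vec![$($out)*]
        }
    };
    // Unit variant.
    (@go $E:ident ($($out:tt)*) ($V:ident) $($rest:tt)*) => {
        variant_kinds!(@go $E ($($out)* (stringify!($V), "unit"),) $($rest)*);
    };
    // Tuple variant: forwarded `ty` captures are matched as single tts.
    (@go $E:ident ($($out:tt)*) ($V:ident @tuple $($field:tt)*) $($rest:tt)*) => {
        variant_kinds!(@go $E ($($out)* (stringify!($V), "tuple"),) $($rest)*);
    };
    // Struct variant.
    (@go $E:ident ($($out:tt)*)
        ($V:ident @struct $($name:ident $ty:tt)*) $($rest:tt)*
    ) => {
        variant_kinds!(@go $E ($($out)* (stringify!($V), "struct"),) $($rest)*);
    };
}

variant_kinds! {
    enum SecondTest {
        A,
        C(i32, String),
        D { a: isize, b: String },
    }
}

fn main() {
    for (name, kind) in kinds() {
        println!("{name}: {kind}");
    }
}
```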

macros are expanded or substituted, they are not "evaluated" or "invoked". to do "incremental" (and thus recursive) parsing, you must substitute the whole thingy with the recursive "call" (you must code entirely in "continuation-passing style").

I should note, it is OK to substitute some parts with macros (it doesn't matter whether it's the "same" macro or another helper macro), but they will NOT "recurse" like the outer macro. and if you don't know what you are doing, they might end up in positions where macro invocations are not allowed. this is the reason you get a syntax error: the inner macro invocation is in enum variant position. read the little book for how macros are parsed by the compiler.

also, I need to explain why you cannot use $($tt:tt)? in the outer macro to capture the fields of a variant. after all, both the tuple fields (enclosed by parentheses) and the named fields (enclosed by curly braces) can be parsed as a single token tree, right?
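To illustrate the point about positions: a macro invocation is accepted where a type is expected, but not where an enum variant's field list is expected. The sketch below uses a hypothetical helper `wrapped!` and a hypothetical `Wrap` newtype. The invocation sits inside the parentheses of the tuple variant, i.e. in type position, which is why it compiles, whereas `VariantB wrapped!(...)` would not.

```rust
/// Hypothetical wrapper type, for illustration only.
pub struct Wrap<T>(pub T);

// Expands to a type, so it may only be invoked where a type is expected.
macro_rules! wrapped {
    ($($field:ty),* $(,)?) => { Wrap<($($field),*)> };
}

pub enum Name {
    VariantA,
    // Type position: the field type of a tuple variant.
    VariantB(wrapped!(i32, u32)),
}

pub fn describe(value: &Name) -> String {
    match value {
        Name::VariantA => "A".to_string(),
        Name::VariantB(Wrap((x, y))) => format!("B({}, {})", x, y),
    }
}

fn main() {
    println!("{}", describe(&Name::VariantA));
    println!("{}", describe(&Name::VariantB(Wrap((1, 2)))));
}
```

This only helps when the part to be generated is a whole type; anything that has to produce the variant itself still needs the recursive, whole-enum substitution.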

if you try something like this, you'll find it doesn't work:

{
    enum $E:ident {
        $(
            $V:ident $($fields:tt)?
        ),*
    }
} => {
    //...
}

the reason is, rust macro rules don't support backtracking, and repetition operators are greedy. because tt will literally match anything, a repetition of tt can only be used as the last piece (it can be inside a subgroup of a token tree), otherwise there will be parsing errors.

for the enum syntax, each variant itself is not enclosed inside a pair of parentheses, but separated by commas. if you were to match a unit variant with $($tt:tt)?, it could capture the comma as the "fields", which is nonsense, and would also fail to match the following variants.
