Macro_rules repetition of alternatives

I have a macro which collects a sequence of items that come in two varieties:

  1. individual item: description + value
  2. a group of items: heading + repeated item (description + value)

which can be used like this:

stuff!(
   entry!("this" 42)
   entry!("that" 66)
   entry!("bunch of"
            ("bananas" 33)
            ("flowers" 44)
            ("grapes"  55)
   )
)

The key parts of the implementations of stuff! and entry! are


macro_rules! stuff {
    ($($entries:expr)*) => { ... };
}

macro_rules! entry {
                        ($description:literal $value:literal)    => { ... };
    ($heading:literal $(($description:literal $value:literal))*) => { ... };
}

I would like to remove the need to use the inner macro entry!, so that stuff! could be used like

stuff!(
   ("this" 42)
   ("that" 66)
   ("bunch of"
      ("bananas" 33)
      ("flowers" 44)
      ("grapes"  55)
   )
)

or ideally even

stuff!(
   "this" 42
   "that" 66
   ("bunch of"
      "bananas" 33
      "flowers" 44
      "grapes"  55
   )
)

though I wouldn't mind some terminator token like a comma or semicolon after each value.

Is it possible to implement something like this with macro_rules?

How you parse this depends on what you intend to do with the input. If you want to execute some code independently for each pair, you can use something like this:

macro_rules! stuff {
    (@parse $ctx:tt, ) => {};
    
    (@parse [$($ctx:tt)*], $key:literal ( $($inner:tt)* ) $($tail:tt)*) => {
        println!("{:?} {:?}: (", &[$($ctx)*] as &[&str], $key);
        stuff!(@parse [$($ctx)* $key,], $($inner)*);
        println!("),");
        stuff!(@parse [$($ctx)*], $($tail)*);
    };
    
    (@parse $ctx:tt, $key:literal $value:literal $($tail:tt)*) => {
        println!("{:?} {:?}: {:?},", &$ctx as &[&str], $key, $value);
        stuff!(@parse $ctx, $($tail)*);
    };
    
    ( $($toks:tt)* ) => {
        stuff!(@parse [], $($toks)*);
    };
}

fn main() {
    stuff!(
        "this" 42
        "that" 66
        "bunch of" (
            "bananas" 33
            "flowers" 44
            "grapes"  55
        )
    );
}

/*
== Output ==

[] "this": 42,
[] "that": 66,
[] "bunch of": (
["bunch of"] "bananas": 33,
["bunch of"] "flowers": 44,
["bunch of"] "grapes": 55,
),
*/

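You can't parse the whole construct with a single rule, because macro_rules! has no way to express alternatives (either this form or that form) inside a repetition in a match pattern. That leaves parsing one entry at a time, recursing on the rest. Note that this will eventually hit the recursion limit (128 by default, adjustable with the #![recursion_limit = "..."] crate attribute) for any sufficiently large input, but maybe that's not a concern.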

If you need to know more about the overall structure of the input, you'll probably need to rewrite this into an accumulator macro. But I can't meaningfully speculate on what that'd look like without knowing what you're trying to do.

It gets stuffed into a vector roughly like this

vec![
    item("this", 42),
    item("that", 66),
    group("bunch", vec![
      item("bananas", 33),
      item("flowers", 44),
      item("grapes" , 55),
    ]),
]

where item and group simply construct variants of an enum

enum Entry {
    Single(Item),
    Group { key: K, items: Vec<Item> },
}

(Elsewhere, I also want to apply a similar pattern to much more complex data, but that can wait until I've got the hang of it in the simpler context.)

I'm unlikely ever to have much more than a dozen items per invocation, so I guess that will be ok.

(In the second, more complex case, it will run into the hundreds.)

So, first of all, here is the more general version of what you said above:

#[derive(Debug)]
enum Entry<K, V> {
    Single { key: K, value: V },
    Group { key: K, items: Vec<Entry<K, V>> },
}

macro_rules! stuff {
    (@parse [ $($inner:tt)* ], ) => {
        vec![$($inner)*]
    };
    
    (@parse [ $($acc:tt)* ], $key:literal ( $($inner:tt)* ) $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Group {
                    key: $key,
                    items: stuff!(@parse [], $($inner)*),
                },
            ],
            $($tail)*
        )
    };
    
    (@parse [ $($acc:tt)* ], $key:literal $value:literal $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Single {
                    key: $key,
                    value: $value,
                },
            ],
            $($tail)*
        )
    };
    
    ( $($toks:tt)* ) => {
        stuff!(@parse [], $($toks)*)
    };
}

fn main() {
    let data = stuff!(
        "this" 42
        "that" 66
        "bunch of" (
            "bananas" 33
            "flowers" 44
            "grapes"  55
        )
    );
    println!("{data:#?}");
}

/*
== Output ==

[
    Single {
        key: "this",
        value: 42,
    },
    Single {
        key: "that",
        value: 66,
    },
    Group {
        key: "bunch of",
        items: [
            Single {
                key: "bananas",
                value: 33,
            },
            Single {
                key: "flowers",
                value: 44,
            },
            Single {
                key: "grapes",
                value: 55,
            },
        ],
    },
]
*/

Unlike the previous one, this works by accumulating the contents of each vec![...] call in the $acc group before finally outputting it when it's run out of input. You have to do this because you can't return loose tokens that you then stuff back into the vec! macro after the fact. Macros have to result in complete syntax constructs, so we have to buffer output until we have a complete syntax construct.

Still, it's fundamentally the same basic principle: match the very next part of the input, handle that in isolation, then recurse on the remainder.
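To see the buffering in isolation, here is a minimal sketch that has nothing to do with the Entry type. The reversed! macro and the demo function are hypothetical names made up for this illustration. Each step moves one literal from the input into the bracketed buffer, and only the final rule turns the buffer into a complete expression (an array):

```rust
macro_rules! reversed {
    // No input left: emit the buffered tokens as one complete array expression.
    (@acc [ $($acc:tt)* ]) => {
        [$($acc)*]
    };

    // Take the next literal, put it at the front of the buffer,
    // then recurse on the remaining input.
    (@acc [ $($acc:tt)* ] $head:literal $($tail:tt)*) => {
        reversed!(@acc [ $head, $($acc)* ] $($tail)*)
    };

    // Public entry point: start with an empty buffer.
    ( $($toks:tt)* ) => {
        reversed!(@acc [] $($toks)*)
    };
}

// Hypothetical helper so the result can be inspected.
fn demo() -> [i32; 3] {
    reversed!(1 2 3)
}

fn main() {
    println!("{:?}", demo());
}
```

No intermediate expansion ever has to stand on its own as an expression: until the input runs out, every step expands to another reversed! invocation.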

Now, I was ~85% of the way through this when I realised the enum you described can't have nested groups. That actually requires a slightly different construction for the macro. To wit:

#[derive(Debug)]
struct Item<K, V> {
    key: K,
    value: V,
}

#[derive(Debug)]
enum Entry<K, V> {
    Single(Item<K, V>),
    Group { key: K, items: Vec<Item<K, V>> },
}

macro_rules! stuff {
    (@parse [ $($inner:tt)* ], ) => {
        vec![$($inner)*]
    };
    
    (@parse [ $($acc:tt)* ], $key:literal ( $($inner:tt)* ) $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Group {
                    key: $key,
                    items: stuff!(@inner [], $($inner)*),
                },
            ],
            $($tail)*
        )
    };
    
    (@parse [ $($acc:tt)* ], $key:literal $value:literal $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Single(Item {
                    key: $key,
                    value: $value,
                }),
            ],
            $($tail)*
        )
    };
    
    (@inner [ $($inner:tt)* ], ) => {
        vec![$($inner)*]
    };
    
    (@inner [ $($acc:tt)* ], $key:literal $value:literal $($tail:tt)* ) => {
        stuff!(
            @inner
            [
                $($acc)*
                Item {
                    key: $key,
                    value: $value,
                },
            ],
            $($tail)*
        )
    };
    
    ( $($toks:tt)* ) => {
        stuff!(@parse [], $($toks)*)
    };
}

The difference here is that there are two distinct "levels" of parsing: outer and inner. This is to account for the different construction syntax for the individual items. There are ways of abstracting that, but it's probably not worth it.
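One possible simplification, offered as a sketch rather than a replacement for the version above: because the items inside a group all have the same shape, the group body can be matched with an ordinary repetition, which removes the need for the @inner rules. The fragment names $ikey and $ivalue and the sample function are made up for this example. It also shows the macro being invoked:

```rust
#[derive(Debug, PartialEq)]
struct Item<K, V> {
    key: K,
    value: V,
}

#[derive(Debug, PartialEq)]
enum Entry<K, V> {
    Single(Item<K, V>),
    Group { key: K, items: Vec<Item<K, V>> },
}

macro_rules! stuff {
    // Input exhausted: emit the accumulated entries as one vec![...].
    (@parse [ $($acc:tt)* ], ) => {
        vec![$($acc)*]
    };

    // Group: the parenthesised body is a plain repetition of key/value
    // pairs, so it is expanded directly instead of being munched.
    (@parse [ $($acc:tt)* ], $key:literal ( $($ikey:literal $ivalue:literal)* ) $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Group {
                    key: $key,
                    items: vec![$(Item { key: $ikey, value: $ivalue }),*],
                },
            ],
            $($tail)*
        )
    };

    // Single item: key followed by value.
    (@parse [ $($acc:tt)* ], $key:literal $value:literal $($tail:tt)*) => {
        stuff!(
            @parse
            [
                $($acc)*
                Entry::Single(Item { key: $key, value: $value }),
            ],
            $($tail)*
        )
    };

    // Public entry point: start with an empty accumulator.
    ( $($toks:tt)* ) => {
        stuff!(@parse [], $($toks)*)
    };
}

// Hypothetical helper so the result can be inspected.
fn sample() -> Vec<Entry<&'static str, i32>> {
    stuff!(
        "this" 42
        "bunch of" (
            "bananas" 33
            "flowers" 44
        )
    )
}

fn main() {
    println!("{:#?}", sample());
}
```

The outer level still has to be munched one entry at a time, because that is where the two alternatives (single item or group) are mixed. This only shortens the recursion inside groups.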

That should be fine. That said, if you use a lot of these, you might want to investigate writing a procedural macro instead. A procedural macro takes longer to build the first time, but will run faster on subsequent rebuilds. Given enough invocations, it'll eventually beat out macro_rules!.

This was in the back of my mind. I find the need to write them in a separate crate to be a huge activation energy barrier, but I suspect that it would be worthwhile here. After all, you get to manipulate your data in Rust, rather than something closer to lambda calculus where you get bogged down in the Turing tarpit pretty soon. And if you do get it to work, all these accumulators will be quadratic.

Time to bite the bullet and try a procedural macro ... where my experience is limited to a couple of toy examples many moons ago.

Would you have any advice pertinent to my specific case, or should it all be plain sailing, once I get familiar with the general approach to writing them?

I recall something about there being two versions of the tooling (proc_macro vs proc_macro2 and maybe syn vs something else) and it not being clear to me how compatible or interoperable they were, and which ones to use.

Many thanks for your complete code samples, BTW: much appreciated. Even if I end up not taking that route, they're very instructive.

Last time I wrote one, it went like this:

  • proc_macro is the interface provided by the compiler, exclusively for use in procedural macro crates.
  • proc-macro2 is a combination reimplementation of and wrapper around proc_macro. It can be used in procedural macros, but also in other contexts like unit tests, build scripts, etc.
  • syn is a crate that takes the flat tokens proc_macro/proc-macro2 gives you and parses them into an actual syntax tree, to make them easier to understand and manipulate. It can also turn these constructs back into tokens that proc_macro will understand.
  • quote is a crate that defines a quote! macro that's like format!, but for macro output.

The entry point of your macro will use proc_macro types. You then convert those into proc-macro2 types, then feed those to syn for parsing. Finally, the simplest way to construct output is to use quote! like you would format!. You'll want to check the docs for syn for some basic examples. I think there's more complete documentation for this kicking around somewhere.

Writing a proc macro for this shouldn't be terribly difficult. Parsing should be fairly straightforward, as should constructing the output. You might want to write both versions, even if it's just for practice.

Definitely!

Unfortunately, my Copious Spare Time might make that difficult in the short term. :frowning:
