Top-down macro parsing, or higher-order macros


#1

Hello. Total Rust newbie here!

I was experimenting with macros, when I stumbled on a deceptively simple problem: how to parse a comma- (or other symbol-) separated list of parts, where each part may have to be parsed with its own set of rules.

I figured the cleanest way would be to have one macro parse the structure and defer each part to a sub-macro. This would allow me to use the most appropriate pattern-matching in the sub-macro, while keeping the outer one as generic as possible. In fact, the outer macro could be higher-order, taking the name of the sub-macro as an argument.

For the sake of argument, let’s assume this template code:

macro_rules! foobar_item {
    (foo) => {
        println!("Foo");
    };
    (bar $x:expr) => {
        println!("Bar {}", $x);
    };
}

macro_rules! foobar {
    // ???
}

fn main() {
    foobar!(foo, bar 2, bar 3);
}

How would you fill in the ???

I thought I’d be able to use $( ),* for example with something generic like $( $($t:tt)+ ),* but it turns out that doesn’t work. If I understand the “local ambiguity” error, the comma could be either a $t:tt or the separator in $( ),* and the macro engine apparently can’t choose between the two.
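One way to sidestep that ambiguity, sketched here under the assumption that changing the call syntax is acceptable, is to wrap every item in its own delimiters. A comma between groups can then only be the separator. The name `foobar_parens` is made up for this sketch, and `foobar_item` is changed to return a `String` instead of printing so the result can be checked:

```rust
// Variant of foobar_item that returns a String (an assumption made for testability).
macro_rules! foobar_item {
    (foo) => { String::from("Foo") };
    (bar $x:expr) => { format!("Bar {}", $x) };
}

// Hypothetical outer macro: each item must be wrapped in parentheses.
// Inside a group, $($t:tt)+ is the last thing in its delimiters, so the
// comma between groups is never a candidate for $t.
macro_rules! foobar_parens {
    ($( ( $($t:tt)+ ) ),* $(,)?) => {
        vec![$( foobar_item!($($t)+) ),*]
    };
}

fn main() {
    let out = foobar_parens!((foo), (bar 2), (bar 3));
    assert_eq!(out, ["Foo", "Bar 2", "Bar 3"]);
}
```

The cost is noisier call sites, so this is a trade-off rather than a drop-in answer to the `foobar!(foo, bar 2, bar 3)` syntax above.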

While experimenting, I also found out that :tt is the only fragment type (apart from the single-token :ident and :lifetime) that can be captured and passed on to a sub-macro for subsequent pattern matching. If I capture something as an :expr or :stmt, for example, I cannot pass it on to a sub-macro and expect to re-match its contents there. (Why? This looks like an abstraction leak to me.)
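A minimal sketch of that behaviour, using made-up macro names (`describe`, `via_expr`, `via_tt`): once tokens have been captured as an :expr, the sub-macro sees them as one opaque unit, whereas tokens forwarded as :tt can still be taken apart.

```rust
// Sub-macro that tries to look inside its input.
macro_rules! describe {
    ($a:tt + $b:tt) => { "addition" };
    ($($other:tt)*) => { "something else" };
}

// Forwards its input after capturing it as a single expression.
macro_rules! via_expr {
    ($e:expr) => { describe!($e) };
}

// Forwards its input as raw token trees.
macro_rules! via_tt {
    ($($t:tt)*) => { describe!($($t)*) };
}

fn main() {
    assert_eq!(describe!(3 + 6), "addition");
    assert_eq!(via_tt!(3 + 6), "addition");
    // The captured expression is a single opaque token tree by now,
    // so the `$a:tt + $b:tt` rule no longer applies.
    assert_eq!(via_expr!(3 + 6), "something else");
}
```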

Anyway, it looks like any top-down macro design (where you match some structure but defer the details to sub-macros) must use a repeated :tt for its parts, but that devil cannot be used anywhere except as the last (or only) item inside a pair of delimiters: (), [] or {}.

Is this correct, by the way?

So I went back to the drawing board and spelled it out in full for the macro engine, as I would do for example in Scheme. This is what I came up with:

macro_rules! foobar {
    ($($tt:tt)*) => {
        parse_comma_list!(foobar_item [$($tt)*]);
    };
}

/// Parse a list of comma-separated items.
///
/// Arguments:
///
/// - The name of a sub-macro to invoke on every item.
/// - The list of tokens to parse, inside brackets.
/// - (used only recursively) The current item being parsed.
///
/// Each item may be an arbitrary sequence of tt's.
///
macro_rules! parse_comma_list {
    // we have a comma: emit current item and continue
    // (this pattern must come first)
    (
        $emit:ident
        [, $($rest:tt)*]
        $($item:tt)+
    ) => {
        $emit!($($item)+);
        parse_comma_list!($emit [$($rest)*]);
    };

    // no comma: accumulate item tokens
    (
        $emit:ident
        [$tok:tt $($rest:tt)*]
        $($item:tt)*
    ) => {
        parse_comma_list!($emit [$($rest)*] $($item)* $tok);
    };

    // emit last item
    (
        $emit:ident
        []
        $($item:tt)+
    ) => {
        $emit!($($item)+);
    };

    // empty list, allows for trailing comma
    ($emit:ident []) => {};
}
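To illustrate the higher-order part, here is a sketch that feeds the same parser a different callback. `record_item`, `log` and `take_log` are hypothetical helpers invented for this example; they record each item in a thread-local list instead of printing, so the result can be asserted. `parse_comma_list` is repeated in compact form only to keep the block self-contained.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical log so the emitted items can be inspected afterwards.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn log(line: String) {
    LOG.with(|l| l.borrow_mut().push(line));
}

fn take_log() -> Vec<String> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

// Hypothetical callback: same item grammar as foobar_item, but it records.
macro_rules! record_item {
    (foo) => { log(String::from("Foo")) };
    (bar $x:expr) => { log(format!("Bar {}", $x)) };
}

// Same rules as parse_comma_list above, in compact form.
macro_rules! parse_comma_list {
    // comma: emit the accumulated item, continue with the rest
    ($emit:ident [, $($rest:tt)*] $($item:tt)+) => {
        $emit!($($item)+);
        parse_comma_list!($emit [$($rest)*]);
    };
    // any other token: move it onto the accumulated item
    ($emit:ident [$tok:tt $($rest:tt)*] $($item:tt)*) => {
        parse_comma_list!($emit [$($rest)*] $($item)* $tok);
    };
    // end of input: emit the last item
    ($emit:ident [] $($item:tt)+) => {
        $emit!($($item)+);
    };
    // nothing left (empty list or trailing comma)
    ($emit:ident []) => {};
}

fn main() {
    // Multi-token items and a trailing comma both go through.
    parse_comma_list!(record_item [foo, bar 1 + 1, bar 3,]);
    assert_eq!(take_log(), ["Foo", "Bar 2", "Bar 3"]);
}
```

Each token costs one recursive expansion, so very long lists may run into the compiler's recursion limit.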

It works!!

But I was wondering if you guys have any suggestions or if I’m missing any obvious simpler solution.

Also, I suppose the comma character , cannot be made a parameter of that macro, right?


#2

There are no simple problems in the rust macro system; only crushed souls and broken dreams.

It is a bit funky. This is a good read.

not a snowball’s chance in Ecuador


#3

Your solution looks a lot like a TT muncher (from the same good read).


#4

That book looks awesome. Thanks a lot!

Yes, my solution above is exactly what they call “the most powerful macro parsing technique available” LOL

From what I’ve seen so far, Rust’s macro system looks pretty neat. Hats off to its developers. I don’t think there’s any other this good around, except for Scheme’s syntax-rules, by which it was clearly inspired.