Macros "or" repetition

If I want to write a macro that matches X or Y, it's simple:

macro_rules! x_or_y {
    (X) => { println!("X"); };
    (Y) => { println!("Y"); };
}

But what if I want to match a sequence of X-s and Y-s?

I can fall into a recursive macro:

macro_rules! xs_or_ys {
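    // `$other` needs its own `$`, and the comma is optional so that the last item matches too
    (X $(, $($other:tt)*)?) => {
        println!("X");
        xs_or_ys!($($($other)*)?);
    };
    (Y $(, $($other:tt)*)?) => {
        println!("Y");
        xs_or_ys!($($($other)*)?);
    };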
    () => {};
}

But it complicates even this extremely simple macro. And in reality I have a much more complex macro, which is relatively simple without recursion, but with it... better not to mention.

Is there something like an "or repetition" in Rust? Something like:

macro_rules! xs_or_ys {
    ($($(X <or> Y))*) => {
        $(
            println!($(X <or> Y));
        )*
    };
}
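For the simple case where every alternative is a single token tree, something close to this can be had without recursion by repeating over `tt` and delegating each item to the single-case macro. This is only a sketch: it returns the strings in a `Vec` instead of printing them so the result is easy to check, and it does not cover alternatives that span several tokens.

```rust
// Single-case macro: one rule per alternative.
macro_rules! x_or_y {
    (X) => { "X" };
    (Y) => { "Y" };
}

// Sequence macro: repeat over token trees and let `x_or_y!` reject anything
// that is neither `X` nor `Y`.
macro_rules! xs_or_ys {
    ($($item:tt),* $(,)?) => {
        vec![$(x_or_y!($item)),*]
    };
}
```

An invalid item such as `xs_or_ys!(X, Z)` is still rejected at compile time, because `x_or_y!(Z)` matches no rule.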

How does it complicate the macro? I don't see it in your example. Could you give an example closer to your actual code?

OK. I'm pasting how it looks now and how it would look with recursion (not really, I don't want to write that, so I'll paste a similar macro):
Currently:

macro_rules! matches_ch {
    ($expression:expr, $($(Some($pattern:pat) <or> $i:ident))|+ $(,)?) => {
        matches_ch!($expression, $($(Some($pattern) <or> $i))|+ if true)
    };

    ($expression:expr, $($(Some($pattern:pat) <or> $i:ident))|+ if $guard:expr $(,)?) => {
        match_ch!(($expression) {
            $(
                $(Some($pattern) <or> $i) if $guard => true,
            )+
            _ => false,
        })
    };
}

Will become roughly (hopefully not):

macro_rules! match_ch {
    (($e:expr) $body:tt) => {
        match_ch!(@internal ($e) {} {} $body)
    };

    // Stop condition
    (@internal ($e:expr) $has_char:tt $none:tt {}) => {
        if let Some(ch) = $e {
            match ch.get() $has_char
        } else $none
    };

    (@internal ($e:expr) { $($has_char:tt)* } $none:tt {
        Some($($pattern:pat)|+) $(if $guard:expr)? => $result:expr,
        $($not_processed:tt)*
    }) => {
        match_ch!(@internal ($e) {
            $($has_char)*
            $($pattern)|+ $(if $guard)? => $result,
        } $none { $($not_processed)* })
    };

    (@internal ($e:expr) $has_char:tt {} {
        None => $result:expr,
        $($not_processed:tt)*
    }) => {
        match_ch!(@internal ($e) $has_char { $result } { $($not_processed)* })
    };

    // Bound name pattern when already met `None`
    (@internal ($e:expr) { $($has_char:tt)* } { $($none:tt)+ } {
        $name:ident => $result:expr,
    }) => {
        match_ch!(@internal ($e) {
            $($has_char)*
            ch => {
                let $name = Some(unsafe { NonZeroU8::new_unchecked(ch) });
                $result
            },
        } { $($none)+ } {})
    };
    // Bound name pattern when not met `None`
    (@internal ($e:expr) $has_char:tt {} {
        $name:ident => $result:expr,
    }) => {
        match_ch!(@internal ($e) $has_char {
            let $name: Option<NonZeroU8> = None; // Explicit type may be required
            $result
        } {
            $name => $result,
        })
    };

    (@internal ($e:expr) $has_char:tt $none:tt {
        _ => $result:expr,
    }) => {
        match_ch!(@internal ($e) $has_char $none {
            _ch => $result, // Delegate to bound name
        })
    };
}

And I omitted the ~140-line comment above the macro...

You can write a "meta-macro" helper to derive a macro that munches sequences out of a set of rules for the singular case.

Showcase

macro_enhance! { // <- meta-macro helper
    #[macro_enhance::with_separator( , )]
    #[macro_enhance::join_with {
        (
            $( // repetition over all the expansions
                $($expansion:tt)* // a single expansion
            )*
        ) => (
            $(
                $($expansion)* ;
            )*
        );
    }]
    // single expansions rules
    macro_rules! x_or_y {
        (
            X
        ) => (
            println!("X")
        );
        
        (
            Y
        ) => (
            println!("Y")
        );
    }
}

fn main ()
{
    x_or_y!(X, Y, X, X, Y);
}

which expands to:

fn main ()
{
    println!("X");
    println!("Y");
    println!("X");
    println!("X");
    println!("Y");
}
  • Playground (feel free to click on "expand macros" on the right)
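For comparison, a hand-written sequence-munching macro of the kind such a helper saves you from writing could look roughly like the following. It is a sketch and not the actual expansion of macro_enhance!; the `@munch` rule name and the `Y(n)` alternative are made up for illustration, and the results are collected into a `Vec<String>` instead of being printed.

```rust
macro_rules! xs_or_ys {
    // Nothing left to munch: emit the accumulated expressions.
    (@munch [$($acc:expr,)*]) => {
        vec![$($acc),*]
    };
    // Munch one `X`, then continue with whatever follows the comma.
    (@munch [$($acc:expr,)*] X $(, $($rest:tt)*)?) => {
        xs_or_ys!(@munch [$($acc,)* String::from("X"),] $($($rest)*)?)
    };
    // Munch one `Y(expr)`, an alternative made of several tokens.
    (@munch [$($acc:expr,)*] Y($n:expr) $(, $($rest:tt)*)?) => {
        xs_or_ys!(@munch [$($acc,)* format!("Y{}", $n),] $($($rest)*)?)
    };
    // Anything else is an error; without this rule the entry rule below
    // would recurse forever on invalid input.
    (@munch $($bad:tt)*) => {
        compile_error!("expected a comma-separated list of `X` or `Y(..)`")
    };
    // Entry point: start with an empty accumulator.
    ($($input:tt)*) => {
        xs_or_ys!(@munch [] $($input)*)
    };
}
```

Every alternative needs its own munching rule with the same accumulator plumbing, which is the boilerplate the meta-macro factors out.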


Regarding your matches_ch! macro, I think there is a clear case of XY here: you should explain what you are trying to accomplish, and really think if using such a complex macro is worth the very little gain it yields.

That is, from looking at your macro, if I've guessed it right (I may very well have not!), you want to be able to support the following:

match_ch!( (NonZeroU8::new(x)) {
    Some(42) => { ... },
    Some(x) if x % 2 == 0 => { ... },
    // etc.
    SPECIAL_END_BRANCHES_HERE
})

where your SPECIAL_END_BRANCHES_HERE are expected to be:

$( None => { ... }, )?
some_var_name => { ... }, // or _ => ...

And depending on whether the literal None branch was provided or not, some_var_name would bind to an Option<NonZeroU8> equal to Some(x) if None was provided, or None if not.


In case I've guessed it right

I personally find the syntax confusing, especially given that:

  • the compiler is already smart enough to know that after it has checked for a None branch, it can skip null-checking on the following Some(some_var_name) branch;

  • If you want to have None be bound to some_var_name, you can use the some_var_name @ None pattern.

So, instead of:

match_ch!( (NonZeroU8::new(x)) {
    Some(42) => { ... },
    Some(x) if x % 2 == 0 => { ... },
    None => { ... },
    foo => { ... },
})

you could write:

match NonZeroU8::new(x) {
    Some(42) => { ... },
    Some(x) if x % 2 == 0 => { ... },
    None => { ... },
    foo @ Some(_) => { ... },
    /* or better:
    Some(foo) => { ... }, // */
}

and instead of:

match_ch!( (NonZeroU8::new(x)) {
    Some(42) => { ... },
    Some(x) => { ... },
    foo => { ... },
})

you could write:

match NonZeroU8::new(x) {
    Some(42) => { ... },
    Some(x) => { ... },
    foo @ None => { ... },
}
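Both suggestions can be tried on a plain `Option<u8>`, which sidesteps the question of how literals match against `NonZeroU8`. The function name and the returned strings below are made up for illustration.

```rust
fn describe(input: Option<u8>) -> String {
    match input {
        Some(42) => String::from("the answer"),
        Some(x) if x % 2 == 0 => format!("even: {}", x),
        // `foo @ None` binds the whole `Option` to `foo` while matching only `None`.
        foo @ None => format!("empty: {:?}", foo),
        // `None` is already handled, so this arm covers the remaining `Some` values.
        Some(foo) => format!("odd: {}", foo),
    }
}
```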

First, hooray for guessing it right!

Second, it's not complicated - it reflects the syntax of match expression almost exactly (with few exceptions).

I couldn't just use a match statement because to get a u8 out of NonZeroU8 you call its get() method, and without that you can do almost nothing.

Background: I'm writing a lexer that operates on ASCII (so u8, not char), and its functions return None to indicate EOF. After getting the basics working, I optimized them to use NonZeroU8 so that the Option occupies one byte instead of two (yes, I know that NULL can appear in a file, I don't care. I use NonZeroU8::new_unchecked() so NULL becomes EOF, which is what I want). Now, almost every piece of the lexer matches on this Option<NonZeroU8>. With u8 this is easy: just match. But as I said, NonZeroU8 is very bad for pattern matching. For example, to identify a newline with u8 I do:

match consume!(lexer) {
    Some(b'\n') => make_token!(TokenKind::NewLine),
    // ...
}

With NonZeroU8 this becomes so ugly:

match consume!(lexer) {
    Some(ch) if ch.get() == b'\n' => make_token!(TokenKind::NewLine),
    // ...
}

And it's even worse with or-patterns and range patterns, both of which are common in my code. Consider skipping whitespace (ugh!):

match consume!(lexer) {
    Some(ch) if ch.get() == b'\r' || ch.get() == b'\t' || ch.get() == b' ' => {},
    // Versus:
    Some(b'\r' | b'\t' | b' ') => {},
    // ...
}

Or reading a number:

match consume!(lexer) {
    Some(ch) if b'0' <= ch.get() && ch.get() <= b'9' => read_number(lexer),
    // Or better, but still ugly:
    Some(ch) if (b'0'..=b'9').contains(&ch.get()) => read_number(lexer),
    // Versus:
    Some(b'0'..=b'9') => read_number(lexer),
    // ...
}

With hundreds of such places, I decided instead to write this macro, which emulates a match statement on NonZeroU8.
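Written out by hand, the shape that the stop-condition rule of match_ch! produces (an `if let` around a `match` on `ch.get()`) looks roughly like this. It is only a sketch: `TokenKind` and `classify` here are stand-ins invented for the example, not the real lexer types.

```rust
use std::num::NonZeroU8;

// Stand-in token kinds for the example.
#[derive(Debug, PartialEq)]
enum TokenKind {
    NewLine,
    Whitespace,
    Number,
    Other,
    Eof,
}

fn classify(next: Option<NonZeroU8>) -> TokenKind {
    if let Some(ch) = next {
        // Matching on the plain `u8` keeps literal, or- and range patterns usable.
        match ch.get() {
            b'\n' => TokenKind::NewLine,
            b'\r' | b'\t' | b' ' => TokenKind::Whitespace,
            b'0'..=b'9' => TokenKind::Number,
            _ => TokenKind::Other,
        }
    } else {
        // `None` means end of input.
        TokenKind::Eof
    }
}
```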

Now I want to do the same for matches!(), less common but still. And it is so sad to not re-use the existing match_ch!() macro for the new matches_ch!().

If you want a deep dive into the macro and answers to your questions (in case I haven't answered them already), I'm here. This is not the first version, the first didn't work :smile:

My bad, I completely forgot about that :sweat_smile:

Many thanks for your meta-macro! (How much I like macros... Meta of meta...). I will use it. Thanks!


@Yandros I see you use tt to match $. Can't Rust macros match only $ (some kind of escaping?)

If there are things after a "raw" $, then an enclosing macro_rules! definition may parse it as the beginning of:

  • a metavariable, if what follows is an identifier,

  • a repetition, if what follows is an opening group ({, (, [).

In those positions, it is then necessary to capture it with a :tt, and then add an optional post-processing step afterwards (e.g., assert_is_dollar!($captured_token) with macro_rules! assert_is_dollar {($) => ()}).
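A minimal sketch of that capture-then-check step follows. `with_dollar!` is a hypothetical macro that exists only to show where the check goes.

```rust
// Matches exactly one token, a literal `$`; any other token fails to compile.
macro_rules! assert_is_dollar {
    ($) => {};
}

// Hypothetical macro that expects a dollar followed by an identifier.
macro_rules! with_dollar {
    ($dol:tt $name:ident) => {{
        // `$dol:tt` accepts any token tree, so verify that it really is `$`.
        assert_is_dollar!($dol);
        stringify!($name)
    }};
}
```

With this in place `with_dollar!($ foo)` compiles, while something like `with_dollar!(# foo)` is rejected by `assert_is_dollar!`.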

In my case, I also had an extra reason to capture a dollar as a token: it helps remove repetition ambiguities when dealing with this kind of higher order macros.

That is,

macro_rules! mk_ignore {() => (
    macro_rules! ignore {(
        $($tt:tt)*
    ) => (
        /* Nothing */
    )}
)}
mk_ignore!();

fails, since mk_ignore! encounters a $( ... and so tries to expand it as a repetition over some of its own input, of which there is none.

But the following works:


macro_rules! mk_ignore {($dol:tt) => (
    macro_rules! ignore {(
        $dol($tt:tt)*
    ) => (
        /* Nothing */
    )}
)}
mk_ignore!($);

and I have taken the personal habit of calling it $__ instead of $dol; it looks more like an "escaped" dollar :smile:
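Here is the same trick on a generated macro that actually uses its repetition, written with the `$__` spelling. The names `mk_list_names!` and `list_names!` are invented for this sketch.

```rust
// Outer macro: receives a `$` token as `$__` and the name of the macro to define.
macro_rules! mk_list_names {
    ($__:tt $name:ident) => {
        macro_rules! $name {
            // After substitution this matcher reads `$( $item:tt ),*`.
            ($__( $__ item:tt ),*) => {
                [$__( stringify!($__ item) ),*]
            };
        }
    };
}

// Defines `list_names!`, which turns its arguments into an array of strings.
mk_list_names!($ list_names);
```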


Thanks. Just note that your macro is not completely compatible with macro_rules!: you can only use parentheses, and not the other token tree delimiters, inside the nested macro. That is, () => () is allowed, but {} => {} is not. This doesn't matter to me, though (although I did change the body to match {} instead of () because I prefer the former).

Yeah, supporting the three group delimiters is cumbersome, and not worth it for a demo. I personally prefer (), since it is less error-prone: it does not give the false impression that the emitted code lives in a block (({ ... }) being then the way to achieve the latter) :wink:

Nice. But I prefer {} because it looks like a block of code: { println!("a"); println!("b"); } is valid Rust, but (println!("a"); println!("b");) isn't. Of course, if I need to declare a variable, I use {{ ... }}.
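To make the delimiter point concrete, here is a small sketch with made-up macro names. The outer delimiter of a rule's body only groups the emitted tokens, so the first two macros expand identically; only the doubled braces in the third emit an actual block.

```rust
// Parenthesised body: the parentheses are not part of the expansion.
macro_rules! add_one_paren {
    ($a:expr) => ( $a + 1 );
}

// Braced body: looks like a block, but these braces are not emitted either.
macro_rules! add_one_brace {
    ($a:expr) => { $a + 1 };
}

// Doubled braces: the inner pair is emitted, so the `let` lives in a real block.
macro_rules! add_one_block {
    ($a:expr) => {{
        let value = $a;
        value + 1
    }};
}
```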