Macros: Avoiding an explosion of patterns for optional parts cleanly

Background: downcast-rs takes a trait and defines methods on it for downcasting.
You don't need to know what this crate does. My question is entirely about making the macro DRYer than writing out an exponential explosion of patterns.

Problem: I recently needed to add support for traits with associated types and did a significant refactor to make things DRYer in preparation for it. The macro now supports downcasting for various combinations of the presence or absence of (1) type parameters, (2) associated types, (3) type constraints (where clauses), and (4) whether the types in the traits are generic or concrete.

Ignoring the concrete-type case and the case of a trait with no parameters or associated types, this macro supports the following forms:

impl_downcast!(Trait<T1, T2>); // with type parameters
impl_downcast!(Trait assoc AT1, AT2); // with associated types
impl_downcast!(Trait<T1, T2> assoc AT1, AT2); // with type parameters and associated types

impl_downcast!(Trait<T1, T2> where T1: Copy); // with type constraints
// ... and the other variations with `where` clauses.

These can be summed up via the following macro pattern, using $(...)? to mean optional occurrence:

impl_downcast!(Trait   $( < $($type:ident),* > )?   $( assoc $($atype:ident),* )?   $( where  $($pred:tt)* )? )

But there doesn't seem to be an elegant way to express this optionality.

If I had to add yet another orthogonal feature, I'd have to double the number of patterns (one set with the feature and one without) rather than simply marking it as an optional part of the pattern.
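To make the doubling concrete, here is a hypothetical miniature (a made-up Describe trait and impl_describe! macro, not the real downcast-rs code) with just two optional parts: type parameters and a where clause. Without an optional-repetition operator, each combination gets its own rule, and the impl body is copied into all four:

```rust
// Hypothetical trait and types, only here to give the macro something to implement.
pub trait Describe {
    fn describe() -> &'static str;
}

pub struct Plain;
pub struct Generic<T>(pub T);

// Two optional parts (type parameters, where clause) => 2 * 2 = 4 rules,
// each repeating the same impl body.
macro_rules! impl_describe {
    // No type parameters, no where clause.
    ($name:ident) => {
        impl Describe for $name {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
    // Where clause only.
    ($name:ident where $($pred:tt)+) => {
        impl Describe for $name where $($pred)+ {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
    // Type parameters only.
    ($name:ident < $($p:ident),* >) => {
        impl<$($p),*> Describe for $name<$($p),*> {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
    // Type parameters and where clause.
    ($name:ident < $($p:ident),* > where $($pred:tt)+) => {
        impl<$($p),*> Describe for $name<$($p),*> where $($pred)+ {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
}

impl_describe!(Plain);
impl_describe!(Generic<T> where T: Copy);
```

A third optional part would double this to eight rules.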

Is there some clean, elegant, and readable way to do a functional if within a macro to avoid such explosions?


Hm, so you're already aware of TT munching (and now I'm aware that there's a name for it!). I eventually converged on the same technique when faced with explosive complexity, and I confess it is not very readable (nor does a 30-layer macro call stack make for a good error message). But as it stands, macro_rules! is a Turing tarpit. What can you do?
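For comparison, a minimal TT-munching sketch of a hypothetical impl_describe! macro (all names here are illustrative only). Each internal @ step handles one optional part and records the result in a bracketed accumulator, so the impl body is written once. It relies on the compiler accepting an empty impl<>, Name<>, and where when a part is missing:

```rust
// Hypothetical trait and types for illustration.
pub trait Describe {
    fn describe() -> &'static str;
}

pub struct Plain;
pub struct Generic<T>(pub T);

macro_rules! impl_describe {
    // Final step: both optional parts have been normalized into
    // bracketed lists (possibly empty), so the body appears only here.
    (@emit $name:ident [$($p:ident),*] [$($pred:tt)*]) => {
        impl<$($p),*> Describe for $name<$($p),*> where $($pred)* {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
    // Step 2: where clause present.
    (@where $name:ident $params:tt where $($pred:tt)+) => {
        impl_describe!(@emit $name $params [$($pred)+]);
    };
    // Step 2: where clause absent.
    (@where $name:ident $params:tt) => {
        impl_describe!(@emit $name $params []);
    };
    // Step 1 (entry): type parameters present; pass the rest along.
    ($name:ident < $($p:ident),* > $($rest:tt)*) => {
        impl_describe!(@where $name [$($p),*] $($rest)*);
    };
    // Step 1 (entry): type parameters absent.
    ($name:ident $($rest:tt)*) => {
        impl_describe!(@where $name [] $($rest)*);
    };
}

impl_describe!(Plain);
impl_describe!(Generic<T> where T: Copy);
```

The cost is the one described above: every extra optional part adds another munching step, and mistakes surface as errors from deep inside the recursion.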

One tip I've picked up from looking at other people's macros is that $(xyz)* is often a perfectly reasonable approximation of $(xyz)?, especially when the generated code for 2+ matches would be syntactically invalid anyway. If the most you gain from simulating $(...)? is a slightly better error message, it might simply not be worth it!
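A minimal sketch of that tip, using a hypothetical let_opt! macro with an optional type annotation. Here $(: $ty:ty)* stands in for $(: $ty:ty)?. A caller who supplies two annotations still matches the pattern, but the expansion is not valid Rust, so the mistake is rejected anyway, only with a less targeted error message:

```rust
// Hypothetical `let`-like macro. `$(: $ty:ty)*` approximates an optional
// annotation: zero or one is the intended use. Writing
// `let_opt!(x: i32: u8 = 1)` matches, but expands to the invalid
// `let x: i32: u8 = 1;`, so the compiler still rejects it.
macro_rules! let_opt {
    ($name:ident $(: $ty:ty)* = $value:expr) => {
        let $name $(: $ty)* = $value;
    };
}
```

Because $name comes from the call site, the binding is visible to the code that follows the invocation.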


I've wondered recently why, if there's a $( .. )+ and an $( .. )*, there isn't a $( .. )? already. Does anyone know?


Woof! That's an impressive macro, definitely deserving of that rest area.

I was kinda hoping we could have something more declarative and template-like than a recursive parser. Part of solving this problem is not just having optional patterns, but also being able to produce output conditionally. The where clause patterns I alluded to above could be collapsed if I could conditionally insert, e.g., commas between two parts (only if both parts are non-empty) and conditionally insert where in the output. It would be ideal if Rust macros were closer to any of the various well-structured HTML template libraries used in web development. I think there's been an emphasis on well-structuredness, but not as much on the ergonomics of templating, unless I'm missing some hidden features.

The where clause patterns I alluded to above could be collapsed if I could conditionally insert, e.g., commas between two parts (only if both parts are non-empty) and conditionally insert where in the output.

I still wonder whether it is possible to implement such functionality for ourselves, like

Invocation                         Evaluates to:
-----------                        -------------
join!([,] [] [])                   (nothing)
join!([,] [a,b] [])                a,b
join!([,] [] [c,d])                c,d
join!([,] [a,b] [c,d])             a,b,c,d

prefix_nonempty!([where] [])       (nothing)
prefix_nonempty!([where] [abc])    where abc

Well, okay, obviously the above macros can be easily defined, and I certainly did attempt to do so a while back, but I recall having issues trying to use them. The macro parser is awfully restrictive with regard to where macro invocations can appear and what they are allowed to expand to.

If someone has an example of successfully implementing such utility functions in a reusable way, I would love to see it.


About the only way is to use callbacks. macro-attr uses them to have a user-defined macro "expand" to arbitrary token trees.

Really, it's just pushdown with a level of indirection, with all the same problems (because you keep recursing, it's murder on the recursion limit).


The main reason this doesn't work as we'd like is that nested macros are expanded lazily, so one can't rely on AST coercion to place them anywhere.

However, one way to make your suggestion work, and to make macros WAY more ergonomic, would be to add syntax for eager (greedy) application of a macro within a macro definition, say $!(...), as follows:

macro_rules! foo {
    ( $($types:ident),* some_syntax $($atypes:ident),* ) => {
        foo! { @helper $!(join!( $($types),*, $($atypes),* )) }
    };
    (@helper $($all_types:ident),* ) => {
        // ...
    };
}

Here, the join! macro would be immediately invoked and not leaked into the expansion, and its definition can stay local to the crate. This would be the analog of a local function call.

If the variables captured within $(..)? were to expand identically to those in the other two forms (i.e. as $($var)*), I can't imagine this being difficult to build on top of the existing framework. Though I would probably want it to be more general and support $(where $($atype:ident),*)?, with $($atype),* expanding to nothing if the whole where clause were missing.

Downcast like this? https://github.com/rust-lang/rust/issues/35943

I've been dealing with a similar problem lately and have settled on a similar approach, but one issue about matching where clauses still bugs me. Since they are matched as $($pred:tt)*, they can contain literally anything. This amounts to a sort of "SQL injection" in the where clause that lets the user hijack the struct body and replace it with anything. This is particularly troublesome for macros that generate unsafe implementations. One can of course just say "don't do that" in the docs, but it's still somewhat unsettling.

Except that $($blah)?* is already valid: it's a ?-separated sequence.

If you care about accuracy, you could use parse-generics-shim to (sort-of) parse the where clause... although then you have the opposite problem of it not accepting all valid clauses. You'll have to pick your poison.

A cheap mitigation I've come up with for now is to add an empty dummy impl block with the where clause attached, so that the clause appears both there and in the struct expansion. Any items hidden in the clause are then expanded twice and cause a name-collision error. I'm not sure it's foolproof, but I think it's a decent compromise.
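A minimal sketch of that mitigation, with a hypothetical Describe trait, Wrapper type, and impl_describe! macro. The dummy inherent impl repeats the caller's where clause. For an honest clause it is empty and harmless, while smuggled items would be emitted twice:

```rust
// Hypothetical trait and type for illustration.
pub trait Describe {
    fn describe(&self) -> String;
}

pub struct Wrapper<T>(pub T);

macro_rules! impl_describe {
    ($name:ident < $($p:ident),* > where $($pred:tt)*) => {
        // Dummy inherent impl carrying the same where clause. With a benign
        // clause such as `T: Clone` it is just an empty block. If `$pred`
        // smuggles in a `{ ... }` body followed by more tokens, that body is
        // expanded both here and below, so the duplicated items should
        // collide and fail to compile instead of silently replacing the
        // real impl body.
        impl<$($p),*> $name<$($p),*> where $($pred)* {}

        impl<$($p),*> Describe for $name<$($p),*> where $($pred)* {
            fn describe(&self) -> String {
                stringify!($name).to_string()
            }
        }
    };
}

impl_describe!(Wrapper<T> where T: Clone);
```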

Well, I mean $($blah)? without the *. In other words, the final character determines the kind of repetition. Currently, a trailing $(...)? without a * should fail to parse (haven't tested). Perhaps I'm missing something?

I've thought about this more concretely in terms of the downcast-rs macro. Let's say I have the following two features:

  1. Eager (as opposed to lazy) macro evaluation via a $! postfix (i.e. join$!(...) and inject_where$!(...)).
  2. The optional pattern match $(...)? which expands in one of the following ways. E.g., with the match $( where $($preds:tt)+ )?, one can expand $preds via either
    a. $( where $($preds)* )? (expands with a where only if the original pattern was matched) or
    b. $($preds)* (expands out $preds or empty if the original pattern didn't match).

Then I would be able to rewrite the entire original 112-line macro in downcast-rs in the following, much more readable form (IMHO), without any exponential explosion:

macro_rules! join {
    // ...
}

macro_rules! inject_where {
    (types [] where []) => {};

    (types [$($types:ident),*] where [$($preds:tt)*]) => {
        where
            join$!(
                [,],
                [$( $types: ::std::any::Any + 'static ),*],
                [$($preds)*])
    };
}

#[macro_export]
macro_rules! impl_downcast {
    (@impl_full
        $trait_:ident [$($param_types:tt)*]
        for [$($forall_types:ident),*]
        where [$($preds:tt)*]
    ) => {
        impl<$($forall_types),*> $trait_<$($param_types)*>
            inject_where$! { types [$($forall_types),*] where [$($preds)*] }
        {
            /// Returns true if the boxed type is the same as `__T`.
            #[inline]
            pub fn is<__T: $trait_<$($param_types)*>>(&self) -> bool {
                $crate::Downcast::as_any(self).is::<__T>()
            }

            // other fns ...
        }
    };

    // No type parameters.
    ($trait_:ident   ) => { impl_downcast! { @impl_full $trait_ [] for [] where [] } };
    // Type parameters, associated types, and where clauses.
    (
        $trait_:ident $( < $($types:ident),* > )?
        $( assoc $($atypes:ident),* )?
        $( where $($preds:tt)+ )?
    ) => {
        impl_downcast! {
            @impl_full
                $trait_ [ join$!( [,], [$($types),*], [$($atypes = $atypes),*] ) ]
                for [ join$!( [,],  [$($types),*], [$($atypes),*] ) ]
                where [$($preds)*]
        }
    };
    // Concretely-parametrized types with concrete associated types.
    (
        concrete
        $trait_:ident $( < $($types:ident),* > )?
        $( assoc $($atypes:ident = $aty:ty),* )?
    ) => {
        impl_downcast! {
            @impl_full
                $trait_ [ join$!( [,], [$($types),*], [$($atypes = $aty),*] ) ]
                for [] where []
        }
    };
}

A few notes:

  • The definition feels more inlined/declarative with @impl_full showing the complete structure of the macro's output quite clearly.
  • inject_where! is now simpler and also completely local to the module despite being a separate macro.
  • join! is a general-purpose macro that one could also import from another utility crate.
  • Though not applicable to downcasting, adding lifetimes would, more generally speaking, be extremely straightforward.
  • I'm not sure, though, whether this implementation would still need AST coercion, given eager sub-macro evaluation. If it does, that can easily be added via another general-purpose as_item! macro.
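The as_item! helper mentioned in the last note is a common community idiom rather than a std macro. A minimal sketch, with a hypothetical make_struct! macro that assembles tokens and then forces them to be parsed as a single item:

```rust
// AST-coercion helper: re-parse the input tokens as exactly one item
// (outer attributes included). The name is a convention, not part of std.
macro_rules! as_item {
    ($i:item) => { $i };
}

// Hypothetical macro that builds a struct from pieces, then coerces the
// assembled tokens into an item.
macro_rules! make_struct {
    ($name:ident { $($field:ident : $ty:ty),* }) => {
        as_item! {
            #[derive(Debug, PartialEq)]
            pub struct $name { $(pub $field: $ty),* }
        }
    };
}

make_struct!(Point { x: i32, y: i32 });
```

Whether the coercion is actually required depends on where the tokens end up; when it isn't, the extra layer is harmless apart from one more level of recursion.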

Does this sound reasonable?

Is your point that this causes a parse issue? The syntax would be ? not ?*.

If it does cause an issue, we should keep additional quantifiers like ? in mind when discussing macros 2.0.

Syntax constructs don't exist in a vacuum. The point was that $(...)? when followed by either + or * already means something: the ? is the sequence separator.

Huh. I was hoping something like

    ($($terms:ident)*+) => {}

would mean identifiers separated by *, but it doesn't seem to. So yeah, maybe macros 2.0, as suggested by @withoutboats.

Could you be more direct when trying to highlight an issue like this? My understanding is that you're saying there would be a parsing ambiguity in the case of a ? quantifier followed by an unrelated + or * token, but I don't know Rust's grammar intimately enough to be confident that's true.

That is, I think you're saying this would be ambiguous:

macro_rules! foo {
    ($($bar:ident)? + $baz:ident) => { }
}

Assuming it is, I think the solution is for macros 2.0 either to slightly modify the repetition syntax so that quantifier tokens can somehow also be used as separators, or to reserve ? as a quantifier. We could also keep interpreting this the same way, which would create a weird edge case for optional quantifiers but allow them in most cases.

[sheepishly] At the time, I didn't think it would be useful. I think I may have subsequently used * to fake ? more times than I have used +. We really ought to add it.
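For reference, the ? repetition operator was eventually added to macro_rules! (stabilized in Rust 1.32, initially for the 2018 edition), and the Reference now excludes the repetition operators from the tokens allowed as separators. A minimal sketch with a hypothetical Describe trait and impl_describe! macro: one rule covers every combination of optional type parameters and where clause, and $( where $($pred)+ )? in the transcriber emits where only when a clause was matched. This handles the conditional where from the proposal above, though not a conditional comma between two possibly empty lists:

```rust
// Hypothetical trait and types for illustration.
pub trait Describe {
    fn describe() -> &'static str;
}

pub struct Plain;
pub struct Generic<T>(pub T);

// A single rule: each optional part is matched with `$( ... )?` and
// re-emitted with `$( ... )?`, so absent parts produce no tokens at all.
macro_rules! impl_describe {
    ($name:ident $( < $($p:ident),* > )? $( where $($pred:tt)+ )?) => {
        impl $( < $($p),* > )? Describe for $name $( < $($p),* > )?
        $( where $($pred)+ )?
        {
            fn describe() -> &'static str { stringify!($name) }
        }
    };
}

impl_describe!(Plain);
impl_describe!(Generic<T> where T: Copy);
```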


So I've been trying to use helpers with callbacks in a recent project, keeping the recursion limit in mind.

One thing I ran into, which I thought would become a nasty issue, is the rule that item-producing macros must be invoked with a semicolon or braces. The thing is, this requirement also infects any helper macro that receives an item-producing callback! Thankfully, I discovered just before posting this that adding a semicolon apparently has no adverse effect on expression-producing macros (i.e. it does not change the value to ()):

macro_rules! some_item { (a) => {pub trait Foo { }}; }
macro_rules! some_expr { (a) => {3}; }

macro_rules! helper_for_items { ($callback:ident!) => { $callback!(a);}; }
macro_rules! helper_for_exprs { ($callback:ident!) => { $callback!(a) }; }

helper_for_items!(some_item!); // OK
// ERROR: macros that expand to items must either be surrounded with braces
// or followed by a semicolon
//helper_for_exprs!(some_item!);

pub fn main() {
    let _: i32 = helper_for_exprs!(some_expr!); // OK
    let _: i32 = helper_for_items!(some_expr!); // ... also OK. Whew.
}

Assuming all of the other things macros can produce are equally unaffected by the semicolon, then helper_for_items suffices as a universal helper for everyone. (my first--and perhaps only, ever--sigh of relief during this project)

I sincerely hope that this is the case, because optional semicolon support ends up being yet another extremely pervasive change:

// A helper which supports optional semicolons.
// Invoked as `ugly_helper!(some_item!;);` or `ugly_helper!(some_expr!)`.
macro_rules! ugly_helper {
    // most information about the callback can be packed up into a single token tree;
    // but the semicolon affects EVERY macro in the chain of calls
    (@foo $cb:ident $($semi:tt)*) => { ugly_helper!(@bar [a $cb] $($semi)*) $($semi)* };
    (@bar $args:tt $($semi:tt)*) => { ugly_helper!(@baz  $args $($semi)*) $($semi)* };
    (@baz $args:tt $($semi:tt)*) => { ugly_helper!(@done $args $($semi)*) $($semi)* };
    (@done [$x:ident $cb:ident] $($semi:tt)*) => { $cb!($x) $($semi)* };
    // starting rule
    ($cb:ident! $($semi:tt)*) => { ugly_helper!(@foo $cb $($semi)*) $($semi)* };
}

// I sincerely hope the above is not necessary and that this is sufficient.
macro_rules! nice_helper {
    (@foo $cb:ident) => { nice_helper!(@bar [a $cb]); };
    (@bar $args:tt) => { nice_helper!(@baz  $args); };
    (@baz $args:tt) => { nice_helper!(@done $args); };
    (@done [$x:ident $cb:ident]) => { $cb!($x); };
    // starting rule
    ($cb:ident!) => { nice_helper!(@foo $cb); };
}

(if nice_helper suffices, then it also implies that the semicolon rule is, you know, kinda entirely arbitrary; but I'll take it!)

I tend to use braces instead, but for the same reason.
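A minimal sketch of the brace style, redefining some_item! and some_expr! from the example above and using a hypothetical helper! name. A brace-delimited call needs no trailing semicolon in item position and is an ordinary expression elsewhere, so one helper serves both kinds of callback. As an aside, recent rustc versions flag a trailing semicolon in a macro expansion used in expression position (the semicolon_in_expressions_from_macros lint), so the helper_for_items trick above may now produce a warning.

```rust
macro_rules! some_item { (a) => { pub trait Foo {} }; }
macro_rules! some_expr { (a) => { 3 }; }

// Invoke the callback with braces instead of `(...);`: no stray semicolon
// is emitted, and the call is valid in both item and expression position.
macro_rules! helper {
    ($callback:ident!) => { $callback! { a } };
}

// Item position: expands to `some_item! { a }`, i.e. `pub trait Foo {}`.
helper!(some_item!);
```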