Macros-by-example and their limitation to consuming only one token tree

zackw · October 22, 2024, 6:32pm

Splitting a new thread for this, since it isn't anything to do with either the original question or the meta-discussion that the thread turned into.

I was talking about how a macro-by-example's arguments must all come from the single brace, bracket, or parenthesis group that immediately follows. You can't get around that with tt-munching as far as I know.
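To make the restriction concrete, here is a minimal tt-muncher sketch. The name count_tts is hypothetical and not from the thread. The macro recurses over every token tree inside its own delimiters, and a nested (...), [...] or {...} group counts as a single token tree. However it recurses, tokens after its closing delimiter are never part of its input.

```rust
// A classic tt-muncher: each recursive call peels off one token tree.
// A parenthesized, bracketed or braced group is one token tree, so its
// contents are not counted individually.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}
```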

macro_rules! itself is an example of the sort of thing you can't do with macros by example. I'd really like to be able to write macros that are invoked like

error_impl! MyError {
    Display (self, f) { f.write_str("my thing errored") }
}

instead of

error_impl! {
    MyError:
    Display (self, f) { f.write_str("my thing errored") }
}

which is the least ugly workaround I could think of in the actual code that inspired this example. And I'd also really like to be able to write macros that look syntactically like control structures:

    repeat! {
        // loop body
    } until /* controlling expression */ ;
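For comparison, here is a sketch of the closest macro_rules! versions that work today. Both wished-for syntaxes are moved entirely inside the macro's own braces or parentheses. The macro bodies are illustrative guesses and not the original code: error_impl! here implements Display and std::error::Error, and repeat! runs its body at least once.

```rust
// Hypothetical workaround for the first wish: the type name moves inside the
// braces, followed by a colon, so everything sits in the macro's single group.
macro_rules! error_impl {
    ($name:ident : Display ($self_:ident, $f:ident) $body:block) => {
        impl ::std::fmt::Display for $name {
            // `self` and `f` come from the caller, so the caller's body can use them.
            fn fmt(&$self_, $f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result $body
        }
        impl ::std::error::Error for $name {}
    };
}

// Hypothetical workaround for the second wish: the `until` clause moves inside
// the macro's parentheses instead of trailing after the closing brace.
macro_rules! repeat {
    ($body:block until $cond:expr) => {
        loop {
            $body
            if $cond {
                break;
            }
        }
    };
}

#[derive(Debug)]
struct MyError;

error_impl! {
    MyError:
    Display (self, f) { f.write_str("my thing errored") }
}

// The body always runs at least once, as in a do-while loop.
fn iterations_until(limit: u32) -> u32 {
    let mut i = 0;
    repeat!({ i += 1; } until i >= limit);
    i
}
```

The price is visible at the call site: repeat!({ ... } until cond) reads less like a control structure than the syntax in the post.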

afetisov · October 22, 2024, 8:40pm

Macros are very deliberately designed to look nothing like language-native syntax, and supporting arbitrary funky syntax is a non-goal.

notriddle · October 22, 2024, 9:10pm

Unfortunately, this works:

bitflags! {} use bitflags::bitflags;

This implies that the compiler needs to know, before finishing name resolution, that the use statement is actually an import and not part of the macro invocation.
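The same situation can be reproduced within a single crate. In this sketch, a hypothetical make_const! in a local module stands in for bitflags!, and the invocation comes before the use that brings the macro into scope. Imports are resolved without regard to their position in the module, so the parser cannot know what make_const! refers to when it reaches the invocation. It has to decide where the invocation ends without that information.

```rust
mod defs {
    // Hypothetical stand-in for `bitflags!`: it just defines a constant.
    macro_rules! make_const {
        ($name:ident = $value:expr) => {
            pub const $name: u32 = $value;
        };
    }
    // Make the macro importable by path, like `bitflags::bitflags`.
    pub(crate) use make_const;
}

// The invocation precedes the `use` that brings `make_const!` into scope.
make_const! { ANSWER = 42 } use defs::make_const;
```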

CAD97 · October 22, 2024, 9:35pm

As a historical note, macro_rules! used to be parsed by the same machinery as other macro invocations, and it was possible (unstably) to parse the same syntax, name! ident { ... }, with bang identifiers other than macro_rules (e.g. in cfg-excluded items). Today, however, macro_rules! is treated as its own syntax kind and is only a conditionally reserved name: you can even use macro_rules! to define a macro named macro_rules if you're so inclined.
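Here is a sketch of that last point. It reflects my understanding of current rustc behaviour and has not been checked against every version: macro_rules! is treated as a definition only when an identifier follows the bang, so a user-defined macro named macro_rules can still be invoked with other input. The struct name Defined is a placeholder.

```rust
// A user-defined macro that happens to be named `macro_rules`.
macro_rules! macro_rules {
    () => {
        struct Defined;
    };
}

// No identifier follows the bang, so this is an ordinary macro call
// to the macro defined above, not a new definition.
macro_rules! {}
```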

zackw · October 24, 2024, 4:12pm

I know. I think that was an incorrect design decision, and furthermore, one that has been thoroughly undermined by proc macros, to the point where the only thing we accomplish by continuing to insist on it is to force people to use proc macros when they shouldn't have had to.

I see no good reason why declarative macros 2.0 should not be at least as capable as Scheme's syntax-case. (What we have now is not even as powerful as the more limited syntax-rules despite having very obviously been inspired by syntax-rules.)

Blech. I can think of workarounds, but honestly my recommendation would be to require macro! macros to be declared textually prior to use. It makes them different from other kinds of items, but in a way that's easy to explain and motivate (using exactly this sort of example).

Cerber-Ursi · October 24, 2024, 5:01pm

What about bitflags::bitflags! {} fn foo() {}? Or this:

fn foo() {
    bitflags::bitflags! {} let _ = 42;
}

Both compile right now. Where should we draw the line?

zackw · October 24, 2024, 5:17pm

Assuming use bitflags::bitflags has appeared earlier in the file, the parsing of these examples would be almost entirely up to the definition of bitflags::bitflags, just as it is now for procedural macros.

I would draw only one hard line: a macro invoked inside a scope should not be able to consume the closing delimiter for that scope. So, in your second example, bitflags::bitflags! might eat the let _ = 42 but it would definitely not eat the close brace for the function definition or anything after that.

kpreid · October 24, 2024, 5:35pm

This is not correct. Procedural macros can be invoked in three forms, as per the Rust Reference:

Procedural macros allow creating syntax extensions as execution of a function. Procedural macros come in one of three flavors:

Function-like macros - custom!(...)

Derive macros - #[derive(CustomDerive)]

Attribute macros - #[CustomAttribute]

None of these three forms allow arbitrary multi-token-tree parsing.

Function-like macros receive the contents of one set of brackets — just like macro_rules! macros.
Derive and attribute macros receive one item definition, which must still match the Rust grammar for items.

It would certainly be useful for macro_rules! to be usable in derive and attribute position (see Declarative `macro_rules!` attribute macros by joshtriplett · Pull Request #3697 · rust-lang/rfcs · GitHub, Declarative `macro_rules!` derive macros by joshtriplett · Pull Request #3698 · rust-lang/rfcs · GitHub, macro_rules_attribute) but that’s not the same as parsing arbitrary syntax outside of brackets.

zackw · October 24, 2024, 6:22pm

I think you're only technically correct with respect to derive and attribute macros. For example, I had been under the impression that

repeat! {
    // loop body
} until /* controlling expression */ ;

was currently possible with proc macros, and what you're saying means it's not ... unless there is an enabling attribute macro applied to the fn item. The syntax of items is so general that, in practice, if you have

#[proc_macro]
fn name (...) -> ... { ... }

the syntax glossed over by each of the ... is whatever the proc macro wants it to be. This is such a weak constraint that it might as well not exist.

binarycat · October 24, 2024, 6:41pm

Attribute macros can do anything to the parsed item. For example, #[tokio::main] requires an async fn item, but then removes the async qualifier so that main() can be called by the standard library's startup code.

kpreid · October 24, 2024, 7:27pm

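An attribute macro cannot turn your example, repeat! {} until expr;, into valid syntax. The item an attribute macro is applied to must parse as Rust before the macro ever sees it. While repeat! {} parses as a macro-call statement, the until expr that follows it does not.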

For example, if we use the #[cfg] attribute to entirely disable some code, it still won't parse:

#[cfg(any())]
fn foo() {
    repeat! { 1 } until true;
}

error: expected one of `!`, `.`, `::`, `;`, `?`, `{`, `}`, or an operator, found keyword `true`
 --> src/lib.rs:3:25
  |
3 |     repeat! { 1 } until true;
  |                         ^^^^ expected one of 8 possible tokens

Replacing cfg with an attribute macro won't make this code compile, no matter what the attribute macro does. Certainly there are lots of significant rewrites one could do to a function body that would appear to give macros in that body new capabilities, but I don't think “might as well not exist” is a reasonable description.

zackw · October 29, 2024, 3:36pm

Hm, OK, I stand corrected. I don't think that really changes my point, though. Proc macros can take in almost any input syntax, and can emit completely arbitrary code as long as it's well-formed. In my view this means there isn't any good reason to keep macros-by-example as limited as they are, because if people can't do what they want with a MBE and they're determined enough they will just use a proc macro instead. All we are doing by limiting MBEs is making those people do extra work.

(If it were completely up to me, proc macros would be allowed to take input syntax that the main parser doesn't know how to parse, with the only restrictions being that they can't mess with input tokenization or delimiter pairing or consume tokens past the end of the scope where they are invoked. But this is a separate issue from the limitations of MBEs.)

kpreid · October 29, 2024, 4:43pm

The limits of MBEs, compared to proc macros, are:

Cannot be used to define derive macros or attribute macros. This might be added in the future, as I linked in my previous post, and can be worked around for now at the price of some inconvenience in usage.
Not expressive enough to perform complex transformations. This is not a restriction on the input syntax; it cannot be solved by removing a constraint, but only by adding new mechanisms that don’t currently exist.
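Here is a sketch of the derive workaround mentioned in the first item, assuming a hypothetical trait Describe. Instead of writing #[derive(Describe)], the caller wraps the whole item in a function-like macro. The macro re-emits the item unchanged and appends the generated impl. The inconvenience shows up at the call site.

```rust
// Hypothetical trait we would like to "derive".
trait Describe {
    fn describe() -> &'static str;
}

// Stand-in for `#[derive(Describe)]`: re-emit the struct as written, then
// append an impl that returns the struct's name.
macro_rules! with_describe {
    ($(#[$attr:meta])* $vis:vis struct $name:ident $($body:tt)*) => {
        $(#[$attr])* $vis struct $name $($body)*

        impl Describe for $name {
            fn describe() -> &'static str {
                stringify!($name)
            }
        }
    };
}

with_describe! {
    #[derive(Debug, PartialEq)]
    pub struct Point { x: i32, y: i32 }
}
```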

Kyllingene · October 29, 2024, 5:43pm

I feel like it's worth emphasizing that these restrictions aren't specific to declarative macros. Nor do I believe they're solvable without fundamental, catastrophic changes to rustc.

Very few languages allow extending the syntax in arbitrary ways, and for good reason. Not only is it incredibly difficult to do at all, it also makes parsing with anything but a full compiler nigh impossible (you can kiss your syntax highlighting goodbye). It also leads to obscure, esoteric syntaxes that are difficult for both the compiler and user to understand.

As to macros being "not worth it" without these features: the fact that almost every crate uses or provides some form of macro speaks volumes about their utility. I myself have written many declarative macros that were comparable to proc macros in power.

SkiFire13 · October 29, 2024, 6:33pm

There are several problems with this approach. How would it even be defined? How far can the unrecognized syntax extend and still be handed to the proc macro? How are IDEs supposed to tell syntax errors apart from tokens that are meant to be given to a proc macro? And most importantly, how would Rust add any new syntax in a backward-compatible way without breaking these macros?

MBEs also:

can't create new identifiers (for example, by concatenating existing ones);
are more restricted in the hygiene they can use.

However, note that MBEs also have a superpower that proc macros lack: $crate.
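Here is a small sketch of what $crate buys, using hypothetical names helpers::double and double_it!. Within one crate the effect is modest. If the macro is exported and invoked from a downstream crate, $crate still names the defining crate, even when the caller never imported it or has renamed the dependency. A proc macro has no built-in equivalent and usually emits a hard-coded path such as ::crate_name::... instead.

```rust
pub mod helpers {
    // An implementation detail the macro depends on.
    pub fn double(x: i32) -> i32 {
        x * 2
    }
}

// `$crate` names the crate that defines this macro, so the expansion finds
// `helpers::double` regardless of which module invokes it.
macro_rules! double_it {
    ($e:expr) => {
        $crate::helpers::double($e)
    };
}

mod elsewhere {
    // Nothing from `helpers` is imported here, yet the expansion resolves.
    pub fn forty_two() -> i32 {
        double_it!(21)
    }
}
```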

Tom47 · October 30, 2024, 4:41am

Well... one possible idea would be to give the macro everything that follows and have the macro report which part it actually consumed.

And now that I think of it... it can be done with a declarative macro, like this:

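macro_rules! long_args {
    // State: @run {input left in this group} {output so far}, followed by a
    // stack of saved frames KIND {outer input} {outer output}, ending in END.
    // Input and stack both exhausted: emit the accumulated output.
    (@run {} {$($out:tt)*} END) => ( $($out)* );
    // Group finished: re-wrap its output in (), {} or [] and resume the outer frame.
    (@run {} {$($out:tt)*} PAR {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* ($($out)*)} $($rest)*} );
    (@run {} {$($out:tt)*} BRA {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* {$($out)*}} $($rest)*} );
    (@run {} {$($out:tt)*} SQR {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* [$($out)*]} $($rest)*} );
    // Entering a group: save the outer frame and process the group's contents.
    (@run {($($head:tt)*) $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} PAR {$($tail)*} {$($out)*} $($rest)*});
    (@run {{$($head:tt)*} $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} BRA {$($tail)*} {$($out)*} $($rest)*});
    (@run {[$($head:tt)*] $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} SQR {$($tail)*} {$($out)*} $($rest)*});
    // name!! hands everything left in the current group to name!, which must
    // process its own part and pass the remainder back to long_args!.
    (@run {$id:ident ! ! $($args:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {} {$($out)* $id!{$($args)*}} $($rest)*});
    // Any other token is copied to the output unchanged.
    (@run {$head:tt $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($tail)*} {$($out)* $head} $($rest)*});
    // Entry point: start with the whole input and an empty stack.
    ($($in:tt)*) => (long_args!{@run {$($in)*} {} END});
}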

Now you can do:

macro_rules! repeat {
    ($stmt:tt until $cond:expr; $($rest:tt)*) => (
        loop {
            $stmt
            if $cond { break; }
        }
        long_args!{$($rest)*}
    );
}

long_args!{
    fn macro_test() {
        let mut i = 0;
        repeat!! {   // Note the double exclamation mark
            println!("iter: {i}");
            i += 1;
        } until i >= 3;
        println!("after: {i}");
    }
}

So if one really wants macros like OP's repeat!, this might be the way.
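The same trick appears to cover the first example in the thread as well. The sketch below reuses long_args! exactly as posted and adds a hypothetical error_impl! written to its protocol. Invoked as error_impl!!, it receives everything left in the enclosing group, consumes its own part, and hands the remainder back to long_args!. The error_impl! body is a guess at what such a macro might generate.

```rust
// Tom47's long_args!, unchanged.
macro_rules! long_args {
    (@run {} {$($out:tt)*} END) => ( $($out)* );
    (@run {} {$($out:tt)*} PAR {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* ($($out)*)} $($rest)*} );
    (@run {} {$($out:tt)*} BRA {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* {$($out)*}} $($rest)*} );
    (@run {} {$($out:tt)*} SQR {$($nin:tt)*} {$($nout:tt)*} $($rest:tt)*) => ( long_args!{@run {$($nin)*} {$($nout)* [$($out)*]} $($rest)*} );
    (@run {($($head:tt)*) $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} PAR {$($tail)*} {$($out)*} $($rest)*});
    (@run {{$($head:tt)*} $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} BRA {$($tail)*} {$($out)*} $($rest)*});
    (@run {[$($head:tt)*] $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($head)*} {} SQR {$($tail)*} {$($out)*} $($rest)*});
    (@run {$id:ident ! ! $($args:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {} {$($out)* $id!{$($args)*}} $($rest)*});
    (@run {$head:tt $($tail:tt)*} {$($out:tt)*} $($rest:tt)*) => (long_args!{@run {$($tail)*} {$($out)* $head} $($rest)*});
    ($($in:tt)*) => (long_args!{@run {$($in)*} {} END});
}

// Hypothetical error_impl! in the name!! protocol: consume `Name { Display ... }`,
// emit the impls, and pass whatever follows back to long_args!.
macro_rules! error_impl {
    ($name:ident { Display ($self_:ident, $f:ident) $body:block } $($rest:tt)*) => {
        impl ::std::fmt::Display for $name {
            fn fmt(&$self_, $f: &mut ::std::fmt::Formatter<'_>) -> ::std::fmt::Result $body
        }
        impl ::std::error::Error for $name {}
        long_args! { $($rest)* }
    };
}

long_args! {
    #[derive(Debug)]
    struct MyError;

    // Note the double exclamation mark, and no colon workaround.
    error_impl!! MyError {
        Display (self, f) { f.write_str("my thing errored") }
    }

    // Items after the macro are handed back and emitted normally.
    fn after_the_macro() -> u32 { 7 }
}
```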
