It seems to me that macro_rules! (aka macros by example) could have been designed to make matching normal Rust code easier than it currently is.
Take the grammar for enum, for example: if you want to accept and parse an enum definition passed as the argument(s) to a macro_rules! macro (my implementation of such a macro is still in progress; I will post the result here), you'll soon find some obvious, quite show-stopping limitations:
1. You can't express matching "THIS or THAT" without also allowing both and neither to match. You can write
   $( THIS )? $( THAT )?
   which matches: THIS, THAT, THIS THAT, and neither of them.
   Imagine matching a GenericParam, which is:
   OuterAttribute* ( LifetimeParam | TypeParam | ConstParam )
   in EBNF(?) grammar.
   You can't express that you want to match exactly one of those three (LifetimeParam, TypeParam or ConstParam), so you have to resort to wrapping each of them in $( )?. That makes matching 0, 2 or 3 of them at once possible (not just the mandatory one you actually wanted), and you have to hope that the transcribed code then fails to compile (i.e. you delegate the invalidly matched cases to the compiler to reject), which is an OK-ish workaround, let's say. However, this opens the door to hitting limitation 2. below, which prevents even this workaround from being used in the matcher.
2. Follow-set Ambiguity Restrictions make it harder to match normal Rust code given limitation 1., because when you have those three $( )? blocks, at the end of the second block, which is the
   TypeParam : IDENTIFIER( `:` TypeParamBounds? )? ( `=` Type )?
   block, where you match $( = $that_type:ty )?, the next block is
   ConstParam: `const` IDENTIFIER `:` Type ( `=` Block | IDENTIFIER | `-`?LITERAL )?
   which means you'll be matching this:
   $( const $some_id:ident bla bla )?
   but now you get a compile error like
   `$that_type:ty` may be followed by `const`, which is not allowed for `ty` fragments
   This happens because each of the three alternatives in ( LifetimeParam | TypeParam | ConstParam ) had to be wrapped in $( )?, so all three can appear at the same time (and the compiler has to assume that case)! I have such a broken example (work in progress) here.
3. Another, less severe issue: if you want to match
   EnumItems : EnumItem ( `,` EnumItem )* `,`?
   and so don't want a lone comma to match, you can't do this
   $( EnumItem ),* $(,)?
   because it also matches a lone , on its own. Instead you have to do this:
   $EnumItem_1of2:ident $( , $EnumItem_2of2:ident )* $(,)?
   which means you now have to write the EnumItem matcher block (which I reduced to just ident for this example) twice, and the metavars must be named differently in each of the two copies. Here's an example in playground of how that might look, even if you want to merge those into one via internal rules afterwards.
4. Line 102 in this requires me to recursively match the very thing I'm already inside the matcher for: GenericParams (in this case). I'm sure it's possible somehow (I just don't know how at the moment), but I didn't expect to hit this for some reason.
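For issue 3, one idiom that may avoid writing the EnumItem matcher twice is the `+` repetition: `$( ... ),+ $(,)?` needs at least one item, so a lone comma no longer matches, and the empty list can get its own rule. A minimal sketch, assuming EnumItem is again reduced to just ident; the names `enum_item_names` and `check_enum_item_names` are made up for illustration:

```rust
// Hypothetical macro: collects the item names of a comma-separated list.
macro_rules! enum_item_names {
    // No items at all (the EnumItems part of an enum is optional).
    () => {{
        let names: Vec<&'static str> = Vec::new();
        names
    }};
    // One or more items separated by commas, with an optional trailing comma.
    // Because `+` demands at least one item, a lone `,` matches neither rule.
    ( $( $item:ident ),+ $(,)? ) => {
        vec![ $( stringify!($item) ),+ ]
    };
}

// Hypothetical helper that exercises the accepted inputs.
fn check_enum_item_names() {
    assert_eq!(enum_item_names!(A), vec!["A"]);
    assert_eq!(enum_item_names!(A, B, C), vec!["A", "B", "C"]);
    assert_eq!(enum_item_names!(A, B, C,), vec!["A", "B", "C"]);
    assert!(enum_item_names!().is_empty());
    // `enum_item_names!(,)` does not compile: no rule accepts a lone comma.
}
```

This only helps when the item matcher fits inside a single repetition; it does nothing for limitations 1. and 2.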
(There are more limitations, but I don't want to spam about them at the moment.)
So if either limitation 1. or limitation 2. didn't exist, it would be easier to match normal Rust code.
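To make limitation 1. concrete, here is a minimal sketch that puts the loose matcher next to the only way I know of to say "exactly one of these", which is a separate rule per alternative. The macro and function names are made up for illustration:

```rust
// Both groups are optional, so this accepts THIS, THAT, THIS THAT and nothing.
macro_rules! this_and_or_that {
    ( $( THIS )? $( THAT )? ) => { true };
}

// Alternation spelled as one rule per alternative: exactly one is accepted.
macro_rules! this_xor_that {
    ( THIS ) => { "this" };
    ( THAT ) => { "that" };
}

// Hypothetical helper that exercises both macros.
fn check_this_or_that() {
    // The loose matcher cannot reject the "both" and "neither" cases.
    assert!(this_and_or_that!());
    assert!(this_and_or_that!(THIS));
    assert!(this_and_or_that!(THAT));
    assert!(this_and_or_that!(THIS THAT));

    // The per-rule version accepts only one alternative at a time;
    // `this_xor_that!()` and `this_xor_that!(THIS THAT)` do not compile.
    assert_eq!(this_xor_that!(THIS), "this");
    assert_eq!(this_xor_that!(THAT), "that");
}
```

The catch is that rules only give you alternation at the top level of a macro, so using this in the middle of a larger matcher (like inside GenericParams) means handing that piece off to another macro or an internal rule.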
Apparently there's some convoluted way to do it, maybe(?), described here. I'm unclear on it because I find it difficult to reason about and understand.
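I can't say whether this is what the linked approach does, but one workaround of this kind is to consume the generic parameters one at a time with a recursive macro (a "tt-muncher"), using one rule per alternative. A rough sketch under heavy assumptions: attributes, bounds and const defaults are left out, and the names `param_kinds_munch` and `generic_param_kinds` are made up for illustration:

```rust
// Hypothetical helper macro: `[ ... ]` accumulates the kinds seen so far,
// and the remaining tokens are the parameters still to be parsed.
macro_rules! param_kinds_munch {
    // Nothing left to parse: emit the kinds collected so far.
    ( [ $( $kind:tt, )* ] ) => {{
        let kinds: Vec<&'static str> = vec![ $( $kind ),* ];
        kinds
    }};
    // LifetimeParam (bounds omitted), e.g. 'a
    ( [ $( $kind:tt, )* ] $lt:lifetime $( , $( $rest:tt )* )? ) => {
        param_kinds_munch!( [ $( $kind, )* "lifetime", ] $( $( $rest )* )? )
    };
    // ConstParam (default omitted), e.g. const N: usize
    // This rule sits before the TypeParam rule because `ident` also matches keywords.
    ( [ $( $kind:tt, )* ] const $name:ident : $cty:ty $( , $( $rest:tt )* )? ) => {
        param_kinds_munch!( [ $( $kind, )* "const", ] $( $( $rest )* )? )
    };
    // TypeParam (bounds omitted), e.g. T or T = u32
    ( [ $( $kind:tt, )* ] $name:ident $( = $default:ty )? $( , $( $rest:tt )* )? ) => {
        param_kinds_munch!( [ $( $kind, )* "type", ] $( $( $rest )* )? )
    };
}

// Hypothetical entry point: starts the muncher with an empty accumulator.
macro_rules! generic_param_kinds {
    ( $( $params:tt )* ) => { param_kinds_munch!( [] $( $params )* ) };
}

// Hypothetical helper that exercises the entry point.
fn check_generic_param_kinds() {
    let kinds = generic_param_kinds!('a, T = u32, const N: usize);
    assert_eq!(kinds, vec!["lifetime", "type", "const"]);
    assert_eq!(generic_param_kinds!(T, U,), vec!["type", "type"]);
    assert!(generic_param_kinds!().is_empty());
}
```

Each parameter is matched by exactly one rule, which sidesteps limitation 1., and every `ty` fragment is followed only by a comma or the end of the input, which stays within the follow-set rules from limitation 2. The price is the recursion and the accumulator, which is probably why it feels convoluted.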
Now, is the next-gen macro system (aka declarative macros 2.0) able to handle this properly? I haven't gotten to it yet in my reading.
And why make it so difficult to match Rust code? (Am I doing it wrong?)
I'd say there should be tests in the rustc repo that try to match full Rust code, such as a whole enum definition (based on its EBNF grammar as referenced), via macro_rules!, just to assure everyone that the macro system is up to the job. After all, it already tries so hard to ensure it's parsing valid Rust code; why else would the restrictions in 2. be imposed?