[Workaround] Syntax extensions + unhygienize


#1

Hi,

I’m currently working on a syntax extension to implement trait mocking, but I stumbled over a problem with disabling macro hygiene.

Basically all macro code in Rust is hygienic, i.e., identifiers generated by a macro will not conflict with identifiers of the same name outside the macro (except for items). Usually that is something you want to have … except when you don’t.

macro_rules! lets you interact with outside identifiers by passing them into the macro from the calling context. From reading a bit online about the MTWT algorithm and the libsyntax docs, I gathered that for syntax extensions I’d have to change the SyntaxContext of the Idents, e.g., using Ident::unhygienize.
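To illustrate the hygiene rule and the macro_rules! way around it, here is a minimal sketch. The macro names (hygienic_double, set_to_ten) and the demo function are made up for this example and have nothing to do with my mocking library.

```rust
// The `x` introduced inside this macro lives in the macro's own syntax
// context, so it neither captures nor shadows the caller's `x`.
macro_rules! hygienic_double {
    ($e:expr) => {{
        let x = 2;
        $e * x
    }};
}

// Passing the identifier in from the call site makes the binding visible
// to the caller, because the name carries the caller's syntax context.
macro_rules! set_to_ten {
    ($name:ident) => {
        let $name = 10;
    };
}

fn demo() -> (i32, i32) {
    let x = 5;
    // `$e` is the caller's `x + 1`; the macro-internal `x` is a different variable.
    let a = hygienic_double!(x + 1);
    // Binds `y` in this scope, since `y` was written here and not in the macro.
    set_to_ten!(y);
    (a, y)
}
```

Calling demo() gives (12, 10): the caller's x stays 5 inside the expression, and y is usable after the macro call only because the name was passed in from outside.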

Doing so is very cumbersome, though, as you have to walk the generated AST and replace the identifiers on your own (Visitor is not able to modify the AST). I need to disable hygiene for the whole macro, which makes this approach very heavyweight (the AST structure is very complex to traverse and subject to change in nightly).

Another option I came up with is to parse the whole function inside a macro and handle the special syntax there, as everything would then share the same syntax context, but I think that’s hardly better. Still, I’d prefer to do that if no other options come up (syn + synom make that easier).

Does anyone have an idea for a practical way to solve this?


#2

So, I’ve been able to figure out a … let’s call it a workaround … which I can live with at the moment. Maybe it helps others with similar problems.

First to my two proposed approaches:

  • Changing the SyntaxContext: far too complex if you need to handle Idents at unknown positions. This would require a visitor pattern that is able to modify the AST.
  • Parsing the whole function (using syn): syn does not expose all the subparsers necessary to build a modified function definition parser. One would need to fork syn and expose them manually, which is a large effort, especially from a maintenance point of view. Furthermore, I’m not interested in the remaining structure of the function definition, only in the parts contained in the library’s other macros.

My current approach is still to parse the function definition, but to ignore its structure whenever possible. Most of this is done by simple string processing, relying on some invariants of the macro invocations. First, all of the macros we are interested in act like statements or blocks. Second, their expansions only yield statements. Third, if the function itself has syntax errors or uses the macros incorrectly, it does not matter whether our macros are handled correctly (this might be bad for error reporting).

  1. Convert the token stream of the function definition into a string.
  2. Let the remaining string which is left to parse be the function definition.
  3. Find the next macro invocation we are interested in. In this case I know that they can only occur where a statement might occur.
    a. Search for the next substring which looks like a macro invocation.
    b. Check that the macro invocation is not contained in a string or a comment, which is rather easy to do compared to the other options. Anything else would violate the invariants of the macro usage.
  4. Handle the macro invocation using syn and quote (or by any other means).
  5. Stitch together the generated code and the unparsed part to the right of the macro invocation and go back to 3.
  6. Collect all generated parts into a single function definition, which is then parsed as usual by the syntax extension.
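To make steps 2 to 6 a bit more concrete, here is a rough sketch of the scanning and stitching loop using only the standard library. It is not my actual code: rewrite_invocations and the expand closure are hypothetical names, the closure stands in for the syn + quote handling of step 4, and I assume invocations of the form name!(...) with balanced parentheses. Raw strings and char literals are not handled here.

```rust
/// Replaces every `name!(...)` invocation in `src` that sits outside string
/// literals and comments with `expand(args)`, leaving everything else as is.
fn rewrite_invocations(src: &str, name: &str, expand: impl Fn(&str) -> String) -> String {
    let b = src.as_bytes();
    let needle = format!("{}!(", name);
    let mut out = String::new();
    let mut copied = 0; // everything before this offset is already in `out`
    let mut i = 0;
    while i < b.len() {
        if b[i..].starts_with(b"//") {
            // Line comment: skip to the end of the line.
            while i < b.len() && b[i] != b'\n' {
                i += 1;
            }
        } else if b[i..].starts_with(b"/*") {
            // Block comment: Rust block comments may nest.
            let mut depth = 1;
            i += 2;
            while i < b.len() && depth > 0 {
                if b[i..].starts_with(b"/*") {
                    depth += 1;
                    i += 2;
                } else if b[i..].starts_with(b"*/") {
                    depth -= 1;
                    i += 2;
                } else {
                    i += 1;
                }
            }
        } else if b[i] == b'"' {
            // String literal: skip to the closing quote, honoring escapes.
            i += 1;
            while i < b.len() && b[i] != b'"' {
                i += if b[i] == b'\\' { 2 } else { 1 };
            }
            i += 1;
        } else if b[i..].starts_with(needle.as_bytes())
            && !(i > 0 && (b[i - 1].is_ascii_alphanumeric() || b[i - 1] == b'_'))
        {
            // Found an invocation: locate the matching closing parenthesis.
            let args_start = i + needle.len();
            let (mut depth, mut j) = (1, args_start);
            while j < b.len() && depth > 0 {
                match b[j] {
                    b'(' => depth += 1,
                    b')' => depth -= 1,
                    _ => {}
                }
                j += 1;
            }
            if depth != 0 {
                break; // unbalanced input: leave the rest untouched
            }
            // Stitch: untouched text, then the generated replacement.
            out.push_str(&src[copied..i]);
            out.push_str(&expand(&src[args_start..j - 1]));
            copied = j;
            i = j;
        } else {
            i += 1;
        }
    }
    out.push_str(&src[copied..]);
    out
}
```

For example, calling it with the name "mock" and a closure that wraps its argument in expanded(...) rewrites a statement-level mock!(c, (d)) but leaves a mock!(a) inside a string literal or a mock!(b) inside a comment alone. The resulting string is what would then go back through the normal parser in step 6.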