Macros (and syntax extensions and compiler plugins) - where are we at?

Procedural macros are one of the main reasons Rust programmers use nightly rather than stable Rust, and one of the few areas still causing breaking changes. Recently, part of the story around procedural macros has been coming together, and here I'll explain what you can do today and where we're going in the future.

TL;DR: as a procedural macro author, you're now able to write custom derive implementations which are on a fast track to stabilisation, and to experiment with the beginnings of our long-term plan for general purpose procedural macros. As a user of procedural macros, you'll soon be saying goodbye to breakage in procedural macro libraries caused by changes to compiler internals.

Macros today

Macros are an important part of Rust. They facilitate convenient and safe functionality used by all Rust programmers, such as println! and assert!; they reduce boilerplate, and make implementing traits trivial via derive. They also allow libraries to provide interesting and unusual abstractions.

However, macros are a rough corner - declarative macros (macro_rules macros) have their own system for modularisation, a fiddly syntax for declarations, and some odd rules around hygiene. Procedural macros (aka syntax extensions, compiler plugins) are unstable and painful to use. Despite that, they are used to implement some core parts of the ecosystem, including serialisation, and this causes a great deal of friction for Rust users who have to use nightly Rust or clunky build systems, and either way get hit with regular upstream breakage.
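For reference, here is a small declarative macro in today's stable macro_rules! syntax, showing the `$name:fragment` matchers and `$(...),*` repetition that make declarations fiddly. The `hashmap!` macro is an illustrative example, not something the standard library provides.

```rust
// A declarative macro that builds a HashMap from `key => value` pairs.
// `$key:expr` matches one expression; `$(...),*` repeats the pattern with
// commas between items, and `$(,)?` accepts an optional trailing comma.
macro_rules! hashmap {
    ($($key:expr => $value:expr),* $(,)?) => {{
        // Full path so the expansion works without a `use` at the call site.
        let mut map = ::std::collections::HashMap::new();
        $( map.insert($key, $value); )*
        map
    }};
}

fn main() {
    let m = hashmap! { "one" => 1, "two" => 2, };
    println!("one = {}, entries = {}", m["one"], m.len());
}
```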

Future of procedural macros

We strongly want to improve this situation. Our current priority is procedural macros, and in particular the parts of the procedural macro system which force Rust users onto nightly or cause recurring upstream errors.

Our goal is an expressive, powerful, and well-designed system that is as stable as the rest of the language. Design work is ongoing in the RFC process. We have accepted RFCs on naming and custom derive/macros 1.1, there are open RFCs on the overall design of procedural macros, attributes, etc., and probably several more to come, in particular about the libraries available to macro authors.

The future, today!

One of the core innovations to the procedural macro system is to base our macros on tokens rather than AST nodes. The AST is a compiler-internal data structure; it must change whenever we add new syntax to the compiler, and often changes even when we don't due to refactoring, etc. That means that macros based on the AST break whenever the compiler changes, i.e., with every new version. In contrast, tokens are mostly stable, and even when they must change, that change can easily be abstracted over.
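To make the tokens-versus-AST distinction concrete, here is a toy lexer in plain Rust (no compiler crates) that turns source text into token trees: identifiers, punctuation, integer literals, and delimited groups. The `TokenTree` enum and `lex` function are hypothetical stand-ins, only loosely modelled on the compiler's real token types. The point is that the lexer knows nothing about Rust's grammar, so syntax such as `-> impl Trait` comes out as ordinary tokens without any change to the lexer.

```rust
// Toy token trees. Hypothetical types for illustration, only loosely
// modelled on the compiler's real token representation.
#[derive(Debug, PartialEq)]
enum TokenTree {
    Ident(String),
    Punct(char),
    Literal(String),
    Group(char, Vec<TokenTree>), // opening delimiter, contents
}

fn closing(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

// The lexer has no idea what a struct, fn, or `impl Trait` is: new
// syntax made of existing tokens needs no change here.
fn lex(src: &str) -> Vec<TokenTree> {
    let chars: Vec<char> = src.chars().collect();
    let mut pos = 0;
    lex_until(&chars, &mut pos, None)
}

// Lex until `close` or end of input. This toy does not report
// unbalanced delimiters or handle string literals.
fn lex_until(chars: &[char], pos: &mut usize, close: Option<char>) -> Vec<TokenTree> {
    let mut out = Vec::new();
    while *pos < chars.len() {
        let c = chars[*pos];
        if Some(c) == close {
            *pos += 1; // consume the closing delimiter
            return out;
        }
        if c.is_whitespace() {
            *pos += 1;
        } else if c.is_alphabetic() || c == '_' {
            let start = *pos;
            while *pos < chars.len() && (chars[*pos].is_alphanumeric() || chars[*pos] == '_') {
                *pos += 1;
            }
            out.push(TokenTree::Ident(chars[start..*pos].iter().collect()));
        } else if c.is_ascii_digit() {
            let start = *pos;
            while *pos < chars.len() && chars[*pos].is_ascii_digit() {
                *pos += 1;
            }
            out.push(TokenTree::Literal(chars[start..*pos].iter().collect()));
        } else if c == '(' || c == '[' || c == '{' {
            *pos += 1;
            out.push(TokenTree::Group(c, lex_until(chars, pos, Some(closing(c)))));
        } else {
            // Multi-character operators such as `->` become two Punct tokens.
            out.push(TokenTree::Punct(c));
            *pos += 1;
        }
    }
    out
}

fn main() {
    // `-> impl Trait` lexes into plain tokens, whatever the AST looks like.
    println!("{:?}", lex("fn f() -> impl Trait {}"));
}
```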

We have begun the implementation of token-based macros and today you can experiment with them in two ways: by writing custom derive implementations using macros 1.1, and within the existing syntax extension framework. At the moment these two features are quite different, but as part of the stabilisation process they should become more similar to use, and share more of their implementations.

Even better for many users, popular macro-based libraries such as Serde are moving to the new macro system, and crates using these libraries should see fewer errors due to changes to the compiler. Soon, users should be able to use these libraries from stable Rust.

Custom derive (macros 1.1)

The derive attribute lets programmers implement common traits with minimal boilerplate, typically generating an impl based on the annotated data type. This can be used with Eq, Copy, Debug, and many other traits. These implementations of derive are built into the compiler.
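As an illustration of what "generating an impl based on the annotated data type" means, this sketch writes out by hand roughly what `#[derive(PartialEq)]` produces for a simple struct. The compiler's actual generated code differs in detail, and `Point` is just an example type.

```rust
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Roughly equivalent to `#[derive(PartialEq)]` on `Point`:
// one impl for the annotated type, comparing every field in order.
impl PartialEq for Point {
    fn eq(&self, other: &Point) -> bool {
        self.x == other.x && self.y == other.y
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 1, y: 2 };
    println!("{:?} == {:?}: {}", a, b, a == b);
}
```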

It would be useful for library authors to provide their own custom derive implementations. This was previously facilitated by the custom_derive feature; however, that feature is unstable and its implementation is hacky. We now offer a new solution based on procedural macros (often called 'macros 1.1'; see the RFC and tracking issue), which we hope will be on a fast path to stabilisation.

The macros 1.1 solution offers the core token-based framework for declaring and using procedural macros (including a new crate type), but only a bare-bones set of features. In particular, even the access to tokens is limited: the only stable API is one providing conversion to and from strings. Keeping the API surface small allows us to make a minimal commitment as we continue iterating on the design. Modularisation and hygiene are not covered. Nevertheless, we believe that this API surface is sufficient for custom derive (as evidenced by the fact that Serde was easily ported over).

To write a macros 1.1 custom derive, you need only a function which takes and returns a proc_macro::TokenStream; you then annotate this function with an attribute containing the name of the derive. E.g., #[proc_macro_derive(Foo)] will enable #[derive(Foo)]. To convert between TokenStreams and strings, you use the to_string and parse functions.

There is a new kind of crate (alongside dylib, rlib, etc.) - a proc-macro crate. All macros 1.1 implementations must be in such a crate.

To use, you import the crate in the usual way using extern crate, and annotate that statement with #[macro_use]. You can then use the derive name in derive attributes.

Example

(These examples will need a pretty recent nightly compiler).

Macro crate (b.rs):

#![feature(proc_macro, proc_macro_lib)]
#![crate_type = "proc-macro"]

extern crate proc_macro;

use proc_macro::TokenStream;

#[proc_macro_derive(B)]
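pub fn derive(input: TokenStream) -> TokenStream {
    // Convert to a String: to_string and parse are the only stable API.
    let input = input.to_string();
    // Re-emit the annotated item, then append an impl of B.
    // For brevity the type name `A` is hardcoded rather than parsed.
    format!("{}\n impl B for A {{ fn b(&self) {{}} }}", input).parse().unwrap()
}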

Client crate (client.rs):

#![feature(proc_macro)]

#[macro_use]
extern crate b;

trait B {
    fn b(&self);
}

#[derive(B)]
struct A;

fn main() {
    let a = A;
    a.b();
}

To build:

rustc b.rs && rustc client.rs -L .

When building with Cargo, the macro crate must set proc-macro = true in the [lib] section of its Cargo.toml.
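The to_string/parse round trip can be modelled without a proc-macro crate. The sketch below uses plain strings in place of proc_macro::TokenStream (the real type is only usable inside a proc-macro crate) and, unlike b.rs, which hardcodes `A`, pulls the type name out of the input text. `derive_b` is a hypothetical name, and the naive word scan ignores generics and attributes; a real derive would use a parser such as Syn.

```rust
// String-level model of a macros 1.1 derive for trait B.
// Plain `&str`/`String` stand in for proc_macro::TokenStream.
fn derive_b(input: &str) -> String {
    // Find the identifier that follows `struct` or `enum`.
    let mut words = input
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|w| !w.is_empty());
    let mut name = None;
    while let Some(w) = words.next() {
        if w == "struct" || w == "enum" {
            name = words.next();
            break;
        }
    }
    let name = name.expect("expected a struct or enum");
    // As in b.rs, re-emit the original item, then append the generated impl.
    format!("{}\nimpl B for {} {{ fn b(&self) {{}} }}", input, name)
}

fn main() {
    println!("{}", derive_b("struct A;"));
}
```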

Note that token-based procedural macros are a lower-level feature than the old syntax extensions. The expectation is that authors will not manipulate the tokens directly (as we do in the examples, to keep things short), but use third-party libraries such as Syn or Aster. It is early days for library support as well as language support, so there might be some wrinkles to iron out.

To see more complete examples, check out derive(new) or serde-derive.

Stability

As mentioned above, we intend for macros 1.1 custom derive to become stable as quickly as possible. We have just entered FCP on the tracking issue, so this feature could be in the stable compiler in as little as 12 weeks. Of course we want to make sure we get enough experience of the feature in libraries, and to fix some bugs and rough edges, before stabilisation. You can track progress in the tracking issue. The old custom derive feature is in FCP for deprecation and will be removed in the near-ish future.

Token-based syntax extensions

If you are already a procedural macro author using the syntax extension mechanism, you might be interested to try out token-based syntax extensions. These are new-style procedural macros with a tokens -> tokens signature, but which use the existing syntax extension infrastructure for declaring and using the macro. This will allow you to experiment with implementing procedural macros without changing the way your macros are used. It is very early days for this kind of macro (the RFC hasn't even been accepted yet) and there will be a lot of evolution from the current feature to the final one. Experimenting now will give you a chance to get a taste for the changes and to influence the long-term design.

To write such a macro, you must use crates which are part of the compiler and thus will always be unstable. Eventually you won't have to do this, and we'll be on the path to stabilisation.

Procedural macros are functions that return a TokenStream, just like macros 1.1 custom derive (note that it's actually a different TokenStream implementation, but that will change). Function-like macros take a single TokenStream as input, and attribute-like macros take two (one for the arguments to the attribute and one for the annotated item, in that order, as in foo_impl below). Macro functions must be registered with a plugin_registrar.

To use a macro, you use #![plugin(foo)] to import a macro crate called foo. You can then use the macros using #[bar] or bar!(...) syntax.

Example

Macro crate (foo.rs):

#![feature(plugin, plugin_registrar, rustc_private)]
#![crate_type = "dylib"]

extern crate proc_macro_plugin;
extern crate rustc_plugin;
extern crate syntax;

use proc_macro_plugin::prelude::*;
use syntax::ext::proc_macro_shim::prelude::*;

use rustc_plugin::Registry;
use syntax::ext::base::SyntaxExtension;


#[plugin_registrar]
pub fn plugin_registrar(reg: &mut Registry) {
    reg.register_syntax_extension(token::intern("foo"),
                                  SyntaxExtension::AttrProcMacro(Box::new(foo_impl)));
    reg.register_syntax_extension(token::intern("bar"),
                                  SyntaxExtension::ProcMacro(Box::new(bar)));
}

fn foo_impl(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let _source = item.to_string();
    lex("fn f() { println!(\"Good bye!\"); }")
}

fn bar(_args: TokenStream) -> TokenStream {
    lex("println!(\"Hello!\");")
}

Client crate (client.rs):

#![feature(plugin, custom_attribute)]

#![plugin(foo)]

#[foo]
fn f() {
    println!("Hello world!");
}

fn main() {
    f();

    bar!();
}

To build:

rustc foo.rs && rustc client.rs -L .
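The two macro signatures, and the registrar's job of mapping names to functions, can be modelled in plain Rust with strings standing in for TokenStream. `Registry`, `MacroKind`, and the `expand_*` methods are hypothetical; the real rustc_plugin::Registry and SyntaxExtension types are compiler-internal and differ.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: String plays the role of TokenStream.
type TokenStream = String;

enum MacroKind {
    // `bar!(...)`: one stream, the macro's arguments.
    FnLike(fn(TokenStream) -> TokenStream),
    // `#[foo(...)] item`: the attribute's arguments, then the annotated item.
    Attr(fn(TokenStream, TokenStream) -> TokenStream),
}

// Models a registrar: macro names mapped to their implementations.
struct Registry {
    macros: HashMap<String, MacroKind>,
}

impl Registry {
    fn new() -> Self {
        Registry { macros: HashMap::new() }
    }

    fn register(&mut self, name: &str, kind: MacroKind) {
        self.macros.insert(name.to_string(), kind);
    }

    // Expand `name!(args)`; None if unknown or not function-like.
    fn expand_fn_like(&self, name: &str, args: &str) -> Option<TokenStream> {
        match self.macros.get(name)? {
            MacroKind::FnLike(f) => Some(f(args.to_string())),
            MacroKind::Attr(_) => None,
        }
    }

    // Expand `#[name(attr)]` on `item`; None if unknown or not an attribute.
    fn expand_attr(&self, name: &str, attr: &str, item: &str) -> Option<TokenStream> {
        match self.macros.get(name)? {
            MacroKind::Attr(f) => Some(f(attr.to_string(), item.to_string())),
            MacroKind::FnLike(_) => None,
        }
    }
}

// Mirror foo_impl and bar from foo.rs: the attribute macro discards the
// item and emits a replacement; the bang macro ignores its input.
fn foo_impl(_attr: TokenStream, _item: TokenStream) -> TokenStream {
    "fn f() { println!(\"Good bye!\"); }".to_string()
}

fn bar(_args: TokenStream) -> TokenStream {
    "println!(\"Hello!\");".to_string()
}

fn main() {
    let mut reg = Registry::new();
    reg.register("foo", MacroKind::Attr(foo_impl));
    reg.register("bar", MacroKind::FnLike(bar));
    println!("{:?}", reg.expand_attr("foo", "", "fn f() {}"));
    println!("{:?}", reg.expand_fn_like("bar", ""));
}
```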

Stability

There is a lot of work still to do; stabilisation is going to be a long haul. Declaring and importing macros should end up very similar to custom derive with macros 1.1, with no plugin registrar. We expect to support full modularisation too. We need to provide, and then iterate on, the library functionality that is available to macro authors from the compiler. We need to implement a comprehensive hygiene scheme. We then need to gain experience and confidence with the system, and probably write some more RFCs.

However! The basic concept of tokens -> tokens macros will remain. So even though the infrastructure for building and declaring macros will change, the macro definitions themselves should be relatively future proof. Mostly, macros will just get easier to write (so less reliance on external libraries, or those libraries can get more efficient) and potentially more powerful.

We intend to deprecate and remove the MultiModifier and MultiDecorator forms of syntax extension. It is likely there will be a long-ish deprecation period to give macro authors opportunity to move to the new system.

Declarative macros

This post has been focused on procedural macros, but we also have plans for declarative macros. However, since these are stable and mostly work, these plans are lower priority and longer-term. The current idea is that there will be a new kind of declarative macro (possibly declared using macro! rather than macro_rules!); macro_rules macros will continue working with no breaking changes. The new declarative macros will be different, but we hope to keep them mostly backwards compatible with existing macros. Expect improvements to naming and modularisation, hygiene, and declaration syntax.

Hat-tips

Thanks to Alex Crichton for driving, designing, and implementing (which, in his usual fashion, was done with eye-watering speed) the macros 1.1 system; Jeffrey Seyfried for making some huge improvements to the compiler and macro system to facilitate the new macro designs; Cameron Swords for implementing a bunch of the TokenStream and procedural macros work; Erick Tryzelaar, David Tolnay, and Sean Griffin for updating Serde and Diesel to use custom derive, and providing valuable feedback on the designs; and to everyone who has contributed feedback and experience as the designs have progressed.


Diesel now uses macros 1.1 custom derive too: Release v0.8.0, diesel-rs/diesel@04bf2fc on GitHub (literally minutes too late for the post).

If Nick's two-line example using format! is a little too simplistic to get you interested, or makes it seem like a huge pain to develop macros that way (you would be right), check out Syn and Quote which are two libraries I developed during Serde's transition to the new macro system.

Syn is a parsing library for taking the TokenStream input and turning it into a usable syntax tree, and quote is a quasiquoting library for constructing the output TokenStream in a friendly way.

Both libraries are designed for fast compile time. As part of moving Serde to these, the total compile time including dependencies dropped by 40% (compared to the old compiler plugin, which used to be considered "fast" compared to the Syntex-based one).

See the syn readme for a concise but complete example that involves generating code based on the fields of a struct, including support for structs with arbitrarily complicated generic type parameters.


Could someone explain how being based on tokens will improve the forward-compatibility of those procedural macros?

As @dtolnay is demonstrating above, Serde is using its own AST parser... which to me means that new syntax elements (such as -> impl Trait) may break the parser and prevent using Serde, and require an update of Syn + an upgrade of Serde to work again.

Did I get it wrong?

Using the compiler's AST (this is the world we lived in before Macros 1.1):

  • @dtolnay implements serde_derive that works with nightly-2016-10-10.
  • @matthieum uses it and loves it.
  • Compiler makes a breaking change to its AST (happens all the time between nightlies).
  • The previously working serde_derive no longer compiles on nightly-2016-10-11. This is the second-worst kind of breakage (after stable-to-stable breakage).
  • @matthieum is upset and files a ticket against Serde.
  • @dtolnay updates serde_derive to the new AST and releases a minor version bump.
  • Some other @matthieum2 is upset because the new serde_derive doesn't compile on nightly-2016-10-10, and files a ticket against Serde.
  • We tell @matthieum2 to rustup update.

Using syntex or syn for the AST:

  • @dtolnay implements serde_derive against syn 0.9.0.
  • Compiler makes a breaking change to its AST, for example -> impl Trait.
  • Nobody is upset. All previously working code continues to work.

Meanwhile:

  • We update syn to be able to parse the new syntax and release it as 0.9.1 or 0.10.0 depending on what the change is. Note that (speaking as the person who has been maintaining Syntex for the past couple months) I expect this to be way easier than dealing with libsyntax merge conflicts, so support for new syntax will be more timely than it has been in the past with Syntex.
  • The next minor version bump of serde_derive picks up syn 0.10.0. Typically this will happen well before people actually start using the new syntax in their crates, possibly even before the new syntax is implemented in rustc.

The difference is that with the compiler's AST @matthieum is upset and with syn's AST @matthieum is never upset; serde_derive supports the new syntax in a minor version bump so it automatically works when @matthieum starts trying to use it.
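The distinction between syn 0.9.1 and 0.10.0 comes from Cargo's default caret requirements, where for 0.x versions the minor number acts as the breaking-change number: a crate depending on syn "0.9.0" picks up 0.9.1 automatically but needs a new release to move to 0.10.0. The sketch below encodes that rule for plain major.minor.patch triples (pre-release tags and other requirement operators are deliberately ignored); `caret_matches` is an illustrative name, not a Cargo API.

```rust
type Version = (u64, u64, u64); // (major, minor, patch)

// Does version `v` satisfy the caret requirement `^req`?
fn caret_matches(req: Version, v: Version) -> bool {
    let (rmaj, rmin, rpat) = req;
    let (vmaj, vmin, vpat) = v;
    if v < req {
        return false; // never older than the requirement (tuples compare lexicographically)
    }
    if rmaj > 0 {
        vmaj == rmaj // ^1.2.3 means >=1.2.3, <2.0.0
    } else if rmin > 0 {
        vmaj == 0 && vmin == rmin // ^0.9.0 means >=0.9.0, <0.10.0
    } else {
        (vmaj, vmin, vpat) == (0, 0, rpat) // ^0.0.3 means =0.0.3
    }
}

fn main() {
    let req: Version = (0, 9, 0);
    for &v in &[(0, 9, 1), (0, 10, 0)] {
        println!("{:?} satisfies ^0.9.0: {}", v, caret_matches(req, v));
    }
}
```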


If you're trying to make a simple #[derive(Foo)] which just needs to implement some methods which perform some operation on every field, or you want to be able to easily treat Structs and Enums uniformly, you might find my synstructure (https://github.com/mystor/synstructure) crate useful as well. It just provides some helper methods for generating match statements which allow you to write code which just acts on fields, instead of on structs/enums.

It's built (surprise) on top of the excellent syn, and I made it because I noticed that my abomonation_derive and gc_derive plugins both just wanted to perform an operation on every field, and it felt cumbersome to write code which did that, despite it being conceptually simple.
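The idea of generating one match that applies the same code to every field can be sketched without syn, using strings in place of syntax trees. `Variant` and `each_field` are hypothetical names for this illustration only; synstructure itself works on syn's syntax tree rather than strings, and this sketch handles only named fields.

```rust
// One struct or enum variant: its path and its named fields.
struct Variant {
    path: String,        // e.g. "Shape::Circle", or just "Point" for a struct
    fields: Vec<String>, // named fields only, for brevity
}

// Build `match self { ... }` with one arm per variant; `per_field`
// receives each binding name and returns the code to run for it.
fn each_field(variants: &[Variant], per_field: impl Fn(&str) -> String) -> String {
    let mut out = String::from("match self {\n");
    for v in variants {
        let bindings = v.fields.join(", ");
        let body: Vec<String> = v.fields.iter().map(|f| per_field(f.as_str())).collect();
        out.push_str(&format!(
            "    {} {{ {} }} => {{ {} }}\n",
            v.path,
            bindings,
            body.join(" ")
        ));
    }
    out.push('}');
    out
}

fn main() {
    let variants = vec![Variant {
        path: "Pair".to_string(),
        fields: vec!["a".to_string(), "b".to_string()],
    }];
    // Per-field code: add each field's length to a running total.
    let code = each_field(&variants, |f| format!("total += {}.len();", f));
    println!("{}", code);
}
```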


Did you know that quote! supports repetition? If you have a collection that implements IntoIterator (such as a Vec), you can do something like the following:

  let a = vec![1, 2, 3];
  let b = vec!["1", "2", "3"];

  quote! {
      match 3 {
        #(#a => println!("{:?}", #b)),*
      }
  }
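To see what the repetition produces, the sketch below builds the same arms with plain string formatting. It is only an approximation: real quote! output is a token stream, and details such as spacing and literal formatting may differ. `expand_arms` is a hypothetical name. Note also that a match on an integer needs a wildcard arm to compile, so generated code like the above would need one appended.

```rust
// Emulates `#(#a => println!("{:?}", #b)),*` with strings: `a` and `b`
// are walked in lockstep (modelled here with `zip`), each step produces
// one arm, and `,` is placed between arms (no trailing comma).
fn expand_arms(a: &[i32], b: &[&str]) -> String {
    a.iter()
        .zip(b)
        .map(|(x, y)| format!("{} => println!(\"{{:?}}\", {:?})", x, y))
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let arms = expand_arms(&[1, 2, 3], &["1", "2", "3"]);
    // Append a wildcard arm so the generated match is exhaustive.
    println!("match 3 {{ {}, _ => {{}} }}", arms);
}
```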

How do you use quote with syntax::TokenStream?

This convoluted way:

  • quote!(...) gives you back a quote::Tokens
  • quote::Tokens implements Display, call to_string to get a String
  • parse the String into a Vec<tokenstream::TokenTree> using parse_tts_from_source_str (in libsyntax/parse/mod.rs)
  • turn the Vec<TokenTree> into tokenstream::Delimited by using token::DelimToken::NoDelim as the delimiter
  • turn the Delimited into a syntax::TokenTree::Delimited
  • use the impl From<TokenTree> for TokenStream

Thanks! For the people of the future, this is what I've ended up doing:

  let tokens: quote::Tokens = quote! { ... };
  {  // do the dance:
      let sess: ParseSess = ParseSess::new();
      let v: Vec<TokenTree>
          = parse_tts_from_source_str("some_name".to_string(),
                                      tokens.to_string(), &sess).unwrap();
      TokenStream::as_delimited_stream(v, token::DelimToken::NoDelim);
  }

I still get an error of the form "macro expansion ignores token and any following" (the quote::Tokens look correct, so maybe I am screwing it up in the conversion to TokenStream), but at least it type checks.

EDIT: the solution was to use TokenStream::from_tts(v) instead of TokenStream::as_delimited_stream(v, token::DelimToken::NoDelim).
