How do you decide when to use procedural macros over declarative ones?

I'm entranced by creating procedural macros for custom derives. (#[derive(Enchant)])
Though declarative macros are useful too.

I'm curious. How do you decide when to use a procedural macro versus a declarative one?

My experience

In a use-case I am working through, I need to enforce that structs serialize with the correct name. I do this by parsing allowed names from a file. This is wonderful because I can enforce program semantics at compile time without writing a compiler. So this seems like a good use-case for a procedural macro.

1 Like

Generally I first reach for a declarative macro.
If the input turns into a complex DSL, if it uses token sequences that macro_rules! won't accept, or if it otherwise needs to exceed what a macro_rules! macro can do, then it's time for a procedural macro.

There is one exception: if a macro_rules! macro works but is overly complex, I might at some point decide to go for a proc macro anyway, as the hurdle of the initial added complexity pays off in clearer generation code.

3 Likes

By default I'll always reach for a declarative macro. Once you learn a couple patterns and wrap your head around the recursive way a declarative macro is implemented it's often much easier to write a 50 line macro_rules! with a test or two (I normally add them to the doc-comments if it's public) than it is to create a new proc macro crate.
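For instance, a small recursive macro_rules! of that kind might look like this (a hypothetical sketch, with the sort of inline check I'd normally put in a doc-test if it were public):

```rust
// A small recursive declarative macro (hypothetical example):
// a variadic `min!` that folds down to nested ::std::cmp::min calls.
macro_rules! min {
    ($x:expr $(,)?) => { $x };
    ($x:expr, $($rest:expr),+ $(,)?) => {
        ::std::cmp::min($x, min!($($rest),+))
    };
}

fn main() {
    // The kind of check I'd put in a doc-comment test for a public macro.
    assert_eq!(min!(3, 1, 2), 1);
    assert_eq!(min!(5), 5);
}
```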

Some questions I ask myself when deciding between proc macros and declarative ones:

  • Is this a basic find/replace (e.g. implementing the same trait for different integer types or tuple lengths) or do I need some sort of conditional logic?
  • Do I need to track more "state" than can be achieved with a simple pushdown accumulator?
  • Is this purely a syntactic operation or do I need to inspect a token's value and do logic with it?
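For example, the "basic find/replace" case from the first bullet might look like this (hypothetical trait and macro names):

```rust
// "Find/replace" use of macro_rules!: stamp out the same trait impl
// for a list of integer types (hypothetical Zero trait).
trait Zero {
    fn zero() -> Self;
}

macro_rules! impl_zero {
    ($($t:ty),* $(,)?) => {
        $(
            impl Zero for $t {
                fn zero() -> Self { 0 as $t }
            }
        )*
    };
}

impl_zero!(u8, u16, u32, u64, i8, i16, i32, i64);

fn main() {
    assert_eq!(u32::zero(), 0);
    assert_eq!(i8::zero(), 0);
}
```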

You normally use procedural macros (e.g. custom derives or attributes) for generating code, but it probably makes sense to validate the name in the same place that generates the code which depends on its correctness.

Alternatively you could write an integration test which uses syn to scan your source code for things with the #[derive(Enchant)] macro and checks that each item serializes with a valid name. The specifics will depend on your crate and how it is written, but you may find that sort of static analysis easier to do.

2 Likes

Thanks for the great advice! Start small.

These are great questions and help me form a heuristic.
I've heard of the little book of Rust macros but not read it. It looks quite informative. I'm curious about that "simple pushdown accumulator" topic.

As I learn procedural macros I realize my initial use-case is backwards. My primary goal is generating Rust data structures from an XML Schema Definition file for WordprocessingML (Microsoft Word). I only require a subset of the schema, but I need each struct to implement Write using the correctly namespaced tag, e.g. w:t for a struct named T.

So I am shifting my approach. Rather than use the proc macro to validate existing Rust structs, why not generate them directly from a parsed XSD with the correctly specified tag names? Practically I'm using strong-xml to avoid the work of implementing custom writers.
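For concreteness, the kind of per-struct code I want generated looks roughly like this hand-rolled sketch (simplified: no escaping, and strong-xml's derive would handle the real details):

```rust
use std::io::{self, Write};

// Hand-rolled sketch of what I want generated: a struct named T that
// writes itself with the namespaced tag <w:t> (simplified, no escaping).
struct T {
    text: String,
}

impl T {
    fn write_xml<W: Write>(&self, w: &mut W) -> io::Result<()> {
        write!(w, "<w:t>{}</w:t>", self.text)
    }
}

fn main() -> io::Result<()> {
    let t = T { text: "hello".into() };
    let mut buf = Vec::new();
    t.write_xml(&mut buf)?;
    assert_eq!(String::from_utf8(buf).unwrap(), "<w:t>hello</w:t>");
    Ok(())
}
```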

That is a clever idea! I will keep that in mind as an option.

And as an aside on working with WordprocessingML and Rust, I've been finding serde and XML are not a great fit. I noted more details in a closed serde-xml-rs issue.

Practical rules

  1. Try to use declarative macros when possible: they are more lightweight, compile-time-wise (and usually development-time-wise too).

  2. But do not push that too hard: the moment you start munching tokens (cf. Incremental TT munchers), you may be hitting the limits of ergonomic macro_rules! macros. It's a slippery slope that can lead to maintainability hell (for instance, good luck trying to add support for trailing commas to this code!).

    Note that you can be clever about some of these things if you forego a bit of "nice-looking syntax". Parsing generic parameters, for instance, is awfully difficult with declarative macros, even if you are just looking to identify the <…> blob in an opaque fashion to forward it afterwards. But this very thing can be easily achieved if you decide that writing generic parameters for your macro input shall be using […] instead of <…>. Then, suddenly, parsing the generic parameters as an opaque blob is trivial: [$($generics:tt)*].
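A minimal sketch of that bracket trick (hypothetical macro and struct names):

```rust
// The [...]-instead-of-<...> trick: grab the generic parameters as one
// opaque token blob and forward them verbatim into the definition.
macro_rules! forward_generics {
    ([$($generics:tt)*] struct $name:ident { $($body:tt)* }) => {
        struct $name<$($generics)*> { $($body)* }
    };
}

forward_generics! {
    // Written as [T: Clone, U] instead of <T: Clone, U>.
    [T: Clone, U] struct Pair { a: T, b: U }
}

fn main() {
    let p = Pair { a: 1u8, b: "x" };
    assert_eq!(p.a, 1);
    assert_eq!(p.b, "x");
}
```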

Main cases where declarative macros are not enough and people reach for proc-macros:

  • #[attr] & #[derive(Derive)] syntax / call-site

    This one is quite saddening, to be honest, but currently and for the foreseeable future declarative macros can only use function-like call syntax (macro_name!( … )), and not attribute syntax (#[macro_name]) nor derive syntax #[derive(MacroName)].

    • Workaround – macro_rules_attribute

      • #[macro_rules_attribute(macro_name!)]

      • #[macro_rules_derive(MacroName!)]

  • Parsing generics parameters and where clauses

    Not much to be said, there, besides repeating what I've said in 2.

  • Inspecting input literals and processing them (e.g., performing compile-time parsing and validation)

    • Examples

    • Workaround – const fns

      The evolution of const fn capabilities has rendered most of these crates obsolete, and even shown them to be too restricted: macros can only operate off syntax, and can thus only inspect literals; they cannot inspect the value of constants, or of things returned by other macros (taking the hex! macro example mentioned just above, one cannot do hex!(include_str!("some/file"))).

      Here is an example featuring a macro for &'static CStr literals implemented using declarative macros and const shenanigans exclusively: Playground
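A minimal sketch of that kind of const-fn validation (hypothetical, and much simpler than the CStr example):

```rust
// Compile-time validation via a const fn: an invalid input makes the
// build fail, because the panic happens during const evaluation.
const fn validate_tag(name: &str) -> &str {
    if name.is_empty() {
        panic!("tag name must not be empty");
    }
    name
}

// Evaluated at compile time; an empty string here would be a build error.
const TAG: &str = validate_tag("w:t");

fn main() {
    assert_eq!(TAG, "w:t");
}
```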

  • Concatenating identifiers / forging identifiers

    e.g., if you have foo and bar which have been fed to the macro, you'd like to define a foobar item.

  • Having a 1 .. N kind of range to iterate over

    e.g., say you want to define an enum with Variant1, Variant2, …, Variant256 if the user provides the input 256.

    • Workarounds

      • ::seq_macro::seq! for the cases where this is still needed;

      • const generics to be general over arrays of any length :slight_smile:
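A sketch of the const-generics workaround (hypothetical trait), where older code needed a macro generating one impl per array length:

```rust
// One impl with a const generic parameter covers arrays of every length,
// replacing the old pattern of macro-generating impls for lengths 1..=32.
trait LenDesc {
    fn len_desc(&self) -> String;
}

impl<T, const N: usize> LenDesc for [T; N] {
    fn len_desc(&self) -> String {
        format!("array of {} elements", N)
    }
}

fn main() {
    assert_eq!([0u8; 4].len_desc(), "array of 4 elements");
    assert_eq!(["a"; 32].len_desc(), "array of 32 elements");
}
```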

  • Being a non-pure macro, such as one interacting with the file system

    Note that proc-macros that do that (usually to perform a build.rs script's job) are considered hacky or unidiomatic, since true access to the file system and other environment parameters is not guaranteed for proc-macros. That being said, the existence of the include…! family of macros in the std library does suggest that some macros of this form could be acceptable. The really bad ones are macros that abuse static storage to keep state in between invocations. Such macros are very likely to stop working and thus break with a compiler update.

8 Likes

The only proc macro I've ever written falls into this category: I needed a macro to generate unique identifiers that wouldn't collide with each other, so I wrote a macro that uses the host's entropy pool to construct new, random UUIDs.

Every other macro I've needed has been relatively comfortable as a declarative one.

2 Likes

What's the crate name? I've been thinking about writing that kind of crate / macro for a while :grinning_face_with_smiling_eyes:

I never ended up publishing it to crates.io. The repository is here; it generates the ids as typenum integers.

1 Like

This is pretty common and exactly what a build script is for.

Yeah, serde works well when there is a single unambiguous way to do things and the data model is very close to JSON.

XML is a lot more feature-rich and a lot of people would either use automatically generated parsing code (either literally generated or done via reflection) that deserializes documents into objects from a language like Java, or they would interpret it procedurally (e.g. by pulling events from a parser).

1 Like

Oh, good to know. Do you have any examples off the top of your head? Or any links that expand the rationale? I was reading through the build script documentation recently but am fuzzy on the tradeoffs between using build.rs and a proc macro.

A build script is intended for when you need to run some code before your crate gets compiled. This is often where you'll do things like compiling a native DLL so it can be linked into your crate or reading a file to generate some Rust code that gets include!()d (think protobufs).
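A sketch of that generate-then-include!() pattern (hypothetical file and constant names):

```rust
// build.rs sketch: generate Rust source into OUT_DIR, to be pulled into
// the crate with include!(concat!(env!("OUT_DIR"), "/generated.rs")).
use std::{env, fs, path::PathBuf};

fn main() {
    // Cargo sets OUT_DIR for build scripts; fall back to a temp dir so
    // this sketch also runs standalone.
    let out_dir = env::var("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| env::temp_dir());
    let dest = out_dir.join("generated.rs");
    fs::write(&dest, "pub const SCHEMA_VERSION: u32 = 1;\n")
        .expect("failed to write generated code");
    // Re-run only when the input changes (hypothetical input file name).
    println!("cargo:rerun-if-changed=schema.xsd");
}
```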

On the other hand, procedural macros are intended for when you need to do transformations on a particular piece of syntax. They are meant to be pure functions which only read the token stream they are given and don't touch the file system or make calls out to the internet.

You can do impure things inside proc macros like reading from the file system or interacting with the network (that's how crates like include_dir, sqlx, and diesel work), but that isn't endorsed by the language. It's not directly prohibited by the compiler, though.

1 Like

Just to clarify that: Diesel stopped doing such things quite a long time ago. The proc macros provided by diesel itself only generate code based on the struct/type/… they are applied to. There are no other sources used. There is only one thing that could be seen as an exception here, and that's the embed_migration! macro. It emits include_str! calls internally, but it is based on the content of a user-provided migration directory.

1 Like