Building a derive macro for bitmask logic: am I doing anything wrong here?

Hey everyone,

I've been experimenting with a derive macro that turns struct fields into bit mask predicates for fast, branchless rule evaluation. It's novel territory for me, and I'd really like to improve. I haven't found great resources for these sans the Gjengset series.
Any direction would be great.

A great book on the subject is Write Powerful Rust Macros by Sam Van Overmeire.

I had a quick look only, but I'd say:

Firstly, write some docs that say what the macro does. I see there's a link to an article, but the readme is otherwise empty. So as a reviewer or a user, I'd have to explore all the code to understand what does what (I haven't seen any obvious unit tests, either). I think that'd be a good starting point. :wink:

Secondly, you might find it clearer to separate the code working on proc_macro from the code working on proc_macro2, instead of specifying them manually for TokenStream or, as you did, renaming the latter as TokenStream2.

Something like this:

  • lib.rs
use proc_macro::TokenStream;
use XXX::my_inner_top;

#[proc_macro_derive(KitchenNightmares, attributes(yuck))]
pub fn derive_yuck_facts(input: TokenStream) -> TokenStream {
    my_inner_top(input.into()).into()
}
  • another module XXX
use proc_macro2::TokenStream;

pub fn my_inner_top(input: TokenStream) -> TokenStream {
    // what was in derive_yuck_facts (more or less)
}

This has the added advantage (or main advantage) that you can write unit tests for my_inner_top by feeding it a proc_macro2::TokenStream built with quote! or parsed from a string.

I appreciate you taking the time and going through it and giving me some protips. I also thank you for the macro resource.

I added a readme and abstracted the logic into its own function, and it is indeed cleaner.

It'll be a bit before I add unit tests; my personal project eats a ton of time. I will include testing in the future, as it adds information. I just didn't think of these things at the time, since this was only an example repo for a solution I'm implementing personally.

You kept proc_macro::TokenStream in your inner function, though, which completely defeats the purpose and isn't any clearer, so maybe my comment and its example weren't clear enough. It's only a suggestion, so it might not be relevant for small macros. It's really up to you. :slight_smile:

EDIT: Maybe the problem you ran into is parse_macro_input!, which takes a proc_macro::TokenStream. You can use this instead:

use proc_macro2::{Span, TokenStream};
use syn::{parse2, DeriveInput};
// ...

pub fn derive_yuck_tokens(input: TokenStream) -> TokenStream {
    let input: DeriveInput = parse2(input).unwrap();
    // ...
    expanded // into() not necessary any more
}

That allows you to send your own test streams to this top function, or use it from other macros (if that's relevant; it might not).

Note that ugly unwrap() I wrote. If you look at the macro you were using, it processes the errors. From the macro expansion, or its doc comments:

match syn::parse::<$Type>($variable) {
    Ok(syntax_tree) => syntax_tree,
    Err(err) => return proc_macro::TokenStream::from(err.to_compile_error()),
}

So maybe you should return a compile error instead of panicking when parse2 fails, for example with err.to_compile_error() as the macro does.

Another way to handle the parsing is to move the line

let input = parse_macro_input!(input as DeriveInput);

into derive_kitchen_nightmares and have derive_yuck_tokens take input: DeriveInput. In that case, I'm not entirely sure how to feed it with test streams, but you can still share this with other derives if that's relevant.

It's just to show there are alternatives available, but don't feel forced to move anything if you don't need it. I think those are more like general tips than something that will benefit an example like this.

Apart from that, I'm a little on the fence regarding the shadowing of input with a different type, but it's still the input with all the information, after all. It generally looks good to me, and it handles errors, which is nice. I didn't look in depth, so take that for what it's worth.

One last thing about the book I mentioned earlier: it's very good and just the right length for the topic, I think, but it doesn't mention the VisitMut approach (and maybe another one), although it would have simplified one of the chapters. I understand that it would have required introducing yet another concept, so that's probably why the author didn't use it.

I don't think it applies to your example above, which seems to use the easiest method.

I was confused, and I appreciate you taking the time to spread some knowledge. I ended up going with the second approach in my code:

pub fn derive_yuck_tokens(input: TokenStream2) -> TokenStream2 {
    let ast: DeriveInput = match parse2(input) {
        Ok(x) => x,
        Err(e) => return compile_error(Span::call_site(), &format!("invalid derive input: {e}")),
    };
    generate_for_kitchen(&ast)
}

Then I pass the ast to generate_for_kitchen, which holds the heavier logic.