Macro miscompilation? Type error when expanding macro

You can skip to the Macro Implementation below.
If you're really busy, just look at /****** LOOK HERE ******/ part in the macro_rules! __race_job and skip to the last section - The Problem.

Background information

I wanted to make a race<T>(job: Vec<RaceJob<T>>) -> (usize, T) function that does the following:

  1. Takes vector of closures
  2. Run all closures at once
  3. Wait until any closure ends, then take the result from the fastest closure
  4. Stop other closures
  5. Return the fastest closure's index and its result.

To stop other closures, RaceJob<T> is defined as &AtomicBool -> Option<T> function.
RaceJob is responsible for checking AtomicBool's value.

  1. If the AtomicBool's value is true, RaceJob should stop and return None.
  2. If the AtomicBool remained false until the job ends, RaceJob should set the AtomicBool's value to true to indicate the job is finished. After that, RaceJob returns Some(T). (This is done by compare_exchange which uses CAS operation.)

Now race can just create a AtomicBool value, spawn threads using AtomicBool and RaceJob, check if thread returned Some(T) (only the fastest thread can return it), then unwrap its value.

(Some traits are snipped for simplicity, exact definitions are type RaceJob<T> = Box<dyn FnOnce(&AtomicBool) -> Option<T> + Send + 'static> and fn race<T: Send + 'static>(jobs: Vec<RaceJob<T>>) -> (usize, T).)

Macro Implementation

To make it easier for users to use the race<T>(job: Vec<RaceJob<T>>) -> (usize, T) function, I've implemented the race! and __race_job! macro.
Ideally, instead of manually creating vector, boxes, and closures, user can just write list of closure bodies.

For starters, we have a race! macro.
race! macro takes body of closures, then create vector and boxes that wraps closures.

macro_rules! race {
    ( $( { $($body:tt)* } ),+ $(,)? ) => {{
        $crate::race(vec![
            $(
                Box::new({
                    use std::sync::atomic::{ AtomicBool, Ordering };

                    move |__is_finished: &AtomicBool| {
                        $crate::__race_job![ __is_finished; $($body)* ]
                    }
                })
            ),+
        ])
    }};
}

Specifically, race![ { code11; code12; code13 }, { code21; code22 } ] will turn into:

{race(vec![
    Box::new({
        use std::sync::atomic::{ AtomicBool, Ordering };

        move |__is_finished: &AtomicBool| {
            __race_job![ __is_finished;
                code11; code12; code13 // original code
            ]
        }
    }),
    Box::new({
        use std::sync::atomic::{ AtomicBool, Ordering };

        move |__is_finished: &AtomicBool| {
            __race_job![ __is_finished;
                code21; code22 // original code
            ]
        }
    })
])}

I know that the outmost bracket is unnecessary, but I added just in case.

We also have a __race_job! macro.
__race_job! macro will insert code that handles AtomicBool between statements.

macro_rules! __race_job {
    // recursion base case; if ends without expr, return ()
    ( $flag:ident; ) => {
        if $flag
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            Some(())
        } else {
            None
        }
    };

    // another recursion base case; if ends with expr, return its value
    ( $flag:ident; $last:expr ) => {{
        if $flag.load(Ordering::Acquire) {
            return None;
        }
        let __check_result = $last;

        if $flag
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            Some(__check_result)
        } else {
            None
        }
    }};

    // if statement exists, insert checking code in the middle
    ( $flag:ident; $head:stmt; $($tail:tt)* ) => {{
        if $flag.load(Ordering::Acquire) {
            return None;
        }

        /****** LOOK HERE ******/
        $head;
        /****** LOOK HERE ******/

        $crate::__race_job![ $flag; $($tail)* ]
    }};
} 

Specifically, __race_job![ __is_finished; code11; code12; code13 ] will turn into:

if __is_finished.load(Ordering::Acquire) { return None; }
code11; // original code
{
    if __is_finished.load(Ordering::Acquire) { return None; }
    code12; // original code
    {
        if __is_finished.load(Ordering::Acquire) { return None; }
        let __check_result = code13; // original code
        if __is_finished.compare_exchange(false, true,Ordering::AcqRel, Ordering::Acquire).is_ok() {
            Some(__check_result)
        } else {
            None
        }
    }
}

The Problem

From the definition, the following code:

let (id, winner) = race![
    {
        let mut x = 3; // stmt
        x += 5; // stmt

        if x == 3 {
            String::from("Add failed")
        } else {
            String::from("Add succeeded")
        } // expr
    },
    {
        let mut guess = String::new(); // stmt

        /****** LOOK HERE ******/
        io::stdin()
            .read_line(&mut guess)
            .expect("Failed to read line"); // stmt
        /****** LOOK HERE ******/

        guess //expr
    },
    {
        fn prepend_hello(name: String) -> String {
            format!("Hello, {name}!")
        }; // stmt

        let my_name = String::from("MyName"); // stmt
        prepend_hello(my_name) // expr
    }
];

should be expanded into:

let (id, winner) = {race(vec![
    Box::new({
        use std::sync::atomic::{AtomicBool, Ordering};

        move |__is_finished: &AtomicBool| {
            if __is_finished.load(Ordering::Acquire) { return None; }
            let mut x = 3;
            {
                if __is_finished.load(Ordering::Acquire) { return None; }
                x += 5;
                {
                    if __is_finished.load(Ordering::Acquire) { return None; }
                    let __check_result = if x == 3 {
                        String::from("Add failed")
                    } else {
                        String::from("Add succeeded")
                    };
                    if __is_finished.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                        Some(__check_result)
                    } else {
                        None
                    }
                }
            }
        }
    }),
    Box::new({
        use std::sync::atomic::{AtomicBool, Ordering};

        move |__is_finished: &AtomicBool| {
            if __is_finished.load(Ordering::Acquire) { return None; }
            let mut guess = String::new();
            {
                if __is_finished.load(Ordering::Acquire) { return None; }

                /****** LOOK HERE ******/
                io::stdin().read_line(&mut guess).expect("Failed to read line");
                /****** LOOK HERE ******/

                {
                    if __is_finished.load(Ordering::Acquire) { return None; }
                    let __check_result = guess;
                    if __is_finished.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                        Some(__check_result)
                    } else {
                        None
                    }
                }
            }
        }
    }),
    Box::new({
        use std::sync::atomic::{AtomicBool, Ordering};

        move |__is_finished: &AtomicBool| {
            if __is_finished.load(Ordering::Acquire) { return None; }
            fn prepend_hello(name: String) -> String {
                format!("Hello, {name}!")
            };
            {
                if __is_finished.load(Ordering::Acquire) { return None; }
                let my_name = String::from("MyName");
                {
                    if __is_finished.load(Ordering::Acquire) { return None; }
                    let __check_result = prepend_hello(my_name);
                    if __is_finished.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                        Some(__check_result)
                    } else {
                        None
                    }
                }
            }
        }
    }),
])};

In fact, this is the output of the cargo-expand crate, which expands every macro used in the Rust code.

From the definition of the __race_job! macro, we add a semicolon after the $head like this: $head;
Therefore, expanded code should also have io::stdin().read_line(&mut guess).expect("Failed to read line"); with a semicolon.

However, this code emits the following compile error:

error[E0308]: mismatched types
   --> src/main.rs:101:13
    |
101 | /             io::stdin()
102 | |                 .read_line(&mut guess)
103 | |                 .expect("Failed to read line"); // stmt
    | |                                              ^- help: consider using a semicolon here
    | |______________________________________________|
    |                                                expected `()`, found `usize`

For more information about this error, try `rustc --explain E0308`.

Since I'm already using a semicolon, I don't even understand why this error occurred.
I've tried running the code expanded by cargo-expand, and it works fine without any type errors.

Any idea how to convince the compiler that a semicolon should be added when expanding our macro?
You can see the full code in the playground.

1 Like

It seems like passing the tt to the inner macro causes it to eagerly find expressions, instead of searching further to check if it could be a statement instead.

Try adding a line 5; in the middle of the other branches, they fail in the same way.

The issue is 5; 6 can either be statement 5; + expr 6, or all as one expr 5; 6

I'm not sure how to eagerly find statements first.
It seems like your other cases got lucky, since val += 2; can't be an expr, has to be a statement.

2 Likes

You're right, adding 5; in the middle also makes compile error.

I thought only block expressions (e.g. {5; 6}) is considered as expr, but it seems 5; 6 is also considered as expr...

However, I'm still unsure how to solve the problem :frowning:

it seems the stmt fragment has some quirks when capturing/expanding an ExpressionStatement. I think after expansion, the type checker sees it as an expression a statement AST node. yet, it's too late in the compilation pipeline, far beyond the phase where the parser combines an expression followed by a semicolon into a statement. because it is a statement syntactically, you cannot append a semicolon to "discard" its value and convert it into the unit type. however, for some reason, the type checker expects it being the unit type, despite it's a statement, not an expression. it's worth noting that the semicolon is not part of the captured statement.

here's an reduced example demonstrate this behavior: Rust Playground

as a workaround for your particular problem, you can add a separate matcher rule for expression statements specifically. but you do get some code duplications in doing so:

macro_rules! __race_job {
  //...
  // try to match expression statements first
  ($flag:ident; $head:expr; $($tail:tt)*) => {{
    //...
  }};
  // then try to match other types of statements such as `let`
  ($flag:ident; $head:expr; $($tail:tt)*) => {{
    //...
  }};
}

modified version of the original code: Rust Playground

That's really weird... I thought macro expansion was just a string transformation.

  • $head:stmt matches with statement, but it ignores the trailing semicolon.
  • Macro does sth more than string transformation, so using $head will insert semicolon automatically.

ExpressionStatement (e.g. 5;) matches as a statement, but somehow these two rules creates weird effect - $head is interpreted as 5, which is then interpreted as an expression by the compiler..?

Anyway it solved the problem completely. Thanks for your help!

no, rust macros operate on typed AST-like nodes, it's mostly resembles the rust syntax, but not exactly matches the "real" syntax tree analyzed by the compiler.

no, the $head variable itself is a statement, it does need a trailing semicolon.

this is the part I'm not sure what exactly happened.

by tinkering around, it seems the compiler does treat ExpressionStatement somewhat differently than other types of statements, particularly, the macro expanded stmt fragment retains the type of the expression, but the type checker expects the type being (), despite it's a statement (i.e. for side effect only), not an expression. [1]


  1. you can see it's seen as a statement node instead of an expression node, because adding a semicolon does not "discard" the value of the expression and turn the expression into a statement ↩︎

$head is neither a statement nor an expression, it is a statement without a semicolon.
You can see its value by using println!("{}", stringify!($head));.

For example, LetStatement requires a ending semicolon, (i.e. let mut x = 3; is a statement, including the last semicolon)
but $head will contain a LetStatement without a semicolon. (i.e. $head value is let mut x = 3, without a semicolon.)
You can see this behavior by running this Playground code.

Of course, Rust does things more complicated than inserting semicolon at the end.
For example, Rust does not add semicolon after item declaration statement, such as fn prepend_hello(name: String) -> String { format!("Hello, {name}!") }.
However, this "statement without semicolon" thing somehow broke how ExpressionStatement works inside type checker... (I don't know Rust internals btw)

a metavariable captured as a stmt fragment is a statement. it's a parsed statement, represented as an abstract (syntax tree) node, not the original tokens.

it's important to know, once a fragment is captured, it became a single indivisible unit. it's no longer a bunch of tokens from the source code. what do you think this example will print? (try it on playground):

macro_rules! check_let_statement {
	(let $v:ident = $e:expr;) => {
		println!("let statement")
	};
	($s:stmt;) => {
		println!("not let statement")
	};
}

macro_rules! statement_wrapper {
	($s:stmt;) => {
		check_let_statement!($s;)
	};
}

fn main() {
	statement_wrapper!(let x = 42;);
}

in the next example, if macros were just text substitutions, it should not expand to valid code, because all the semicolons are removed. but it does compile (you can use the playground Expand Macros from the Tools button to see the expansion):

macro_rules! x3 {
	($s:stmt;) => {
		$s
		$s
		$s
	};
}
fn main() {
	x3!{let x = 42;}
}

the grammar (parsing rules) and the actual AST node representation are not the same. macro metavariables captures AST-like nodes.

for example, in the grammar rule, LetStatement must ends with a semicolon, but the parsed AST node doesn't have the semicolon.

the result is correct, but your reasoning is wrong.

stringify!() serialize the internal representation of the syntax nodes into a string representation. it doesn't guaranteed to reproduce the original source form of the fragment.

in this example, it doesn't contain a semicolon because the parsed AST node doesn't have the semicolon. nevertheless, it's still a statement.

it's not about the semicolon. ExpressionStatement is broken for a different reason I already explained in previous reply.

2 Likes

You wrote $head:stmt. That means it's a statement.

You don't need to know internals. It's clearly written in the documentation: When forwarding a matched fragment to another macro-by-example, matchers in the second macro will see an opaque AST of the fragment type. The second macro can’t use literal tokens to match the fragments in the matcher, only a fragment specifier of the same type. The ident , lifetime , and tt fragment types are an exception, and can be matched by literal tokens.

See also the documentation of:

which explains and offers a proc-macro-powered workaround for this (something which "unwraps the 'invisible parentheses' in which these metavar expansions end up wrapped").

Note, however, that such a tool ought only to be used as a last resort :warning:. Most notably, here, a mere :expr-else-:stmt handling by the macro_rules!, through a recursive muncher/parser (prepended if not already present), is the proper/nicer solution, as showcased by:

FWIW, here is a "generalized" version of the pattern, or at least, a template to make your own generalized versions, since it results in the following macro arm after which you can do anything you want:

( … $({ $($expr_or_stmt:tt)* })* … ) => (
    // do what you want here, for
    // each `$($expr_or_stmt)*` "batch"
);

Playground

  • Let's notice that expressions with a trailing comma are not handled well here (e.g., { sub_block… }, or match { … }, or if { … } $(else { … })?), and neither are item definitions (e.g., struct Helper { … })
1 Like