Nested/Recursive Macro Help for a simple Assembler macro

Hello all, I am trying to build a simple assembler macro so that I can write something along the lines of:

let instructions: Vec<u32> = assemble_mips![
    add 5, 5, 5
    sll 5, 1, 1
];

I think I need to use a recursive macro for this but I didn't get far beyond:

macro_rules! assemble_mips {
    () => {
        {}
    };
    (sll $rt:expr, $rd:expr, $sa:expr) => {
        {
            v.push(
                ($rt & 0b11111) << 15
                | ($rd & 0b11111) << 10
                | ($sa & 0b11111) << 5
            );
        }
    };

    ( $a:pat, $($x:tt)* ) => {
        {
            let mut v: Vec<u32> = Vec::new();

            assemble_mips! $a
            assemble_mips!($($x)*)

            v
        }
    };
}

Which doesn't work. I think $a:pat is wrong but I don't know what the type should be. Can someone point me in the right direction if this is even possible? Thank you!

You would probably have better luck with a helper macro that the macro calls for each line.

If the length of the instructions is "variadic" in tokens (which it probably is), then in order to keep your current syntax you'd have to mix parsing instructions with recursing, which will make the instructions part a bit less readable. I thus recommend that you implement your macro in two passes:

  1. Split the input into individual instructions, thanks to a global separator (e.g., a trailing ; for each instruction, or having each instruction be […]-wrapped);

  2. Have another macro to handle each individual instruction.

[add 5, 5, 5 ] syntax

Now, for macro reasons, using […] as the syntactical separator in between instructions will be the easiest to implement (no need to munch/recurse), so let's start with that approach to get the gist:

macro_rules! assemble_mips {(
 $(
    // a single instruction, with opaque contents
    [ $($instruction:tt)* ]
 )*
) => ({
    let mut v = ::std::vec![];
    macro_rules! __assembled_bytes {() => ( v )} // circumvents hygiene
 $(
    $crate::macros::__assemble_single_mips_instruction!( $($instruction)* );
 )*
    v
})}
pub(crate) use assemble_mips;

and from there, you can now write:

macro_rules! __assemble_single_mips_instruction {
    ( sll $rt:expr, $rd:expr, $sa:expr ) => (
        __assembled_bytes!().push(
            0
            | ($rt & 0b11111) << 15
            | ($rd & 0b11111) << 10
            | ($sa & 0b11111) << 5
        );
    );
    ( add … ) => ( … );
    // etc.
}
pub(crate) use __assemble_single_mips_instruction;
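
To make the interplay of the two macros above concrete, here is a minimal single-file sketch. It rests on a few assumptions that are not in the thread: both macros are called by plain name rather than through the `$crate::macros::…` paths, the vector is given an explicit `Vec<u32>` type, and `bracketed_demo` is a hypothetical helper name. Only the `sll` arm is implemented because the `add` arm is elided above, and the bit layout is copied from the thread as-is rather than checked against the real MIPS encoding.

```rust
// Sketch only: the per-instruction helper is defined first so that it is in
// textual scope wherever `assemble_mips!` gets expanded.
macro_rules! __assemble_single_mips_instruction {
    ( sll $rt:expr, $rd:expr, $sa:expr ) => (
        // `__assembled_bytes!()` expands to the `v` declared by the caller.
        __assembled_bytes!().push(
            0
            | ($rt & 0b11111) << 15
            | ($rd & 0b11111) << 10
            | ($sa & 0b11111) << 5
        );
    );
    // Further instruction arms would follow the same shape.
}

macro_rules! assemble_mips {(
    $(
        // a single instruction, with opaque contents
        [ $($instruction:tt)* ]
    )*
) => ({
    let mut v: Vec<u32> = Vec::new();
    // Local macro that hands `v` to the helper despite hygiene.
    macro_rules! __assembled_bytes {() => ( v )}
    $(
        __assemble_single_mips_instruction!( $($instruction)* );
    )*
    v
})}

// Hypothetical helper, only here to show the call syntax.
fn bracketed_demo() -> Vec<u32> {
    let instructions: Vec<u32> = assemble_mips![
        [sll 5, 1, 1]
        [sll 2, 3, 4]
    ];
    instructions
}
```

Each `[…]` group is captured as one opaque token tree, so the outer macro never needs to know how long an instruction is.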

add 5, 5, 5; syntax

This one will be a bit more annoying, since we'll need to use recursive :tt-munching to locate the ; and split there. The previous macro will be handy, and will thus remain available, named $crate::macros::__assemble_mips_bracketed!, within the following section:

macro_rules! assemble_mips {( $($input:tt)* ) => (
    $crate::macros::__split_semicolons! {
        instructions: []
        current_instruction: []
        munching: [ $($input)* ]
    }
)}
pub(crate) use assemble_mips;

#[doc(hidden)] /** Not part of the public API */ #[macro_export]
macro_rules! __split_semicolons {
    (
        instructions: [
            $($instructions:tt)*
        ]
        current_instruction:
            $current_instruction:tt
        munching: [
            ; // When finding a `;` <====================
            $($rest:tt)*
        ]
    ) => ($crate::macros::__split_semicolons! {
        instructions: [
            $($instructions)*
            $current_instruction // add it
        ]
        current_instruction: []
        munching: [
            $($rest)*
        ]
    });

    (
        instructions:
            $instructions:tt
        current_instruction: [
            $($current_instruction:tt)*
        ]
        munching: [
            $cur_tt:tt // When finding something else <============
            $($rest:tt)*
        ]
    ) => ($crate::macros::__split_semicolons! {
        instructions:
            $instructions
        current_instruction: [
            $($current_instruction)*
            $cur_tt
        ]
        munching: [
            $($rest)*
        ]
    });

    (
        instructions: [
            $($instructions:tt)*
        ]
        current_instruction: []
        munching: [
            /* Nothing left! */
        ]
    ) => ($crate::macros::__assemble_mips_bracketed! {
        $($instructions)*
    });
}
pub(crate) use __split_semicolons;
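
As a rough end-to-end check of the `;`-separated variant, the pieces can be condensed into one file. This is a sketch under assumptions, not the exact setup from the post: all macros are called by plain name instead of via `$crate::macros::…`, the `#[macro_export]` and `pub(crate) use` lines are dropped, only `sll` is implemented (with the bit layout copied from the thread), and `semicolon_demo` is a hypothetical helper name.

```rust
macro_rules! __assemble_single_mips_instruction {
    ( sll $rt:expr, $rd:expr, $sa:expr ) => (
        __assembled_bytes!().push(
            0 | ($rt & 0b11111) << 15 | ($rd & 0b11111) << 10 | ($sa & 0b11111) << 5
        );
    );
}

// The bracketed front end from the previous section.
macro_rules! __assemble_mips_bracketed {(
    $( [ $($instruction:tt)* ] )*
) => ({
    let mut v: Vec<u32> = Vec::new();
    macro_rules! __assembled_bytes {() => ( v )}
    $( __assemble_single_mips_instruction!( $($instruction)* ); )*
    v
})}

macro_rules! __split_semicolons {
    // Found a `;`: close the current instruction and start a fresh one.
    (
        instructions: [ $($instructions:tt)* ]
        current_instruction: $current_instruction:tt
        munching: [ ; $($rest:tt)* ]
    ) => (__split_semicolons! {
        instructions: [ $($instructions)* $current_instruction ]
        current_instruction: []
        munching: [ $($rest)* ]
    });
    // Any other token: append it to the current instruction.
    (
        instructions: $instructions:tt
        current_instruction: [ $($current_instruction:tt)* ]
        munching: [ $cur_tt:tt $($rest:tt)* ]
    ) => (__split_semicolons! {
        instructions: $instructions
        current_instruction: [ $($current_instruction)* $cur_tt ]
        munching: [ $($rest)* ]
    });
    // Nothing left: hand the bracketed instructions to the other macro.
    (
        instructions: [ $($instructions:tt)* ]
        current_instruction: []
        munching: []
    ) => (__assemble_mips_bracketed! { $($instructions)* });
}

macro_rules! assemble_mips {( $($input:tt)* ) => (
    __split_semicolons! {
        instructions: []
        current_instruction: []
        munching: [ $($input)* ]
    }
)}

// Hypothetical helper, only here to show the call syntax.
fn semicolon_demo() -> Vec<u32> {
    let instructions: Vec<u32> = assemble_mips![
        sll 5, 1, 1;
        sll 2, 3, 4;
    ];
    instructions
}
```

Every instruction needs its trailing `;`, since the final rule only matches once `current_instruction` is empty. The muncher also recurses once per input token, so a long program may run into the compiler's default macro recursion limit and need a `#![recursion_limit = "…"]` attribute on the crate.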

Okay, that's an interesting way to do it, thanks for the help! Glad it at least wasn't something simple and obvious (unless this solution is to people here!)

The simplest you could get would only be by intertwining the recursing with the instruction assembling:

macro_rules! assemble_mips {( $($input:tt)* ) => ({
    let mut v = ::std::vec![];
    macro_rules! __assembled_bytes {() => ( v )} // circumvents hygiene
    $crate::macros::__assemble_mips_into_assembled_bytes!( $($input)* );
    v
})}
pub(in crate) use assemble_mips;

macro_rules! __assemble_mips_into_assembled_bytes {
    () => ();

    (
        sll $rt:expr, $rd:expr, $sa:expr,
        $($rest:tt)*
    ) => (
        __assembled_bytes!().push(
            0
            | ($rt & 0b11111) << 15
            | ($rd & 0b11111) << 10
            | ($sa & 0b11111) << 5
        );
        $crate::macros::__assemble_mips_into_assembled_bytes!( $($rest)* );
    );

    // etc.
}
pub(in crate) use __assemble_mips_into_assembled_bytes;

And that's it!

But despite being simpler to write, it has a lot of that $($rest:tt)* -> recurse!($($rest)*); repetition (it will occur once per instruction rule!) which, given the large number of instructions you would expect to support, would not pay off in practice.
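
For comparison, here is a small single-file sketch of how the interleaved variant above would be invoked. The assumptions are the same kind as before: plain-name calls instead of `$crate::macros::…` paths, an explicit `Vec<u32>` annotation, only the `sll` rule from the thread, and a hypothetical `interleaved_demo` helper.

```rust
macro_rules! __assemble_mips_into_assembled_bytes {
    // No tokens left: stop recursing.
    () => ();

    (
        sll $rt:expr, $rd:expr, $sa:expr,
        $($rest:tt)*
    ) => (
        __assembled_bytes!().push(
            0
            | ($rt & 0b11111) << 15
            | ($rd & 0b11111) << 10
            | ($sa & 0b11111) << 5
        );
        // The recursion that every instruction rule has to repeat.
        __assemble_mips_into_assembled_bytes!( $($rest)* );
    );
}

macro_rules! assemble_mips {( $($input:tt)* ) => ({
    let mut v: Vec<u32> = Vec::new();
    macro_rules! __assembled_bytes {() => ( v )}
    __assemble_mips_into_assembled_bytes!( $($input)* );
    v
})}

// Hypothetical helper, only here to show the call syntax.
fn interleaved_demo() -> Vec<u32> {
    let instructions: Vec<u32> = assemble_mips![
        sll 5, 1, 1,
        sll 2, 3, 4,
    ];
    instructions
}
```

With the rule as written, each instruction has to end in a comma, including the last one, because the `,` after `$sa:expr` is part of the pattern.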

If that's a practical task (and not an exercise in learning Rust's macro language), then I strongly recommend looking at the procedural macro approach in general and the dynasm crate in particular (it doesn't currently support MIPS, but adding it would be easier than writing something like that from scratch).

Rust's macro-by-example facility is much stronger than what many other languages have, but it is still not powerful enough to build an ergonomic DSL.
