Rust macro for match on homebrew enum

To get right to the point: I have a struct that effectively acts as an enum, and I want to write a macro that lets me match on it. I'm writing a dynamic scripting language that I'd like to be relatively performant, so I'm trying a lot of tricks to get some extra speed.

The two structs of primary relevance are GcObj<T> and GcObject. In both cases they are effectively just wrappers around a NonNull<u8>. I do some layout logic, but otherwise they are really just pointers to the objects, which are laid out like this:

[ ObjHeader { marked: AtomicBool, kind: ValueKind } | T ]
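For illustration, the size of such an allocation and the offset of T within it can be computed with std::alloc::Layout. This is only a sketch under assumptions: the #[repr(C)] and #[repr(u8)] attributes and the object_layout helper are hypothetical, and this stand-in header is smaller than the 16-byte header described in the post.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicBool;

// Hypothetical stand-ins for the types named in the post.
#[repr(u8)]
enum ValueKind {
    Class,
    Fun,
}

#[repr(C)]
struct ObjHeader {
    marked: AtomicBool,
    kind: ValueKind,
}

/// Returns the layout of `[ ObjHeader | T ]` and the byte offset of `T`
/// from the start of the allocation.
fn object_layout<T>() -> (Layout, usize) {
    let (layout, offset) = Layout::new::<ObjHeader>()
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    // `extend` does not add trailing padding, so round the size up to the alignment.
    (layout.pad_to_align(), offset)
}
```

The offset is what a typed handle would add to the base pointer to reach the payload.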

So, just to get ahead of the question: why don't you just use an enum?

  1. I generally need the struct that points to or owns the object to be exactly a single pointer wide. I'm using a technique called NaN boxing, where I essentially hide pointers inside a floating-point number.
  2. I want to avoid the space overhead of every enum variant being the size of the largest one. My smallest variant is only 16 bytes, while my largest is 136 bytes. With the header itself being 16 bytes, I may be using about 5x the memory I need for the smallest variant.
  3. In some cases I know exactly what variant I'm looking at and don't want to pay to keep wrapping and unwrapping the enum.
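As a rough illustration of the NaN-boxing idea from point 1, here is a minimal sketch. The Value type, the tag constants, and the assumption that pointers fit in the low 48 bits are all hypothetical and not taken from the actual implementation.

```rust
/// Hypothetical NaN-boxed value: one 64-bit word that holds either an f64
/// or a pointer hidden in the payload bits of a quiet NaN.
#[derive(Clone, Copy)]
struct Value(u64);

// Exponent all ones plus the top two mantissa bits: a quiet NaN pattern that
// ordinary arithmetic does not normally produce.
const QNAN: u64 = 0x7ffc_0000_0000_0000;
// Assumption: the sign bit on top of QNAN marks "this is a pointer".
const TAG_PTR: u64 = 0x8000_0000_0000_0000 | QNAN;
// Assumption: pointers fit in the low 48 bits (true on common 64-bit targets).
const PAYLOAD_MASK: u64 = 0x0000_ffff_ffff_ffff;

impl Value {
    fn from_f64(n: f64) -> Self {
        Value(n.to_bits())
    }

    fn from_ptr(ptr: *mut u8) -> Self {
        let addr = ptr as usize as u64;
        debug_assert!(addr & !PAYLOAD_MASK == 0, "pointer must fit in 48 bits");
        Value(TAG_PTR | addr)
    }

    fn is_ptr(self) -> bool {
        self.0 & TAG_PTR == TAG_PTR
    }

    fn as_f64(self) -> Option<f64> {
        if self.is_ptr() { None } else { Some(f64::from_bits(self.0)) }
    }

    fn as_ptr(self) -> Option<*mut u8> {
        if self.is_ptr() {
            Some((self.0 & PAYLOAD_MASK) as usize as *mut u8)
        } else {
            None
        }
    }
}
```

Because the whole value is a single u64, anything that holds it stays exactly one pointer wide, which is the constraint a regular Rust enum with payloads would not satisfy.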

With how I have things set up now, this is how I have to do my "matching":

let obj: GcObject = get_obj();

match obj.kind() {
  ValueKind::Class => {
    let class: GcObj<Class> = obj.to_class();
    class.do_stuff();
  }
  ValueKind::Fun => {
    let fun: GcObj<Fun> = obj.to_fun();
    fun.do_stuff();
  }
  // ...
}

Besides being a bit verbose, nothing stops me from unwrapping to the wrong variant. This is where I think a macro could really help, but so far I've only written relatively simple macros. Ideally, I think it would be great if I could do something like this:

match!(match obj {
  ValueKind::Class(class) => class.do_stuff(),
  ValueKind::Fun(fun) => fun.do_stuff(),
  // ..
});

// which might expand to
match obj.kind() {
  ValueKind::Class => {
    let class: GcObj<Class> = obj.to_class();
    {
      class.do_stuff();
    }
  }
  ValueKind::Fun => {
    let fun: GcObj<Fun> = obj.to_fun();
    {
      fun.do_stuff();
    }
  }
  // ..
}

It's going to be difficult to generate this method name using macros-by-example. If you can use the same method name for every case (for example .into() by implementing Into), this looks quite doable.
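A minimal sketch of that idea, assuming stand-in versions of GcObject and GcObj<T> (the real ones wrap a NonNull<u8>) and a hypothetical macro name match_kind!. Because every conversion goes through .into(), the macro never has to build a method name from the variant name.

```rust
use std::marker::PhantomData;

// Stand-in types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ValueKind {
    Class,
    Fun,
}

struct Class;
struct Fun;

#[derive(Clone, Copy)]
struct GcObject {
    kind: ValueKind,
}

struct GcObj<T> {
    _marker: PhantomData<T>,
}

impl GcObj<Class> {
    fn describe(&self) -> &'static str { "class" }
}

impl GcObj<Fun> {
    fn describe(&self) -> &'static str { "fun" }
}

// One `From` impl per kind gives every conversion the same name: `.into()`.
impl From<GcObject> for GcObj<Class> {
    fn from(obj: GcObject) -> Self {
        debug_assert_eq!(obj.kind, ValueKind::Class);
        GcObj { _marker: PhantomData }
    }
}

impl From<GcObject> for GcObj<Fun> {
    fn from(obj: GcObject) -> Self {
        debug_assert_eq!(obj.kind, ValueKind::Fun);
        GcObj { _marker: PhantomData }
    }
}

// The variant name doubles as the payload type, so the target type of
// `.into()` is known from the arm itself.
macro_rules! match_kind {
    ($obj:expr, { $($kind:ident($name:ident) => $body:expr),* $(,)? }) => {{
        let obj: GcObject = $obj;
        match obj.kind {
            $(ValueKind::$kind => {
                let $name: GcObj<$kind> = obj.into();
                $body
            })*
        }
    }};
}

fn describe(obj: GcObject) -> &'static str {
    match_kind!(obj, {
        Class(class) => class.describe(),
        Fun(fun) => fun.describe(),
    })
}
```

This sketch sidesteps the `match obj { ... }` syntax by taking the scrutinee and the arms as two comma-separated arguments, which keeps it to a single macro rule.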

This looks more like a job for <dyn Any>::downcast_ref(), doesn't it?

macro_rules! to_value_kind {
    ($o:expr, Class) => { $o.to_class() };
    ($o:expr, Fun) => { $o.to_fun() };
}

// A tt muncher to handle parsing of {} after an expression.
macro_rules! match_obj_tt_muncher {
    // This is the final case matched, when all the tokens have been moved from the arms to the
    // scrutinee.
    (
        scrutinee = ($($scrutinee:tt)*)
        arms = ({
            $(ValueKind::$value_kind:ident($p:pat) => $e:expr,)*
        })
    ) => {{
        let object: GcObject = $($scrutinee)*;
        match object.kind() {
            $(ValueKind::$value_kind => {
                let $p: GcObj<$value_kind> = to_value_kind!(object, $value_kind);
                $e
            })*
        }
    }};

    // Otherwise, we move a token to the scrutinee.
    (
        scrutinee = ($($scrutinee:tt)*)
        arms = ($extracted:tt $($rest:tt)*)
    ) => {
        match_obj_tt_muncher! {
            scrutinee = ($($scrutinee)* $extracted)
            arms = ($($rest)*)
        }
    };
}


macro_rules! match_obj {
    (match $($rest:tt)*) => {
        match_obj_tt_muncher!(
            scrutinee = ()
            arms = ($($rest)*)
        )
    };
}

Rust's macro_rules! won't allow you to parse a {} after an expression as that is ambiguous with struct expressions (e.g. SomeStruct {}), so we get around this by using a tt muncher - although there may be a way to avoid that.

This has the slight limitation that it doesn't allow omitting the comma after match arms that use {}, unlike regular matches. i.e. this won't compile:

match_obj!(match object {
    ValueKind::Class(_) => {}
    ValueKind::Fun(fun) => {}
});

A regular match would accept this. It can probably be fixed with another tt-muncher. Additionally, this doesn't support _ in place of a value kind; again, you'd probably need a tt-muncher.
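If the match-like syntax is not essential, a trailing _ arm may be possible without a tt-muncher by passing the scrutinee and the arms as separate arguments. The following is only a sketch under that assumption: the macro name match_obj_or_else!, the extra Str variant, and the stand-in types are hypothetical, and to_value_kind! mirrors the helper shown earlier.

```rust
use std::marker::PhantomData;

// Stand-in types for illustration only.
#[derive(Clone, Copy)]
enum ValueKind {
    Class,
    Fun,
    Str,
}

struct Class;
struct Fun;
struct GcObj<T>(PhantomData<T>);

#[derive(Clone, Copy)]
struct GcObject(ValueKind);

impl GcObject {
    fn kind(self) -> ValueKind { self.0 }
    fn to_class(self) -> GcObj<Class> { GcObj(PhantomData) }
    fn to_fun(self) -> GcObj<Fun> { GcObj(PhantomData) }
}

macro_rules! to_value_kind {
    ($o:expr, Class) => { $o.to_class() };
    ($o:expr, Fun) => { $o.to_fun() };
}

// `_` is not matched by the `ident` fragment, so the parser can tell the
// fallback arm apart from the repeated `ValueKind::...` arms.
macro_rules! match_obj_or_else {
    ($obj:expr, {
        $(ValueKind::$kind:ident($p:pat) => $e:expr,)*
        _ => $fallback:expr $(,)?
    }) => {{
        let object: GcObject = $obj;
        match object.kind() {
            $(ValueKind::$kind => {
                let $p: GcObj<$kind> = to_value_kind!(object, $kind);
                $e
            })*
            _ => $fallback,
        }
    }};
}

fn describe(obj: GcObject) -> &'static str {
    match_obj_or_else!(obj, {
        ValueKind::Class(_) => "class",
        ValueKind::Fun(_fun) => "fun",
        _ => "something else",
    })
}
```

In this form the fallback arm is mandatory; making it optional, or allowing arms without trailing commas, would need additional rules.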

@jethrogb That would definitely be doable and would probably make the resulting macro a bit easier to follow.

@H2CO3 I'm only vaguely familiar with the Any trait. How would you imagine it being used here? It may well make things simpler.

@Kestrer Thank you so much for this example, it's a huge help! I'll have to try implementing this later to see how it goes.

I was replying to this concern of yours:

Instead of trying to explicitly store a type tag and then a big blob of untyped whatever, you could just come up with proper Rust backing types for the elements of your language, convert them to dyn Any, and then convert the dyn Any back to concrete types when needed, like this:

use std::any::Any;

fn frobnicate(obj: &dyn Any) {
    // `downcast_ref` returns `Some(&T)` only if the concrete type behind `obj` is `T`.
    if let Some(value) = obj.downcast_ref::<String>() {
        println!("Got a string: {}", value);
    }
    if let Some(value) = obj.downcast_ref::<u32>() {
        println!("Got an integer: {}", value);
    }
}

fn main() {
    let s = String::from("hello world");
    let n: u32 = 42;
    frobnicate(&s);
    frobnicate(&n);
}

Of course, when you start to have a multitude of these types, this gets ugly and un-idiomatic quickly. The usual solution is to come up with your own trait, use it to abstract away any common functionality across types, and then pass dyn Trait around.
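A minimal sketch of that trait-based approach. The Object trait, its methods, and the Str and Int types are made up for illustration and are not part of any existing design.

```rust
// Hypothetical trait capturing the behaviour shared by all script values.
trait Object {
    fn type_name(&self) -> &'static str;
    fn display(&self) -> String;
}

struct Str(String);
struct Int(u32);

impl Object for Str {
    fn type_name(&self) -> &'static str { "string" }
    fn display(&self) -> String { self.0.clone() }
}

impl Object for Int {
    fn type_name(&self) -> &'static str { "integer" }
    fn display(&self) -> String { self.0.to_string() }
}

// Callers work with `dyn Object` and never need to downcast.
fn describe_all(objects: &[Box<dyn Object>]) -> Vec<String> {
    objects
        .iter()
        .map(|obj| format!("{}: {}", obj.type_name(), obj.display()))
        .collect()
}
```

Each call goes through a vtable, and each Box<dyn Object> is a two-word fat pointer, so this trades the single-pointer-wide requirement for convenience.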

However, be aware that any of these methods will probably eventually incur dynamic allocation if you want to build dynamically-living data structures out of your objects (which you probably do). That by itself can lead to higher memory use than simply using enums, due to the allocator's internal bookkeeping. Especially with garbage collection (which you are apparently using), these costs can hurt performance more than whatever speed you gain from not using an enum. So I'd suggest you actually benchmark two or three alternatives and see which is best, instead of just going with one based on a priori assumptions.
