Cartesian product using macros


#1

Hi all

I am trying to create a cartesian product generator using macro.
While generating pairs is basically solved, I cannot grasp how to accumulate resulting pairs like following:
[ [1,1], [1,2], [1,3], [2,1] … ]

#![feature(trace_macros)]
trace_macros!(true);

macro_rules! iter {
    ( $submacro:tt, $each:tt, $arg:tt, [] ) => {
    };

    ( $submacro:tt, $each:tt, $arg:tt, [ $head:tt $(, $tail:tt )* ] ) => {
        $each!($submacro, $head, $arg);
        iter!($submacro, $each, $arg, [ $($tail),* ]);
    };
}

macro_rules! cartesian {
    ( $submacro:tt, [ $head1:tt $(, $tail1:tt )* ], [ $head2:tt $(, $tail2:tt )* ]) => {
        iter!($submacro, cartesian, [ $head2, $($tail2),* ], [ $head1, $($tail1),* ] )
    };

    ( $submacro:tt, [ $head:tt $(, $tail:tt )* ] ) => {
        // multiply arr by arr
        cartesian( $submacro, [ $head, $($tail),* ], [ $head, $($tail),* ]);
    };

    ( $submacro:tt, $item1:tt, [ $head:tt $(, $tail:tt )* ] ) => {
        iter!($submacro, cartesian, $item1, [ $head, $($tail),* ])
    };

    ( $submacro:tt, $item1:tt, $item2:tt ) => {
        // getting pairs
        $submacro!($item1, $item2)
    };
}
macro_rules! test {
    ($e1:tt, $e2:tt) => {
       // getting pairs here
        println!("[{:?} {:?}]", $e1, $e2);
    }
}
cartesian!(test, [1,2,3,4], [1,2,3,4]);

However I cannot accumulate those pairs as list.
And getting list seems to be necessary for generating match arms (both condition and arm expression)

Any hints?


#2

Essentially, what you may want to do is to have an output array stored between recursive invocations of your macro. For example, in this macro, I used $out variable to store previous invocations of a macro. I wasn’t able to run your code, so I apologize for writing cartesian product macro myself.

macro_rules! cartesian_impl {
    ($out:tt [] $b:tt $init_b:tt) => {
        $out
    };
    ($out:tt [$a:expr, $($at:tt)*] [] $init_b:tt) => {
        cartesian_impl!($out [$($at)*] $init_b $init_b)
    };
    ([$($out:tt)*] [$a:expr, $($at:tt)*] [$b:expr, $($bt:tt)*] $init_b:tt) => {
        cartesian_impl!([$($out)* ($a, $b),] [$a, $($at)*] [$($bt)*] $init_b)
    };
}

macro_rules! cartesian {
    ([$($a:tt)*], [$($b:tt)*]) => {
        cartesian_impl!([] [$($a)*,] [$($b)*,] [$($b)*,])
    };
}

fn main() {
    println!("{:?}", cartesian!([1, 2, 3], [4, 5, 6]));
}

#3

Looks good. I do not entirely understand the magic though :slight_smile:
Thank you very much.


#4

Essentially, it’s convenient to think of Rust macros as a pure (no side-effects) functional language (even if really limited, as you cannot really do anything more complex than arranging grammar tokens the way you want to).

In this case, this programs uses recursion as means of simulating loops. It can be written like so in Haskell. This program is pretty much equivalent (++ is list concatenation operator).

cartesian_impl out [] _ _ = out
cartesian_impl out (a:at) [] init_b =
    cartesian_impl out at init_b init_b
cartesian_impl out (a:at) (b:bt) init_b =
    cartesian_impl (out ++ [(a, b)]) (a:at) bt init_b
 
cartesian a b = cartesian_impl [] a b b
 
main = print (cartesian [1, 2, 3] [4, 5, 6])

(in case you never used Haskell, (a:at) is a pattern that splits a list into first element (a), and rest (at).

So how does that work? In short, this simply heavily uses pattern matching in a loop. Because tail-recursion is used, it’s trivial to write this program in procedural way.

#![feature(slice_patterns)]

fn cartesian<T: Copy, U: Copy>(mut a: &[T], mut b: &[U]) -> Vec<(T, U)> {
    let mut out = Vec::new();
    let b_copy = b;
    loop {
        match (a, b) {
            (&[], _) => return out,
            (&[_, ref at..], &[]) => {
                a = at;
                b = b_copy;
            }
            (&[ae, ..], &[be, ref bt..]) => {
                out.push((ae, be));
                b = bt;
            }
        }
    }
}

fn main() {
    println!("{:?}", cartesian(&[1, 2, 3], &[4, 5, 6]));
}

Or less in a direct translation way.

fn cartesian<T: Copy, U: Copy>(a: &[T], b: &[U]) -> Vec<(T, U)> {
    let mut out = Vec::new();
    for &ae in a {
        for &be in b {
            out.push((ae, be));
        }
    }
    out
}

Note how previous versions stored the original version of b array, to be able to restore it at end of b iteration. This version doesn’t modify b variable to iterate it, so nothing needs to be copied.


#5

@xfix

Well, it’s much clearer now. Thank you.
However, I am kind of stuck again.

In your example
println!("{:?}", cartesian!([1, 2, 3], [4, 5, 6]));
… prints all pairs perfectly.
But trying to use any other macro besides println!/println!/format_args! refuses to “eval” cartesian before applying macro. Looks like format_args! is somewhat special unlike ordinary macros. I mean:

macro_rules! test {
    ([$(($a1:tt, $a2:tt)),*]) => {
        println!("{} {}", $a1, $a2);
    };
}

fn main() {
    //println!("{:?}", cartesian!([1, 2, 3], [4, 5, 6]));
    test!( cartesian!([1, 2, 3], [4, 5, 6]) )
}

error: no rules expected the token `cartesian`
  --> <anon>:27:12
   |
27 |     test!( cartesian!([1, 2, 3], [4, 5, 6]) )
   |     -------^^^^^^^^^------------------------- in this macro invocation

See full example https://is.gd/KLoUyG

Again some trick is necessary? :slight_smile:


#6

I’m not sure exactly what you want to do but this works:

macro_rules! test {
    ($a1:tt, $a2:expr) => {
        println!("{} {:?}", $a1, $a2);
    };
}

fn main() {
    let t = cartesian!([1, 2, 3], [4, 5, 6]); 
    //println!("{:?}", cartesian!([1, 2, 3], [4, 5, 6]));
    test!("test", t); 
}

#7

The reason why I can’t use something like you did is arguments to cartesian are not simple values, but type and enum elements. So it’s invalid typing from Rust perspective. That is something like following is expected:

macro_rules! test {
    ([$(($a1:tt, $a2:tt)),*]) => {
        ...
        match (x, y) {
          $(
             (&$a1, &$a2) => { ... }
           )*
        }
    };
}
test!( cartesian!([ASTBool, ASTInt, ASTFloat], [ASTBool, ASTInt, ASTFloat]) )

#8

And does it need to be be a macro in this case and a regular function won’t do?


#9

Yes. Unfortunately.
It’s going to be a simulation of dynamic multiple dispatch.
It’s NxM match arms. So it’s too many to write those manually.


#10

Alright. Sorry but my macro-fu isn’t strong enough to solve that :slight_smile:


#11

If you’re generating sequences of tokens to be given as input to another macro, then you’re in for some pain.

The basics

Macro invocations like a!(b!()) are expanded from the outside only; first a! is expanded (receiving the literal tokens b ! ()), and then once that is finished, if any macro invocations are left in the output, those get expanded.

In other words, println!("{}", b!()) works precisely because it does not attempt to “eval” b!().

Getting around this limitation is… fun.

Uh… “fun”?

Here’s a place where I used a cartesian product in one of my own macros. It gradually turns something like this…

// (note: in the source linked above this is written as
//  newtype_ops__!{@product::next(...) -> ()}
//  simply to avoid macro namespace pollution)
product_next!{ ({a b c} {A B}) -> () }

…into this…

product_next!{ ({A B}) -> (a) }
product_next!{ ({A B}) -> (b) }
product_next!{ ({A B}) -> (c) }

…into this…

product_next!{ () -> (a A) }
product_next!{ () -> (a B) }
product_next!{ () -> (b A) }
product_next!{ () -> (b B) }
product_next!{ () -> (c A) }
product_next!{ () -> (c B) }

…and then into this…

interpret!{ (a A) }
interpret!{ (a B) }
interpret!{ (b A) }
interpret!{ (b B) }
interpret!{ (c A) }
interpret!{ (c B) }

which then continues on with the next step. Note that for your use case you probably want the expansion to be more like this: (thankfully this is equally possible)

product_next!{ ({a b c} {A B}) -> (()) }
product_next!{         ({A B}) -> ((a) (b) (c)) }
product_next!{              () -> ((a A) (a B) (b A) (b B) (c A) (c B)) }
interpret!{ (a A) (a B) (b A) (b B) (c A) (c B) }

The key thing to notice here is that we never let a macro “finish”. The product_next macro directly calls the interpret! macro. At all times, all of our partially built expressions are safely contained inside macro argument lists where they won’t be evaluated.

Of course, this technique doesn’t scale well. If you have n macros that need a cartesian product, you would need to implement the cartesian product n times! The alternative is to write one cartesian product macro which takes a callback; but I’d be cautious about making it too general, as it can be quite a grueling experience to debug.

One more thing

Comma-separated lists of token trees are just silly (individual token trees need no delimiters). You probably want $($T:ty),* if you’re parsing types.


#12

Well.

It’s really painful already. /Maybe m4 is not that bad after all :slight_smile:
And yes, I planned to use cartesian in several places.
And looks like “callback” approach is also a no-go for me as it disallows to write match arms (“one token only” rule).

Thanks for showing horrors :slight_smile:


#13

Nonsense, what gave you this impression? Pretty much anything is possible with callbacks (though it might not be enjoyable to code!)


#14

There is a section here https://danielkeep.github.io/tlborm/book/pat-callbacks.html about callbacks. I guess it’s something in that direction you mean?


#15

Heh, that’s what I linked! :slight_smile:

(note: that entire book is a good resource for anyone writing macros btw; 99% of it is still relavant despite being written circa Rust 1.2)

The example there only shows a simple callback of the form $callback!(/* output here */), but one can generalize; a very general callback-using macro might look something like

macro_rules! scary_macro {
    (
     ($($input:tt)*)
     -> ($($work_area:tt)*)
     then $callback:ident!($($args_before:tt)*)($($args_after:tt)*)
    ) => {
        $callback!{
            $($args_before)*
            /* output goes here */
            $($args_after)*
        }
    };
}

#16

@ExpHP
It looks like you are suggesting to pass callback to macro to generate something per “pair”, right?
That callback should generate match arms in my case, that something like https://is.gd/e9E5gS is necessary. But it does not work due to that “one token” rule.


#17

DOH! I missed that. Sorry :slight_smile:


#18

I think the real issue there is that a match arm is not a valid expression. (the output of a macro must be a type, an expression, or one or more “items” (fn, impl, struct…), because these are the things which include macro invocations in their grammar)

For that case one needs a callback that produces the full match expression, and which takes ([$(($a1:tt, $a2:tt)),*]) as input. For this, expansion could flow like it does in my last code block in this post, to produce callback!([(a,A), (b,A), ...])


Unless…

do the match arms need to have different output based on the type? Because then I guess that’s something you really can’t do, at least not without the help of an incremental TT muncher (which will very easily hit the macro recursion limit) (which I guess isn’t actually a problem in this case since for internal use you can just raise the limit).


#20

I came with following.
But it results in some mysterious errors:

macro_rules! cartesian_impl {
    ($out:tt [] $b:tt $init_b:tt $submacro:tt) => {
        $submacro!($out)
    };
    ($out:tt [$a:expr, $($at:tt)*] [] $init_b:tt $submacro:tt) => {
        cartesian_impl!($out [$($at)*] $init_b $init_b $submacro)
    };
    ([$($out:tt)*] [$a:expr, $($at:tt)*] [$b:expr, $($bt:tt)*] $init_b:tt $submacro:tt) => {
        cartesian_impl!([$($out)* ($a, $b),] [$a, $($at)*] [$($bt)*] $init_b $submacro)
    };
}

macro_rules! cartesian {
    ( $submacro:tt, [$($a:tt)*], [$($b:tt)*]) => {
        cartesian_impl!([] [$($a)*,] [$($b)*,] [$($b)*,] $submacro)
    };
}

macro_rules! test {
    ( $(($a1:tt, $a2:tt)),* ) => {
        fn test_f(x:i64, y:i64) {
            match (x, y) {
              $(
                ($a1, $a2) => { println!("{} {}", $a1, $a2) }
              )*
              _ => { println!("???") }
            }
        }
    };
}

cartesian!( test, [1,2,3], [4,5,6] );
fn main() {
    test_f(2,4);
}

error: macros that expand to items must either be surrounded with braces or followed by a semicolon
  --> <anon>:15:24
   |
15 |         cartesian_impl!([] [$($a)*,] [$($b)*,] [$($b)*,] $submacro)
   |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any hints?


#21

Macros invoked like name!() or name![] are limited to producing expressions and types (as opposed to statements and items) since that’s what the call visually resembles. In contrast, a macro called like name!(); (with a semicolon) or like name!{} is allowed to produce anything.

This limitation in turn affects any helper macro that takes a callback, since expansion will place the helper macro in that location of your code first before it puts the callback there.

For this reason, you should always use braces when calling callbacks. (i.e. change all the cartesian_impl!(...) to cartesian_impl!{...}, and same with submacro!()).