Automatic boxing and wrapping of complex data structures

I was annoyed because I had to build some large literals in code, so I made a hack.

Most of my data is defined in data files, but some of it is most convenient to write as literals. The data structure looks roughly like the enums below. The main annoyance was repeating SubsetA over and over again, but I also made it so that you don't need to call Box::new.

enum TopLevel {
    // ...
}

enum SubsetA {
    // ...
}

The first step was to write From implementations from the subsets to TopLevel and Box<TopLevel>. But with only that I would have had a lot of .into() calls, so I made a macro:
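As a hedged sketch (the real enums are larger; the variant names here are made up for illustration), those From implementations might look like this:

```rust
// Hypothetical minimal shapes standing in for the real enums.
#[derive(Debug)]
enum SubsetA {
    Town,
    City,
}

#[derive(Debug)]
enum TopLevel {
    A(SubsetA),
}

impl From<SubsetA> for TopLevel {
    fn from(s: SubsetA) -> Self {
        TopLevel::A(s)
    }
}

// Boxing in the same conversion saves a Box::new at every use site.
impl From<SubsetA> for Box<TopLevel> {
    fn from(s: SubsetA) -> Self {
        Box::new(TopLevel::A(s))
    }
}

fn main() {
    let t: TopLevel = SubsetA::Town.into();
    let b: Box<TopLevel> = SubsetA::City.into();
    println!("{:?} {:?}", t, b); // prints "A(Town) A(City)"
}
```

With only these impls, every literal still needs its explicit .into() calls, which is what the macro below removes.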

macro_rules! into_spam {
    () => {};

    ( $($id:ident)? ( $($tok:tt)* ) ) => {into_spam!(@tup $($id)* ($($tok)*))};
    (@tup $($id:ident)? ( $single:tt , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@tup $($id)*($($tail)*) $($res)* into_spam!($single), )
    };
    (@tup $($id:ident)? ( $($id2:ident)? ( $($tok:tt)* ) , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@tup $($id)*($($tail)*) $($res)* into_spam!($($id2)* ($($tok)*)), )
    };
    (@tup $($id:ident)? ($($tok:tt)+) $($res:tt)*) => {
        into_spam!(@tup $($id)*() $($res)* into_spam!($($tok)*))
    };
    (@tup $($id:ident)? () $($res:tt)* ) => {$($id)* ($($res)*).into()};

    ( [ $($tok:tt)* ] ) => {into_spam!(@arr ($($tok)*) )};
    (@arr ( $single:tt , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@arr ($($tail)*) $($res)* into_spam!($single), )
    };
    (@arr ( $($id:ident)? ( $($tok:tt)* ) , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@arr ($($tail)*) $($res)* into_spam!($($id)*($($tok)*)), )
    };
    (@arr ( $($tok:tt)+ ) $($res:tt)*) => {
        into_spam!(@arr () $($res)* into_spam!($($tok)*))
    };
    (@arr () $($res:tt)* ) => {vec![$($res)*]};

    ($e:expr) => {$e.into()};
}

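To make this concrete, here is a self-contained sketch of the macro in action. The enums and variant names are stand-ins I made up for illustration, and the macro is reproduced so the snippet compiles on its own:

```rust
// Stand-in enums for illustration; the real ones are larger.
#[derive(Debug, PartialEq)]
enum SubsetA {
    Town,
    City,
}

#[derive(Debug, PartialEq)]
enum TopLevel {
    A(SubsetA),
}

impl From<SubsetA> for TopLevel {
    fn from(s: SubsetA) -> Self {
        TopLevel::A(s)
    }
}

use SubsetA::{City, Town};
use TopLevel::A;

// (into_spam! as defined in the post, reproduced so this compiles standalone)
macro_rules! into_spam {
    () => {};

    ( $($id:ident)? ( $($tok:tt)* ) ) => {into_spam!(@tup $($id)* ($($tok)*))};
    (@tup $($id:ident)? ( $single:tt , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@tup $($id)*($($tail)*) $($res)* into_spam!($single), )
    };
    (@tup $($id:ident)? ( $($id2:ident)? ( $($tok:tt)* ) , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@tup $($id)*($($tail)*) $($res)* into_spam!($($id2)* ($($tok)*)), )
    };
    (@tup $($id:ident)? ($($tok:tt)+) $($res:tt)*) => {
        into_spam!(@tup $($id)*() $($res)* into_spam!($($tok)*))
    };
    (@tup $($id:ident)? () $($res:tt)* ) => {$($id)* ($($res)*).into()};

    ( [ $($tok:tt)* ] ) => {into_spam!(@arr ($($tok)*) )};
    (@arr ( $single:tt , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@arr ($($tail)*) $($res)* into_spam!($single), )
    };
    (@arr ( $($id:ident)? ( $($tok:tt)* ) , $($tail:tt)* ) $($res:tt)*) => {
        into_spam!(@arr ($($tail)*) $($res)* into_spam!($($id)*($($tok)*)), )
    };
    (@arr ( $($tok:tt)+ ) $($res:tt)*) => {
        into_spam!(@arr () $($res)* into_spam!($($tok)*))
    };
    (@arr () $($res:tt)* ) => {vec![$($res)*]};

    ($e:expr) => {$e.into()};
}

fn main() {
    // `[Town, City]` expands to `vec![Town.into(), City.into()]`.
    let v: Vec<TopLevel> = into_spam!([Town, City]);
    assert_eq!(v, vec![A(Town), A(City)]);

    // `A(Town)` expands to `A(Town.into()).into()`; the outer `.into()`
    // reaches Box<TopLevel> through std's `impl From<T> for Box<T>`.
    let b: Box<TopLevel> = into_spam!(A(Town));
    assert_eq!(*b, A(Town));
}
```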
What do you think of this solution? Would you use that macro in other projects? Is there a superior approach?

This macro is a bit complicated and hard to read without comments. I've used macros before to simplify writing verbose literals (mostly in tests). Without something more concrete, it's hard for me to say whether this is the best option in your case.

This is how I'd do it:

impl From<SubsetA> for TopLevel { ... }
impl From<SubsetB> for TopLevel { ... }

macro_rules! top_levels {
    [ $($expr:expr),* $(,)? ] => (
        vec![ $( TopLevel::from($expr) ),* ]
    );
}


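With stand-in enums (the names here are made up, not from your post), usage would look like:

```rust
#[derive(Debug, PartialEq)]
enum SubsetA {
    Foo,
}

#[derive(Debug, PartialEq)]
enum SubsetB {
    Bar,
}

#[derive(Debug, PartialEq)]
enum TopLevel {
    A(SubsetA),
    B(SubsetB),
}

impl From<SubsetA> for TopLevel {
    fn from(s: SubsetA) -> Self {
        TopLevel::A(s)
    }
}

impl From<SubsetB> for TopLevel {
    fn from(s: SubsetB) -> Self {
        TopLevel::B(s)
    }
}

macro_rules! top_levels {
    [ $($expr:expr),* $(,)? ] => (
        vec![ $( TopLevel::from($expr) ),* ]
    );
}

fn main() {
    // Each element is lifted to TopLevel without naming the variant.
    let v = top_levels![SubsetA::Foo, SubsetB::Bar];
    assert_eq!(v, vec![TopLevel::A(SubsetA::Foo), TopLevel::B(SubsetB::Bar)]);
}
```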
(you mention Boxing but there is no trace of it in your post)

@Nashenas88 My macro walks the whole tree, inserting .into() after everything. I think it has to be fairly verbose to be able to walk through many kinds of syntax. If extended to all expression syntax, it could be a solution for situations where you'd want implicit conversions.

It seems to increase code size, though. I think that is because some of the .into() calls are not inlined.

My main concern with this approach is that I don't think this is how Rust is intended to be used. Normally you have to do a lot of conversions manually. Instead, you could wrap your whole program in something like this. IMO that could be a good thing, because it would hide many perfectly safe conversions that just clutter the code. You could even make your own Into trait that is reserved for conversions that don't affect runtime behaviour.
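A rough sketch of that last idea (the trait, method, and enum names here are all made up):

```rust
#[derive(Debug)]
enum SubsetA {
    Town,
}

#[derive(Debug)]
enum TopLevel {
    A(SubsetA),
}

/// Hypothetical conversion trait reserved for conversions that only
/// re-wrap a value and have no other effect on runtime behaviour.
trait CheapInto<T> {
    fn cheap_into(self) -> T;
}

impl CheapInto<TopLevel> for SubsetA {
    fn cheap_into(self) -> TopLevel {
        TopLevel::A(self) // pure wrapping, nothing else
    }
}

fn main() {
    let t: TopLevel = SubsetA::Town.cheap_into();
    println!("{:?}", t); // prints "A(Town)"
}
```

A macro like into_spam! could then call cheap_into() instead of into(), so only these "free" conversions ever happen implicitly.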

I would personally either use .into() manually, or @Yandros's example to wrap the items at each level. It's slightly more verbose, but the intent ends up being clearer to someone new to the project. If your concern is performance or code size, you could look at marking the from fns in @Yandros's example with the #[inline] attribute and measuring how that affects the results.
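For concreteness, a sketch of what that marking would look like (enum names assumed, not from the thread):

```rust
#[derive(Debug)]
enum SubsetA {
    Town,
}

#[derive(Debug)]
enum TopLevel {
    A(SubsetA),
}

impl From<SubsetA> for TopLevel {
    // Hint that the trivial wrapping should be inlined at call sites,
    // even across codegen units.
    #[inline]
    fn from(s: SubsetA) -> Self {
        TopLevel::A(s)
    }
}

fn main() {
    let t: TopLevel = SubsetA::Town.into();
    println!("{:?}", t); // prints "A(Town)"
}
```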

I tried this out on a relatively small literal. Compared to not using .into(), binary size is as follows:

into_spam without inline: +25 bytes
into_spam with inline: +59 bytes

Relevant code without into_spam:

                    Box::new(ContainsUnit(Constant(1), vec![Town, City].into())),

With into_spam

                    ContainsUnit(Constant(1), vec![Town, City]),

I would have expected the size with inline to be the same as when manually using .into(), but it isn't, so I did some additional experiments.

I found that Box::new(OuterEnum(x)) generates more code than x.into(). I suspect that is why inlining made the size worse.
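Reduced to a minimal sketch, the two shapes I compared were along these lines (Inner, Wrap, and X are made-up names; only OuterEnum is from my code):

```rust
#[derive(Debug)]
enum Inner {
    X,
}

#[derive(Debug)]
enum OuterEnum {
    Wrap(Inner),
}

impl From<Inner> for Box<OuterEnum> {
    fn from(x: Inner) -> Self {
        Box::new(OuterEnum::Wrap(x))
    }
}

fn main() {
    // Explicit form: Box::new + variant wrapping is emitted at the call site.
    let a: Box<OuterEnum> = Box::new(OuterEnum::Wrap(Inner::X));
    // .into() form: the wrapping lives in the shared From impl.
    let b: Box<OuterEnum> = Inner::X.into();
    println!("{:?} {:?}", a, b); // prints "Wrap(X) Wrap(X)"
}
```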

For some reason, adding .into() to TheLand(...) raises code size by 30 bytes. So, to get the smallest possible code, you should manually insert .into() only where necessary. It is easier to read and, surprisingly, results in smaller code than manually using Box::new and the outer enum.

Maybe I’m missing something, but I would generally expect #[inline] to increase code size: it tells the compiler that you’d rather copy the implementation to every call site than jump to one central implementation. As far as I understand it, inlining can speed up code through one of two mechanisms:

  • Modern processors are much better at executing straight-line code than taking jumps, which have the potential to interfere with their pipelining.
  • The optimization stage of the compiler can look inside the inlined function, and make changes that don’t match the ABI of a function call.

Neither of these is particularly correlated with the amount of code generated.


@2e71828 The binary got bigger when I added a lot of conversions, so I assumed that inlining them would give me the same code as without the conversions. But it turns out the conversions actually made the code smaller, except for one identity conversion that made it a lot bigger.
