Is implementing a derive macro for converting nested structs to flat structs possible?

Hi, I'm trying to implement a procedural macro that takes a nested struct definition and generates a brand new, flat struct definition by recursively inlining every field marked with an attribute. The reason is a bit convoluted (I'm fighting with something that doesn't support nested types, and my theory is that it would be easier to implement this macro than to actually implement nested type support).

So, I imagine something like this:

#[derive(Flat)]
struct MyStruct {
    something: u64,
    string: String,
    #[flatten]
    nested_struct: NestedStruct
}
struct NestedStruct {
    other_something: u64,
    string_in_nested_struct: String
}

Would produce the following generated code:

struct FlatMyStruct {
    something: u64,
    string: String,
    other_something: u64,
    string_in_nested_struct: String
}

Or, with the nested field's name as a prefix:

struct FlatMyStruct {
    something: u64,
    string: String,
    nested_struct_other_something: u64,
    nested_struct_string_in_nested_struct: String
}

However, I don't see how this would be possible with the standard derive macro functionality, as I cannot figure out a way to access the TokenStreams of the nested structs' definitions.

Any ideas?

Procedural macros are deliberately restricted to seeing only the item they annotate, because they operate on tokens, but it looks like you need something that can see everything.

What about using this custom derive as a simple "marker" for types that should be nested, then using a build.rs script that looks through every file in your crate before it gets compiled, collects all the struct definitions, and then generates new "inlined" structs for those that have the derive? The generated code could then be include!()'d, and that'll be the type you actually use.
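The output half of that build.rs idea could be sketched roughly as below. This is only an illustration under assumptions: `render_flat_struct`, `write_generated` and the file name `flat.rs` are made-up names, and the field list is hard-coded where a real script would collect it by parsing the crate's source files (for example with the `syn` crate).

```rust
use std::{fs, io, path::Path};

/// Renders a flat struct definition from (field name, type) pairs.
/// Nested fields are assumed to have been prefixed by the caller already.
fn render_flat_struct(name: &str, fields: &[(&str, &str)]) -> String {
    let mut out = format!("pub struct {} {{\n", name);
    for (field, ty) in fields {
        out.push_str(&format!("    pub {}: {},\n", field, ty));
    }
    out.push_str("}\n");
    out
}

/// Writes the generated source into `out_dir`.
/// In an actual build.rs, `main` would call this with the directory taken
/// from `std::env::var("OUT_DIR")`, which Cargo sets for build scripts.
fn write_generated(out_dir: &Path) -> io::Result<()> {
    // Hard-coded for illustration; a real script would gather these fields
    // by parsing the struct definitions found under src/.
    let fields = [
        ("something", "u64"),
        ("string", "String"),
        ("nested_struct_other_something", "u64"),
        ("nested_struct_string_in_nested_struct", "String"),
    ];
    let source = render_flat_struct("FlatMyStruct", &fields);
    fs::write(out_dir.join("flat.rs"), source)
}
```

The crate would then pull the generated file in with `include!(concat!(env!("OUT_DIR"), "/flat.rs"));`.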

I'd be interested in more details about your use case. Usually with structs, defining them is the easy part; the hard part is all the functions that use the struct. If that "something" without nested type support really only needs the struct definition and nothing else, then fine, generating the right struct definition might be the hard part, but I have a hard time imagining what that "something" could be. ... It seems likely (to me) that you would want to use the generated struct somehow. If that means accessing the relevant fields by hand, I'd wonder why writing the flattened struct manually wouldn't be easier and clearer. If instead the access happens in generated code, that seems a much harder, or at least significantly different, problem than just generating the struct definitions.

I wrote a declarative macro that can more or less do what you requested. The caveat is that it also requires a macro to process the "inlined" struct (NestedStruct in this case). That macro generates a new macro with the same name as the nested struct, so that the derive macro can call the generated macro to get (smuggle) information about the struct's fields.

Here is my solution
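The linked code is not reproduced here. What follows is not the poster's solution, only a minimal sketch of the general "generated macro as callback" idea under some assumptions: the macro names `flattenable!` and `flat_struct!` are made up, the `#[flatten]` field has to come last, and the nested fields keep their original names (building prefixed identifiers is not possible in plain `macro_rules!`).

```rust
// Hypothetical helper: declares the struct and a macro of the same name that
// forwards the struct's field list to a callback macro.
macro_rules! flattenable {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        struct $name { $($field: $ty,)* }

        #[allow(unused_macros)]
        macro_rules! $name {
            // `$acc` is one token tree carrying the caller's own state.
            ($callback:ident ! $acc:tt) => {
                $callback! { @emit $acc { $($field: $ty,)* } }
            };
        }
    };
}

// Hypothetical flattening macro.
macro_rules! flat_struct {
    // Callback arm: receives our own fields plus the nested struct's fields.
    (@emit { $flat:ident $($field:ident : $ty:ty,)* }
           { $($nfield:ident : $nty:ty,)* }) => {
        #[derive(Debug, PartialEq)]
        struct $flat { $($field: $ty,)* $($nfield: $nty,)* }
    };
    // Entry arm: ask the nested struct's macro for its field list.
    (struct $flat:ident {
        $($field:ident : $ty:ty,)*
        #[flatten] $nested_field:ident : $nested:ident $(,)?
    }) => {
        $nested! { flat_struct ! { $flat $($field: $ty,)* } }
    };
}

flattenable! {
    struct NestedStruct {
        other_something: u64,
        string_in_nested_struct: String,
    }
}

flat_struct! {
    struct FlatMyStruct {
        something: u64,
        string: String,
        #[flatten]
        nested_struct: NestedStruct,
    }
}

// Builds a value of the generated struct to show which fields it ends up with.
fn build_flat() -> FlatMyStruct {
    FlatMyStruct {
        something: 1,
        string: "outer".to_string(),
        other_something: 2,
        string_in_nested_struct: "inner".to_string(),
    }
}
```

Because `macro_rules!` macros are textually scoped, `flattenable!` has to be invoked on NestedStruct before the `flat_struct!` invocation that refers to it.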

Thanks for the answer! This seems like the best approach intended for things like these without a lot of hacking.

So, there are a couple of Rust libraries around the Apache Parquet file format and the Apache Arrow memory format (into which I want to put my data), which are columnar and use Dremel encoding with definition and repetition levels. The problem with Dremel is that automating the calculation of those definition and repetition levels for arbitrary structs is not at all trivial, and as such, those crates lack support for nested types, as they're still very young. So all we're left with is some stable support for flat structs.

On a side note, it seems like all of the Arrow/Parquet writing in the world is already done through Java.

Now the scary thing is that this code-gen-based solution would actually be more favourable than other options.

Wow, I never would have guessed declarative macros could do something like this! Thanks a lot for the example!

If you need something less of a hack to work on multiple type definitions, consider wrapping all type definitions that depend on each other in one macro invocation.

flat! {
    struct MyStruct {
        something: u64,
        string: String,
        nested_struct: NestedStruct,
    }

    struct NestedStruct {
        other_something: u64,
        string_in_nested_struct: String,
    }
}
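A `flat!` macro along those lines might look roughly like the sketch below. It is deliberately limited and rests on assumptions not in the invocation above: it handles exactly two structs, expects the `#[flatten]` field last, and takes the output name through a made-up `flat_name = ...;` header, since plain `macro_rules!` cannot build a new identifier such as FlatMyStruct on its own. Because both definitions sit in one invocation, the macro sees every field list directly and can also generate the conversion.

```rust
// Hypothetical sketch: one invocation receives both struct definitions.
macro_rules! flat {
    (
        flat_name = $flat:ident;
        struct $outer:ident {
            $($field:ident : $ty:ty,)*
            #[flatten] $nested_field:ident : $nested:ident $(,)?
        }
        struct $inner:ident { $($nfield:ident : $nty:ty),* $(,)? }
    ) => {
        // Re-emit the original definitions unchanged.
        struct $outer { $($field: $ty,)* $nested_field: $nested, }
        struct $inner { $($nfield: $nty,)* }

        // Emit the flattened struct: outer fields, then nested fields.
        #[derive(Debug, PartialEq)]
        struct $flat { $($field: $ty,)* $($nfield: $nty,)* }

        // Conversion from the nested form, by destructuring both levels.
        impl From<$outer> for $flat {
            fn from(value: $outer) -> Self {
                let $outer {
                    $($field,)*
                    $nested_field: $inner { $($nfield,)* },
                } = value;
                $flat { $($field,)* $($nfield,)* }
            }
        }
    };
}

flat! {
    flat_name = FlatMyStruct;
    struct MyStruct {
        something: u64,
        string: String,
        #[flatten]
        nested_struct: NestedStruct,
    }
    struct NestedStruct {
        other_something: u64,
        string_in_nested_struct: String,
    }
}
```

If the outer and nested structs share a field name, the generated struct fails to compile with a duplicate-field error, which is one argument for the prefixed naming scheme from the original question.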