How to detect generic parameter in field type inside procedural macro

raidwas · April 23, 2021, 2:25pm

So I'm currently trying to implement a procedural macro for generating type safe Builders for Structs. So far it works well for simple structs without generics.

For example the following definition:

#[derive(Builder)]
struct Foo<P> {
    field1: P,
    field2: u16,
}

among other code it produces the following (incorrect):

impl<T_field2> FooBuilder<Unset, T_field2> {
    fn field1(self, value: P) -> FooBuilder<Set<P>, T_field2> {
        FooBuilder {
            field1: Set::new(value),
            field2: self.field2,
        }
    }
}
impl<T_field1> FooBuilder<T_field1, Unset> {
    fn field2(self, value: u16) -> FooBuilder<T_field1, Set<u16>> {
        FooBuilder {
            field1: self.field1,
            field2: Set::new(value),
        }
    }
}

Correct would be:

impl<T_field2> FooBuilder<Unset, T_field2> {
    fn field1<P>(self, value: P) -> FooBuilder<Set<P>, T_field2> {
        FooBuilder {
            field1: Set::new(value),
            field2: self.field2,
        }
    }
}
impl<T_field1> FooBuilder<T_field1, Unset> {
    fn field2(self, value: u16) -> FooBuilder<T_field1, Set<u16>> {
        FooBuilder {
            field1: self.field1,
            field2: Set::new(value),
        }
    }
}

(notice the added <P> for fn field1)
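For context, here is a minimal hand-written sketch of what the complete expansion could look like when generics live on the setters rather than on the builder. It assumes Unset and Set are plain marker types and adds hypothetical new and build methods that are not part of the output shown above:

```rust
// Marker types recording whether a field has been set (assumed definitions).
struct Unset;
struct Set<V>(V);

struct Foo<P> {
    field1: P,
    field2: u16,
}

// One type parameter per field; none of Foo's own generics appear here.
struct FooBuilder<F1, F2> {
    field1: F1,
    field2: F2,
}

impl FooBuilder<Unset, Unset> {
    fn new() -> Self {
        FooBuilder { field1: Unset, field2: Unset }
    }
}

impl<F2> FooBuilder<Unset, F2> {
    // `P` is declared on the method because the type of `field1` mentions it.
    fn field1<P>(self, value: P) -> FooBuilder<Set<P>, F2> {
        FooBuilder { field1: Set(value), field2: self.field2 }
    }
}

impl<F1> FooBuilder<F1, Unset> {
    // `field2: u16` mentions no generic, so the method needs none.
    fn field2(self, value: u16) -> FooBuilder<F1, Set<u16>> {
        FooBuilder { field1: self.field1, field2: Set(value) }
    }
}

impl<P> FooBuilder<Set<P>, Set<u16>> {
    // Only available once both fields are set.
    fn build(self) -> Foo<P> {
        Foo { field1: self.field1.0, field2: self.field2.0 }
    }
}
```

With this, FooBuilder::new().field2(16).field1("hi").build() yields a Foo<&str>, and the setters can be called in either order.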

My problem boils down to the following:
Given some Struct definition with generics and multiple fields, how can I decide for each field which generic parameters it requires (in the context of a procedural macro)?

rrbutani · April 25, 2021, 4:56am

This is a great question!

Unfortunately, I don't exactly have an answer but I do think I can help.

To generate builder methods for FooBuilder as you described it, we would indeed need to figure out which generic params are used for each field in the actual struct. The "obvious" way of figuring this out (searching each field type for generic parameters) becomes less straightforward when you start to consider tuples and arrays and trait objects and all the other kinds of types that you can have in Rust.

You could definitely write a syn Visitor and handle all the cases yourself but fortunately I think there's an easier way.

For:

#[derive(Builder)]
struct Foo<P> {
    field1: P,
    field2: u16,
}

You can generate roughly:

use std::marker::PhantomData;

#[must_use]
struct FooBuilder<P, F1, F2> {
    field1: F1,
    field2: F2,
    _g: PhantomData<(P,)>,
}

impl<P> FooBuilder<P, Unset, Unset> {
    fn new() -> Self {
        Self {
            field1: Unset,
            field2: Unset,
            _g: PhantomData,
        }
    }
}

struct Unset;
struct Set<V>(V);

impl<P, /*F1,*/ F2> FooBuilder<P, Unset, F2> {
    fn field1(self, value: P) -> FooBuilder<P, Set<P>, F2> {
        let FooBuilder { field2, _g, .. } = self;

        FooBuilder {
            field1: Set(value),
            field2,
            _g,
        }
    }
}

impl<P, F1/*, F2*/> FooBuilder<P, F1, Unset> {
    fn field2(self, value: u16) -> FooBuilder<P, F1, Set<u16>> {
        let FooBuilder { field1, _g, .. } = self;

        FooBuilder {
            field1,
            field2: Set(value),
            _g,
        }
    }
}

impl<P> FooBuilder<P, Set<P>, Set<u16>> {
    fn build(self) -> Foo<P> {
        let FooBuilder {
            field1,
            field2,
            ..
        } = self;

        Foo { field1: field1.0, field2: field2.0 }
    }
}

(Playground)

The key difference here is that P (i.e. any type parameters on the struct you're generating a builder for) are type parameters on the builder struct as well. This lets you sidestep needing to figure out which type parameters to duplicate on the individual setter functions altogether.

All the type parameters are always "present" in every impl block, allowing you to just copy the field's type (i.e. P) to the setter's arg list (i.e. value: P) and to the setter's return type (i.e. Set<P>). As an added bonus, this way you don't even need to have any special handling for fields involving generic parameters; you'd do the exact same substitutions for a field like field2 that's a non-generic type. This makes for easy code generation.

The one downside to this approach is that this means that the compiler will complain at you about not being able to deduce P for incomplete method chains (i.e. for FooBuilder::new().field2(23)). In practice I don't think this is really a problem since you cannot construct an instance of Foo without calling field1() at which point the compiler will be able to infer P.
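To make the inference point concrete, here is a small sketch built on the same builder shape as above. The helper functions partial_with_turbofish and complete_chain are hypothetical names used only for illustration:

```rust
use std::marker::PhantomData;

struct Foo<P> {
    field1: P,
    field2: u16,
}

struct Unset;
struct Set<V>(V);

struct FooBuilder<P, F1, F2> {
    field1: F1,
    field2: F2,
    _g: PhantomData<P>,
}

impl<P> FooBuilder<P, Unset, Unset> {
    fn new() -> Self {
        FooBuilder { field1: Unset, field2: Unset, _g: PhantomData }
    }
}

impl<P, F2> FooBuilder<P, Unset, F2> {
    fn field1(self, value: P) -> FooBuilder<P, Set<P>, F2> {
        FooBuilder { field1: Set(value), field2: self.field2, _g: PhantomData }
    }
}

impl<P, F1> FooBuilder<P, F1, Unset> {
    fn field2(self, value: u16) -> FooBuilder<P, F1, Set<u16>> {
        FooBuilder { field1: self.field1, field2: Set(value), _g: PhantomData }
    }
}

impl<P> FooBuilder<P, Set<P>, Set<u16>> {
    fn build(self) -> Foo<P> {
        Foo { field1: self.field1.0, field2: self.field2.0 }
    }
}

// An incomplete chain never mentions `P`, so it has to be spelled out.
// Replacing the turbofish with a bare `FooBuilder::new()` here would leave
// `P` unconstrained and the compiler would ask for type annotations.
fn partial_with_turbofish() -> u16 {
    let partial = FooBuilder::<String, Unset, Unset>::new().field2(23);
    partial.field2.0
}

// A complete chain needs no annotation: `field1("hello")` pins down `P`.
fn complete_chain() -> Foo<&'static str> {
    FooBuilder::new().field2(23).field1("hello").build()
}
```

Spelling out P with a turbofish (or through a function return type) is only needed while the chain is still incomplete.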

raidwas · April 25, 2021, 9:42am

First of all thank you for taking your time!

I actually already started writing a visitor but didn't have time to finish it yet so I'm not yet sure if it works. I didn't however know of the visitors syn provides, so that will obviously be a much much easier task now xD
So currently I think I need a Path visitor and have to check for each path if its first identifier equals the identifier of the generic parameter. This however is just a guess and I'm not sure if it will be correct in all cases. Might however be good enough...

For your second approach:
Albeit for a different reason, the possibility of having the generic parameters on the Builder struct itself, and its downside, already came up in my other question, and if possible I would like to avoid it.

I do think however that the visitor is the correct approach, so I will mark the question as answered.

rrbutani · April 25, 2021, 10:50am

Oh whoops; missed your other thread.

For the visitor:

I think you probably want a TypePath visitor that you feed the ty field in the Fields on your ItemStruct.

I'm still a little spooked at all the different variants on syn::Type but ultimately all roads do seem to lead back to TypePath. Filtering out Paths that aren't single path element + no path arguments and then comparing Idents with the list of generic Idents does seem like it'd work just fine.

Edit: I just re-read what you wrote and I think this is exactly what you were describing. Just in case though: I think you probably want to have a visitor that just overrides Visit::visit_field that then shells out to another visitor that just overrides Visit::visit_type_path; that way you can associate the type paths that are generic params that you discover with a particular field.

A few other things, though:

a single field can reference multiple generic params so you'll want to collect a list of params per field
- as in, your input struct could look like this:
```
struct Foo<A, B> { f1: (A, B), f2: usize }
```
- for which you'd want to produce something like this:
```
impl<F2> FooBuilder<Unset, F2> {
 fn f1<A, B>(self, val: (A, B)) -> FooBuilder<Set<(A, B)>, F2> { ... }
}
```
if type parameters on your source struct have bounds you'll need to replicate these on your Builder's setters or at least on your Builder's build method (the thing that goes from FooBuilder to Foo):
- i.e. for:
```
struct Foo<A: Clone, B: Hash> { f1: A, f2: B }
```
- if you produced:
```
impl<A, B> FooBuilder<Set<A>, Set<B>> {
 fn build(self) -> Foo<A, B> { ... }
}
```
- you'd get a type error since A and B in the latter code block above aren't necessarily Clone and Hash respectively
- to get this to work you'd need to do this at the minimum:
```
impl<A: Clone, B: Hash> FooBuilder<Set<A>, Set<B>> {
 fn build(self) -> Foo<A, B> { ... }
}
```
- and you'd probably want to copy the bounds to the individual setter functions too (raising the type error only when you call build makes it less immediately apparent to users what the problematic field/builder call was, I think)
the above case isn't so bad since you can just literally copy the bounds to a couple of places but it actually gets worse (sorry)
- consider:
```
struct Foo<A: Clone, B: ?Sized + FnOnce(A)> { a: A, b: Box<B> }
```
 - here you have to copy A everywhere B is used since the bound for B references A; this is especially problematic because it means that you can't set b before setting a (since A isn't a type on FooBuilder yet)
- and also:
```
struct Foo<'s, A: Clone + 's, B: ?Sized + Fn(&'s dyn FnOnce(&'s Box<A>))> { a: &'s A, b: Box<B> }
```
 - here there's now a lifetime parameter that has to be copied around where it's used
 - it's the same challenge as the previous example but with some indirection; you have to infer that B's bound involves 's and that 's is used in a therefore b can't be set until a is set
 - if you add multiple usages for parameters this gets worse; i.e.
```
struct Foo<'s, A: 's, B: Fn(&'s u8)> { a: &'s A, b: Box<B>, c: &'s u8 }
```
 - here setting c or setting a should be enough to allow setting b but I don't think we can even represent this in the type system without specialization
- and finally, const generics
 - these can be used in other types (i.e. arrays like [u8; N]) or used nowhere else in the struct's fields' types at all; not even in a PhantomData equivalent because the compiler doesn't need to infer variance for them
 - this means for types like:
```
struct Foo<A, B, const N: usize> { a: A, b: B }
```
 - you'll have to know to copy const N: usize onto the build method or to carry it along on the Builder type
 - for types where the const params are used in other types/bounds, you'll need to infer that too
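As a rough illustration of the "copy the bounds" point from the list above, here is a minimal hand-written sketch for the Clone/Hash example. It assumes the generics-on-setters builder shape from earlier in the thread; the new method and the exact struct layout are assumptions, not output of any existing macro:

```rust
use std::hash::Hash;

struct Foo<A: Clone, B: Hash> {
    f1: A,
    f2: B,
}

struct Unset;
struct Set<V>(V);

struct FooBuilder<F1, F2> {
    f1: F1,
    f2: F2,
}

impl FooBuilder<Unset, Unset> {
    fn new() -> Self {
        FooBuilder { f1: Unset, f2: Unset }
    }
}

impl<F2> FooBuilder<Unset, F2> {
    // Bound copied from the struct so an unsuitable type is rejected
    // at this call rather than later at `build`.
    fn f1<A: Clone>(self, value: A) -> FooBuilder<Set<A>, F2> {
        FooBuilder { f1: Set(value), f2: self.f2 }
    }
}

impl<F1> FooBuilder<F1, Unset> {
    fn f2<B: Hash>(self, value: B) -> FooBuilder<F1, Set<B>> {
        FooBuilder { f1: self.f1, f2: Set(value) }
    }
}

// Without `A: Clone, B: Hash` here, `Foo<A, B>` would not be well-formed.
impl<A: Clone, B: Hash> FooBuilder<Set<A>, Set<B>> {
    fn build(self) -> Foo<A, B> {
        Foo { f1: self.f1.0, f2: self.f2.0 }
    }
}
```

This only covers bounds that mention a single parameter; bounds that tie parameters together (like the FnOnce(A) case) are not handled by copying alone.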

Apologies; that was definitely way longer than I intended. Anyways the point is that I think just carrying the type parameters on the Builder type lets you sidestep having to handle lots of different edge cases.

That said, it's totally understandable if you ultimately decide that you don't want to support all the different things described above; while I don't think they're particularly esoteric bits of Rust code, it's probably valid to decide that it's unlikely someone will want an auto-generated builder for something with lifetime bounds in it.

One last thing though. I think this was what you were referring to re: limitations with putting the type parameters on the builder type.

I just wanted to note that it's totally possible to do what you were describing; i.e. pass around incomplete builders without having to concretely specify types for fields that haven't been set yet. For the example in the previous message:

fn half_set<P>() -> FooBuilder<P, Unset, Set<u16>> {
    FooBuilder::new().field2(23)
}

fn main() {
    let f: Foo<_> = half_set().field1("👋").build();
}

(Playground)

Not sure if that's good enough for your use case but I thought I'd mention it.

Regardless, I'd love to know what you ultimately choose to go with/how it turns out! I'm emotionally invested now.

raidwas · April 30, 2021, 9:57pm

Ah, finally had some time to spend on this project

That was basically what I meant, except I wouldn't use a Visitor for visit_field as I already know the fields of the struct and can directly call visit_type on the field types with a Visitor that overrides visit_type_path.

Yup, that code was already in place. I was actually only missing the implementation of the following stub:
fn is_required_generic_for_type(ty: &Type, generic: &Ident) -> bool
(where ty is the type of the field in the struct and generic is an identifier of a generic of that struct (e.g. P for struct Foo<P>)).

I'm aware of this, but my implementation does not yet handle this.
To be honest I'm also not sure if I want to support it, since then I might also have to support where bounds on the struct, which would make life much more difficult. And all that for a feature that is mostly unused as far as I can tell (bounds on struct).

These are indeed interesting cases I will have to look at.
As a minimum however I would like to support basic lifetimes (maybe even without supporting lifetime bounds, since I consider them unusual on structs).

That's exactly what I was referring to.
From a functionality centered standpoint I would totally agree with your example code, as it (probably) makes the derive much simpler.
However from a usability standpoint I would say the following would be simpler to read/understand:

fn half_set() -> FooBuilder<Unset, Set<u16>> {
    FooBuilder::new().field2(23)
}

fn main() {
    let f: Foo<_> = half_set().field1("👋").build();
}

Then look no further
So I just finished implementing the function that was missing and all tests just turned green
For the simple structs that I tested (generics yes, no lifetimes/bounds/etc) the following did the trick:


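// Requires syn's "visit" feature (in addition to "derive" or "full" for the AST types).
use syn::visit::{self, Visit};
use syn::{Ident, Type, TypePath};

fn is_required_generic_for_type(ty: &Type, generic: &Ident) -> bool {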
    struct PathVisitor<'g> {
        generic: &'g Ident,
        generic_is_required: bool,
    }
    impl<'g, 'ast> Visit<'ast> for PathVisitor<'g> {
        fn visit_type_path(&mut self, node: &'ast TypePath) {
            if node.qself.is_none() {
                if let Some(first_segment) = node.path.segments.first() {
                    if first_segment.ident == *self.generic {
                        self.generic_is_required = true;
                    }
                }
            }
            visit::visit_type_path(self, node);
        }
    }

    let mut path_visitor = PathVisitor {
        generic,
        generic_is_required: false,
    };

    path_visitor.visit_type(ty);

    return path_visitor.generic_is_required;
}

The check for node.qself.is_none() is especially important I think, since the first path segment of a TypePath with a qualified self is not in the scope of the struct itself, but already in the scope of the qualifier. If this wasn't clear just ask and I will try to explain better ^^'
Otherwise the code is surprisingly simple. The performance could probably be improved by returning early from the visitor once generic_is_required is true, but I just hope people don't have ungodly long types in their structs -.-
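To illustrate the idea without pulling in syn, here is a toy model of the same check. Ty, Segment, uses_generic and path are hypothetical stand-ins invented for this sketch and are NOT the syn API; the point is only to show the qself rule and the short-circuiting that gives the early return for free:

```rust
/// Toy stand-in for `syn::Type` (hypothetical, NOT the syn API).
enum Ty {
    /// `seg::seg<Args>`, optionally with a qualified self as in `<QSelf as Trait>::Name`.
    Path { qself: Option<Box<Ty>>, segments: Vec<Segment> },
    Tuple(Vec<Ty>),
    Reference(Box<Ty>),
}

struct Segment {
    ident: &'static str,
    /// Generic arguments, such as the `T` in `Vec<T>`.
    args: Vec<Ty>,
}

/// Stops at the first hit, because `||` and `any` both short-circuit.
fn uses_generic(ty: &Ty, generic: &str) -> bool {
    match ty {
        Ty::Path { qself, segments } => {
            // Only an unqualified path can start with the parameter itself.
            let is_param = qself.is_none()
                && segments.first().map_or(false, |s| s.ident == generic);
            is_param
                // The qualified-self type is still searched recursively.
                || qself.as_deref().map_or(false, |q| uses_generic(q, generic))
                // So are the generic arguments of every segment.
                || segments
                    .iter()
                    .flat_map(|s| s.args.iter())
                    .any(|arg| uses_generic(arg, generic))
        }
        Ty::Tuple(elems) => elems.iter().any(|e| uses_generic(e, generic)),
        Ty::Reference(inner) => uses_generic(inner, generic),
    }
}

/// Helper: an unqualified single-segment path such as `P` or `Vec<...>`.
fn path(ident: &'static str, args: Vec<Ty>) -> Ty {
    Ty::Path { qself: None, segments: vec![Segment { ident, args }] }
}
```

In this model, a field of type Vec<(P, &u8)> counts as using P, while <Q as P>::Out counts as using Q but not P, since the first segment after a qualified self names a trait rather than a struct parameter.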

In the end I can now successfully execute the following tests:

#[cfg(test)]
mod test {
    use type_safe_builder::{Builder, GetBuilder, Set, Unset};
    #[derive(Builder, Debug)]
    struct StructSimple {
        field1: u8,
        field2: u16,
    }

    #[test]
    fn builder_simple() {
        let x = StructSimple::builder();
        let x = x.field1(8);
        let x = x.field2(16);
        dbg!(x.build());
    }

    #[derive(Builder, Debug)]
    struct StructWithGeneric<T> {
        field1: u8,
        field2: T,
    }

    #[test]
    fn builder_with_generic() {
        let x = StructWithGenericBuilder::new();
        let x = x.field1(8);
        let x = x.field2(32u32);
        dbg!(x.build());
    }

    #[test]
    fn builder_with_generic_type_interface() {
        let x = get_builder_with_unspecified_generic();
        let x = x.field2(true);
        dbg!(x.build());
        let x = get_builder_with_unspecified_generic();
        let x = x.field2("some text");
        dbg!(x.build());
    }
    fn get_builder_with_unspecified_generic() -> StructWithGenericBuilder<Set<u8>, Unset> {
        StructWithGenericBuilder::new().field1(5)
    }

    #[derive(Builder, Debug)]
    struct StructWithGenerics<T, U> {
        field0: u8,
        field1: T,
        field2: U,
    }
    #[test]
    fn builder_with_generics() {
        let x = StructWithGenericsBuilder::new();
        let x = x.field0(8);
        let x = x.field1(16u16);
        let x = x.field2(32u32);
        dbg!(x.build());
    }

    #[test]
    fn builder_with_generics_type_interface() {
        let x = get_builder_with_unspecified_generics();
        let x = x.field1(-10);
        let x = x.field2(true);
        dbg!(x.build());
        let x = get_builder_with_unspecified_generics();
        let x = x.field1("look");
        let x = x.field2("some text");
        dbg!(x.build());
    }
    fn get_builder_with_unspecified_generics() -> StructWithGenericsBuilder<Set<u8>, Unset, Unset> {
        StructWithGenericsBuilder::new().field0(8)
    }
}

Edit: multiple generics in a single field also work:

    #[derive(Builder, Debug)]
    struct StructWithGenerics<T, U> {
        field0: u8,
        field1: T,
        field2: U,
        field3: (T, U),
    }

    #[test]
    fn builder_with_generics_type_interface() {
        let x = get_builder_with_unspecified_generics();
        let x = x.field1(-10);
        let x = x.field2(true);
        let x = x.field3((-5, false));
        dbg!(x.build());
        let x = get_builder_with_unspecified_generics();
        let x = x.field1("look");
        let x = x.field2("some text");
        let x = x.field3(("another", "str"));
        dbg!(x.build());
    }
    fn get_builder_with_unspecified_generics(
    ) -> StructWithGenericsBuilder<Set<u8>, Unset, Unset, Unset> {
        StructWithGenericsBuilder::new().field0(8)
    }

I just noticed something funny: since the Builder has no idea what the dependencies between the different fields' generics are, it allows calling field1(u8).field2(u16).field3((false,true)) and only fails at the build step. This is probably the biggest downside to not having to specify the generics upfront.
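A reduced sketch of that effect, hand-written rather than macro output. Pair, PairBuilder and the two helper functions are hypothetical names chosen for illustration:

```rust
struct Unset;
struct Set<V>(V);

struct Pair<T> {
    single: T,
    both: (T, T),
}

struct PairBuilder<F1, F2> {
    single: F1,
    both: F2,
}

impl PairBuilder<Unset, Unset> {
    fn new() -> Self {
        PairBuilder { single: Unset, both: Unset }
    }
}

impl<F2> PairBuilder<Unset, F2> {
    // This `T` is independent of the `T` on `both`.
    fn single<T>(self, value: T) -> PairBuilder<Set<T>, F2> {
        PairBuilder { single: Set(value), both: self.both }
    }
}

impl<F1> PairBuilder<F1, Unset> {
    fn both<T>(self, value: (T, T)) -> PairBuilder<F1, Set<(T, T)>> {
        PairBuilder { single: self.single, both: Set(value) }
    }
}

// The only place where the two `T`s are forced to agree.
impl<T> PairBuilder<Set<T>, Set<(T, T)>> {
    fn build(self) -> Pair<T> {
        Pair { single: self.single.0, both: self.both.0 }
    }
}

fn consistent() -> Pair<u8> {
    PairBuilder::new().single(1u8).both((2u8, 3u8)).build()
}

// Both setters accept this, even though no `Pair<T>` can hold it.
// Calling `.build()` on the result is rejected, because no `build`
// method exists for this combination of type parameters.
fn mismatched() -> PairBuilder<Set<u8>, Set<(bool, bool)>> {
    PairBuilder::new().single(1u8).both((true, false))
}
```

The mismatch is only caught at build, so the error message points at the end of the chain rather than at the setter that introduced the conflicting type.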

If you want I can keep you updated, as I said I want to get at least basic reference support, so a few changes might still come

raidwas · May 2, 2021, 8:12pm

A similar approach as for generics can also be used for basic lifetimes. After already having written a visitor, the visitor for lifetimes was pretty simple:

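// Requires syn's "visit" feature (in addition to "derive" or "full" for the AST types).
use syn::visit::{self, Visit};
use syn::{Lifetime, Type};

fn is_required_lifetime_for_type(ty: &Type, lifetime: &Lifetime) -> bool {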
    struct LifetimeVisitor<'l> {
        lifetime: &'l Lifetime,
        lifetime_is_required: bool,
    }
    impl<'l, 'ast> Visit<'ast> for LifetimeVisitor<'l> {
        fn visit_lifetime(&mut self, node: &'ast Lifetime) {
            if node.ident == self.lifetime.ident {
                self.lifetime_is_required = true;
            }
            visit::visit_lifetime(self, node);
        }
    }

    let mut lifetime_visitor = LifetimeVisitor {
        lifetime,
        lifetime_is_required: false,
    };

    lifetime_visitor.visit_type(ty);

    return lifetime_visitor.lifetime_is_required;
}

For a simple struct like:

    #[derive(Builder)]
    struct StructWithLifetime<'a, 'b: 'a> {
        field1: &'a u8,
        field2: &'a &'b u16,
    }

This currently produces:

    impl<T_field1> StructWithLifetimeBuilder<T_field1, ::type_safe_builder::Unset> {
        fn field2<'a, 'b>(
            self,
            value: &'a &'b u16,
        ) -> StructWithLifetimeBuilder<T_field1, ::type_safe_builder::Set<&'a &'b u16>> {
            StructWithLifetimeBuilder {
                field1: self.field1,
                field2: ::type_safe_builder::Set::new(value),
            }
        }
    }

As one can see the required lifetimes are added as generics to the function definition. What is missing however are the lifetime bounds, those are currently only checked when calling the build function.
While it would definitely be possible to add the lifetime bounds, one would then have to include every lifetime that appears in those bounds, as well as the lifetimes in their bounds, and so on. That is not very useful to a developer, since most of those lifetimes are meaningless without the fields they are directly used in.
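For reference, a self-contained sketch of how the pieces could fit together for this struct. The local Unset and Set types, the new and build methods, and the sum helper are assumptions written by hand for illustration, not the crate's actual expansion:

```rust
struct Unset;
struct Set<V>(V);

struct StructWithLifetime<'a, 'b: 'a> {
    field1: &'a u8,
    field2: &'a &'b u16,
}

struct StructWithLifetimeBuilder<F1, F2> {
    field1: F1,
    field2: F2,
}

impl StructWithLifetimeBuilder<Unset, Unset> {
    fn new() -> Self {
        StructWithLifetimeBuilder { field1: Unset, field2: Unset }
    }
}

impl<F2> StructWithLifetimeBuilder<Unset, F2> {
    // Only `'a` occurs in the type of `field1`.
    fn field1<'a>(self, value: &'a u8) -> StructWithLifetimeBuilder<Set<&'a u8>, F2> {
        StructWithLifetimeBuilder { field1: Set(value), field2: self.field2 }
    }
}

impl<F1> StructWithLifetimeBuilder<F1, Unset> {
    // Both lifetimes occur in the type of `field2`; no explicit bound is written.
    fn field2<'a, 'b>(
        self,
        value: &'a &'b u16,
    ) -> StructWithLifetimeBuilder<F1, Set<&'a &'b u16>> {
        StructWithLifetimeBuilder { field1: self.field1, field2: Set(value) }
    }
}

// The struct's own bound `'b: 'a` is restated here, where the lifetimes
// from the two setters finally have to line up.
impl<'a, 'b: 'a> StructWithLifetimeBuilder<Set<&'a u8>, Set<&'a &'b u16>> {
    fn build(self) -> StructWithLifetime<'a, 'b> {
        StructWithLifetime { field1: self.field1.0, field2: self.field2.0 }
    }
}

// Hypothetical helper exercising the whole chain.
fn sum<'a, 'b>(small: &'a u8, nested: &'a &'b u16) -> u16 {
    let built = StructWithLifetimeBuilder::new()
        .field1(small)
        .field2(nested)
        .build();
    u16::from(*built.field1) + **built.field2
}
```

Here the setters carry only the lifetimes their field type mentions, and the relationship between 'a and 'b is checked once, at build.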

rrbutani · June 9, 2021, 12:30pm

Hey, sorry for the really late reply.

This is super neat! Is this published as a crate somewhere?

