Typesafe Builder design

After spending time with Java and getting annoyed at having to choose between unreadable constructors with many fields and builders without type safety, I wondered whether I could implement a type-safe builder in Rust.
After a short while I came up with this, which I consider to be a relatively easy pattern.
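
The code the post refers to is not reproduced here. As a minimal sketch of what such a pattern might look like, assuming a hypothetical two-field struct `Foo` and marker types `Unset` and `Set<T>` (the names are taken from the `FooBuilder<Unset, Set<_>>` interface mentioned below, the rest is illustrative):

```rust
// Marker types recording, at the type level, whether a field has been set.
struct Unset;
struct Set<T>(T);

#[derive(Debug, PartialEq)]
struct Foo {
    id: u32,
    name: String,
}

// One type parameter per field; each is either `Unset` or `Set<_>`.
struct FooBuilder<Id, Name> {
    id: Id,
    name: Name,
}

impl Foo {
    fn builder() -> FooBuilder<Unset, Unset> {
        FooBuilder { id: Unset, name: Unset }
    }
}

// `id` can only be set while it is still `Unset`.
impl<Name> FooBuilder<Unset, Name> {
    fn id(self, id: u32) -> FooBuilder<Set<u32>, Name> {
        FooBuilder { id: Set(id), name: self.name }
    }
}

// `name` can only be set while it is still `Unset`.
impl<Id> FooBuilder<Id, Unset> {
    fn name(self, name: impl Into<String>) -> FooBuilder<Id, Set<String>> {
        FooBuilder { id: self.id, name: Set(name.into()) }
    }
}

// `build` exists only once every field is `Set`.
impl FooBuilder<Set<u32>, Set<String>> {
    fn build(self) -> Foo {
        Foo { id: self.id.0, name: self.name.0 }
    }
}

fn main() {
    // Setters can be called in any order.
    let foo = Foo::builder().name("example").id(7).build();
    println!("{:?}", foo);
    // Foo::builder().id(7).build(); // does not compile: `name` is still `Unset`
}
```

Forgetting a field or setting it twice is then a compile error rather than a runtime panic.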

The most prominent downside I see with this approach is that it produces a quadratic amount of code with respect to the number of fields the struct has.
I think the pattern should be easy enough for LLVM to optimize out (especially if I add #[inline(always)] to all functions), but I am worried that LLVM might decide that optimizing takes too long and give up, or simply take a long time to compile (which hurts anyone's workflow).

So my question is: Is there a trick I could use to still have the type safety and rather simple "type interface" (FooBuilder<Unset, Set<_>>) that would asymptotically reduce the amount of code generated?

Edit: My gut feeling is that this is not possible. The reason is that for type safety I will need a "type flag" for each field, resulting in at least n monomorphized builder structs being passed to LLVM for optimization.


The pattern looks good :+1: I don't know what you mean by quadratic amount of code: you have a fixed new() and build(), and then a setter for each field, so it's 2 + n functions for the builder.

FWIW, these kinds of small functions that rely on type-level tricks are quite idiomatic in Rust, and there are many instances of them, so don't worry about it :wink:


The only "drawback" at the moment is that 2 + n functions is still a bit tedious to write by hand; that's where macros shine! So, in that regard, there are already several crates which feature macros that write builder patterns for you, each with its own set of tradeoffs:

So this may be a good chance for you to learn about macros, in case you don't know much about them yet :upside_down_face:


The quadratic amount of code comes from the setters. Imagine a struct with 20 fields: each setter has to pass 19 fields from the previous builder into the new builder and set the new field.
Thus rustc would generate at least 20*20 tokens (not sure what the correct unit would be, but tokens should roughly fit the bill).
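
To make the counting concrete, here is a small sketch with a hypothetical three-field `Builder` (all names are illustrative). Each of the 3 setters writes 3 field initializers, 1 new and 2 forwarded, so the setters alone contain 3*3 field expressions:

```rust
struct Unset;
struct Set<T>(T);

struct Builder<A, B, C> {
    a: A,
    b: B,
    c: C,
}

impl Builder<Unset, Unset, Unset> {
    fn new() -> Self {
        Builder { a: Unset, b: Unset, c: Unset }
    }
}

impl<B, C> Builder<Unset, B, C> {
    fn a(self, a: u8) -> Builder<Set<u8>, B, C> {
        // 1 new field, 2 forwarded fields
        Builder { a: Set(a), b: self.b, c: self.c }
    }
}

impl<A, C> Builder<A, Unset, C> {
    fn b(self, b: u16) -> Builder<A, Set<u16>, C> {
        // 1 new field, 2 forwarded fields
        Builder { a: self.a, b: Set(b), c: self.c }
    }
}

impl<A, B> Builder<A, B, Unset> {
    fn c(self, c: u32) -> Builder<A, B, Set<u32>> {
        // 1 new field, 2 forwarded fields
        Builder { a: self.a, b: self.b, c: Set(c) }
    }
}

fn main() {
    let done = Builder::new().c(3).a(1).b(2);
    println!("{} {} {}", done.a.0, done.b.0, done.c.0);
}
```

The number of functions grows linearly (n setters plus `new`), but the body and signature of each setter grow with n as well, which is where the n*n comes from.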

Oh, I am aware of macros and I was also going to turn this into one, but I was just wondering about the overall design idea (for which I don't need the actual macro to get feedback :slight_smile: )


I think that LLVM will find this relatively easy to optimize, but if you want to help it out, I think it would help to make sure that every monomorphized instance of the builder struct has the same layout. For example, you could do it like this:

struct Unset<T> {
    value: Option<T>,
}
struct Set<T> {
    value: Option<T>,
}

impl<T> Unset<T> {
    fn new() -> Self {
        Unset { value: None }
    }
}
impl<T> Set<T> {
    fn new(t: T) -> Self {
        Set { value: Some(t) }
    }
    fn into_inner(self) -> T {
        match self.value {
            Some(value) => value,
            // You could in principle use std::hint::unreachable_unchecked()
            // here.
            None => unreachable!(),
        }
    }
}

full example

Or, in your case, where the fields are relatively simple, you could initialize them to a dummy value such as zero and store the type information in a PhantomData field.
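
A minimal sketch of that PhantomData variant, assuming a hypothetical `Point` struct with two `i32` fields (the struct and all names are illustrative, not taken from the original code):

```rust
use std::marker::PhantomData;

// Zero-sized state markers; they exist only at the type level.
struct Unset;
struct Set;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Every instantiation stores plain `i32`s, so all states share one layout.
struct PointBuilder<X, Y> {
    x: i32,
    y: i32,
    state: PhantomData<(X, Y)>,
}

impl Point {
    fn builder() -> PointBuilder<Unset, Unset> {
        // Dummy zeros stand in for the fields that are not set yet.
        PointBuilder { x: 0, y: 0, state: PhantomData }
    }
}

impl<Y> PointBuilder<Unset, Y> {
    fn x(self, x: i32) -> PointBuilder<Set, Y> {
        PointBuilder { x, y: self.y, state: PhantomData }
    }
}

impl<X> PointBuilder<X, Unset> {
    fn y(self, y: i32) -> PointBuilder<X, Set> {
        PointBuilder { x: self.x, y, state: PhantomData }
    }
}

// The dummy values can never leak: `build` needs both flags to be `Set`.
impl PointBuilder<Set, Set> {
    fn build(self) -> Point {
        Point { x: self.x, y: self.y }
    }
}

fn main() {
    let p = Point::builder().y(2).x(1).build();
    println!("{:?}", p);
}
```

This only works for types that have a cheap dummy value, which is why it suits simple fields best.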


Indeed! I'd even go and use MaybeUninit<T> for Unset (without unsafe for once!) and T for Set, to avoid the Option's discriminant and branch


That's really clever! You define them like this:

use std::mem::MaybeUninit;

struct Unset<T> {
    value: MaybeUninit<T>,
}
struct Set<T> {
    value: T,
}

impl<T> Unset<T> {
    fn new() -> Self {
        Unset { value: MaybeUninit::uninit() }
    }
}
impl<T> Set<T> {
    fn new(t: T) -> Self {
        Set { value: t }
    }
    fn into_inner(self) -> T {
        self.value
    }
}

And now they have the same layout, but you have no space for an unused Option discriminant, and you don't need unsafe to remove the branch in into_inner().

In this case the builder struct probably even has the same layout as the final type.
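
As a sketch of how these two wrappers might slot into a complete builder, assuming a hypothetical `Foo` with a `u64` and a `bool` field (illustrative names only). The size comparison at the end reflects what current rustc does in practice; Rust does not guarantee the layout of these structs:

```rust
use std::mem::{size_of, MaybeUninit};

struct Unset<T> {
    // Never read; it only reserves the same space as a `T`.
    value: MaybeUninit<T>,
}
struct Set<T> {
    value: T,
}

impl<T> Unset<T> {
    fn new() -> Self {
        Unset { value: MaybeUninit::uninit() }
    }
}

#[derive(Debug, PartialEq)]
struct Foo {
    id: u64,
    flag: bool,
}

struct FooBuilder<Id, Flag> {
    id: Id,
    flag: Flag,
}

impl Foo {
    // The field types must already be known here.
    fn builder() -> FooBuilder<Unset<u64>, Unset<bool>> {
        FooBuilder { id: Unset::new(), flag: Unset::new() }
    }
}

impl<Flag> FooBuilder<Unset<u64>, Flag> {
    fn id(self, id: u64) -> FooBuilder<Set<u64>, Flag> {
        FooBuilder { id: Set { value: id }, flag: self.flag }
    }
}

impl<Id> FooBuilder<Id, Unset<bool>> {
    fn flag(self, flag: bool) -> FooBuilder<Id, Set<bool>> {
        FooBuilder { id: self.id, flag: Set { value: flag } }
    }
}

impl FooBuilder<Set<u64>, Set<bool>> {
    // No branch and no `unsafe`: both values are plain fields by now.
    fn build(self) -> Foo {
        Foo { id: self.id.value, flag: self.flag.value }
    }
}

fn main() {
    let foo = Foo::builder().flag(true).id(42).build();
    println!("{:?}", foo);
    // Sizes of the empty builder, the full builder, and the final struct.
    println!(
        "{} {} {}",
        size_of::<FooBuilder<Unset<u64>, Unset<bool>>>(),
        size_of::<FooBuilder<Set<u64>, Set<bool>>>(),
        size_of::<Foo>()
    );
}
```

No unsafe is needed because the MaybeUninit value is only ever moved and discarded, never read.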


I had already thought about using Option but decided against it because of the seeming redundancy. I had also thought about using MaybeUninit (but decided against it because I didn't want unsafe in macro-generated code). I didn't think of combining them that way, though. Really clever!!

One thing I noticed (not necessarily a downside) with your idea:
With a parameterized Unset<T>, the method Foo::builder already has to know the exact types that will be passed into the builder.
It seems to me that this limits the usage a bit.
For example, a function returning an unfinished builder where a generic field is still missing would have to specify the type of the missing value, whereas with a non-parameterized Unset this could be decided solely by the caller.

Not saying that this is necessarily bad, but at least limiting. On the other hand this limitation makes total sense since this is a requirement to guarantee the consistent memory layout...
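
A sketch of the flexible case, assuming a non-parameterized `Unset` and a hypothetical generic struct `Tagged<T>`; the helper `with_default_id` is likewise made up for illustration. It returns an unfinished builder without naming the payload type:

```rust
struct Unset;
struct Set<T>(T);

#[derive(Debug, PartialEq)]
struct Tagged<T> {
    id: u32,
    payload: T,
}

struct TaggedBuilder<Id, Payload> {
    id: Id,
    payload: Payload,
}

impl TaggedBuilder<Unset, Unset> {
    fn new() -> Self {
        TaggedBuilder { id: Unset, payload: Unset }
    }
}

impl<P> TaggedBuilder<Unset, P> {
    fn id(self, id: u32) -> TaggedBuilder<Set<u32>, P> {
        TaggedBuilder { id: Set(id), payload: self.payload }
    }
}

impl<I> TaggedBuilder<I, Unset> {
    // `T` is chosen here, by whoever calls the setter.
    fn payload<T>(self, payload: T) -> TaggedBuilder<I, Set<T>> {
        TaggedBuilder { id: self.id, payload: Set(payload) }
    }
}

impl<T> TaggedBuilder<Set<u32>, Set<T>> {
    fn build(self) -> Tagged<T> {
        Tagged { id: self.id.0, payload: self.payload.0 }
    }
}

// The return type says nothing about the eventual payload type.
fn with_default_id() -> TaggedBuilder<Set<u32>, Unset> {
    TaggedBuilder::new().id(1)
}

fn main() {
    // Two callers pick two different payload types.
    let a = with_default_id().payload("text").build();
    let b = with_default_id().payload(2.5_f64).build();
    println!("{:?} {:?}", a, b);
}
```

With a parameterized `Unset<T>`, the return type of `with_default_id` would have to be something like `TaggedBuilder<Set<u32>, Unset<T>>`, so the helper itself would need to name `T` or become generic over it.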

I have to say I'm pretty torn right now. It turns out the version with a parameterized Unset is easier to implement and might be easier for the compiler to optimize; on the other hand, it limits usability a bit.

I mean, you can use one strategy for the fields that are known in advance and another for fields that are generic.

Once I get the basic procedural macro working for generics and lifetimes I might try to leverage attributes for this :slight_smile:

I've implemented a type-safe builder in https://github.com/jkelleyrtp/optargs with const generics. It produces code that scales linearly with the number of fields.

Here's the code that generates the builder:
https://github.com/jkelleyrtp/optargs/blob/07939f4d1a701307d31b9078e25c72761325d4dd/optargs-macro/src/optfn.rs

If you browse the examples at that commit, you can see how the const parameters move from false to true as required parameters are filled in by the builder. Only when all the const parameters are filled in can the "build" method be used.
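
The general idea can be sketched as follows. This is a hand-written illustration with a hypothetical two-field `Foo`, not the code that optargs generates; storing the values in `Option` is an assumption made to keep the sketch free of unsafe:

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    id: u32,
    name: String,
}

// One `const bool` flag per required field: `true` means "already set".
struct FooBuilder<const ID: bool, const NAME: bool> {
    id: Option<u32>,
    name: Option<String>,
}

impl Foo {
    fn builder() -> FooBuilder<false, false> {
        FooBuilder { id: None, name: None }
    }
}

// Setting `id` flips the first flag from `false` to `true`.
impl<const NAME: bool> FooBuilder<false, NAME> {
    fn id(self, id: u32) -> FooBuilder<true, NAME> {
        FooBuilder { id: Some(id), name: self.name }
    }
}

// Setting `name` flips the second flag from `false` to `true`.
impl<const ID: bool> FooBuilder<ID, false> {
    fn name(self, name: impl Into<String>) -> FooBuilder<ID, true> {
        FooBuilder { id: self.id, name: Some(name.into()) }
    }
}

// `build` is only available once every flag is `true`.
impl FooBuilder<true, true> {
    fn build(self) -> Foo {
        Foo {
            id: self.id.expect("flag guarantees `id` is set"),
            name: self.name.expect("flag guarantees `name` is set"),
        }
    }
}

fn main() {
    let foo = Foo::builder().name("example").id(7).build();
    println!("{:?}", foo);
    // Foo::builder().id(7).build(); // does not compile: NAME is still `false`
}
```

In this sketch each setter still lists every const parameter and forwards every field by hand, since the input and output builder types differ.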

As far as I can tell, your builder macro also produces a quadratic amount of code (please correct me if I'm wrong!).
Suppose you have a struct with m required fields. Then you would need m const generic arguments on each impl block, of which there are at least m. Hence you would need at least m * m tokens for the bounds.
