Sharing code between builder approaches

I have "builder" structs, whose job it is to prepare an actual "thing". For performance reasons, I often provide two implementations: a to_thing(&self) [less efficient] and a trait From<Builder> for Thing implementation, where the from(self) consumes the builder [more efficient]. A recent example, abridged:

#[derive(Debug)]
pub struct DatasetSchema {
    variables: Vec<Variable>,
    ordinals: HashMap<String, usize>,
}

impl DatasetSchema {
    #[inline]
    pub fn builder() -> DatasetSchemaBuilder {
        DatasetSchemaBuilder::new()
    }

    #[inline]
    pub fn ordinal(&self, name: &str) -> Option<usize> {
        self.ordinals.get(name).copied()
    }

    #[inline]
    pub fn get_variable_at(&self, ordinal: usize) -> Option<&Variable> {
        self.variables.get(ordinal)
    }

    #[inline]
    pub fn variables(&self) -> &[Variable] {
        &self.variables
    }
}

#[derive(Debug, Clone)]
pub struct DatasetSchemaBuilder {
    variables: Vec<Variable>,
}

impl DatasetSchemaBuilder {
    #[inline]
    fn new() -> Self {
        Self {
            variables: Vec::new(),
        }
    }

    #[inline]
    pub fn add_variable(&mut self, variable: Variable) {
        self.variables.push(variable);
    }

    #[inline]
    pub fn try_to_dataset_schema(&self) -> Result<DatasetSchema, Cow<str>> {
        let mut ordinals = HashMap::with_capacity(self.variables.len());
        for (ordinal, variable) in self.variables.iter().enumerate() {
            let name = variable
                .name()
                .ok_or("A variable did not specify a name.")?;
            if ordinals.insert(name.to_string(), ordinal).is_some() {
                return Err(Cow::Borrowed("Encountered a duplicate variable name."));
            }
        }
        Ok(DatasetSchema {
            variables: self.variables.clone(),
            ordinals,
        })
    }
}

impl TryFrom<DatasetSchemaBuilder> for DatasetSchema {
    type Error = Cow<'static, str>;

    fn try_from(value: DatasetSchemaBuilder) -> Result<Self, Self::Error> {
        let mut ordinals = HashMap::with_capacity(value.variables.len());
        for (ordinal, variable) in value.variables.iter().enumerate() {
            let name = variable
                .name()
                .ok_or("A variable did not specify a name.")?;
            if ordinals.insert(name.to_string(), ordinal).is_some() {
                return Err(Cow::Borrowed("Encountered a duplicate variable name."));
            }
        }
        Ok(DatasetSchema {
            variables: value.variables,
            ordinals,
        })
    }
}

In summary, there's some logic checking to making sure every variable has a unique name before returning the DatasetSchema.

What bugs me is that the logic for constructing the ordinals map (and guaranteeing unique names) is duplicated, more or less. I tried putting a method taking &self on the builder to build the map, but the TryFrom implementation complains I am calling a method when I'm "moving the value out".

One idea, since I am cloning variables anyway, would be to implement try_to_dataset_schema by doing something like this:

    #[inline]
    pub fn try_to_dataset_schema(&self) -> Result<DatasetSchema, Cow<str>> {
        self.clone().try_into()
    }

But before I go and make this change in 200 places, I thought I would ask if this is a good approach or if there's a more common pattern involving builders.

Also curious about my naming of things. Is try_to_blah a good name? Suggestions welcome.

Why do you have the first one at all? In my experience, the most common way to set up a builder is

impl FooBuilder {
    pub fn build(self) -> Foo {...}
}

As you note below, building multiple values from the builder can be performed by cloning the builder. You should only have a separately implemented function for building from &FooBuilder if there is a way to build more efficiently than the .clone().build() would.

Supposing that my above recommendation does not apply, what you could do to reduce code duplication is write a private function that constructs ordinals only, and call that from both building paths.

Unless there are specific reasons to do otherwise, builders should always have a method named build(). There's no need for a try_ prefix unless there are reasons why there are both fallible and infallible ways to build the value.

You can also impl TryFrom<FooBuilder> for Foo, but it’d be a quite rare situation where that’s a valuable implementation to have.

1 Like

I agree that (independently of the builder approach) the solution to the DRY concern is simply a private method:

fn ordinals(&self) -> Result<HashMap<String, usize>, Cow<'static, str>> { ... }

Incidentally, I changed the Cow to use 'static as the signature you originally had requires that self remain borrowed so long as the returned Result (or it's Err value) is still in use:

pub fn try_to_dataset_schema(&self) -> Result<DatasetSchema, Cow<str>> {

// Is short for:
pub fn try_to_dataset_schema(&self) -> Result<DatasetSchema, Cow<'_, str>> {
//                           ^-----------------------------------^^
// These have the same lifetime due to elision, i.e.
// using the `Result` requires that `self` remains borrowed

But your OP required no such borrowing. (On the other hand it's possible your actual code does borrow the names, or that you want to keep that door open for yourself. The 'static version cannot borrow from self.)

Also, as a matter of API design, you should consider returning a type that implements Error instead of a string. This

  • allows callers to process the error more specifically, matching the error you return without parsing your string,
  • allows callers to process the error more generically, passing it to generic error-handling logic, and
  • often but not always, allows you to avoid memory allocation involved in returning an error (because you can either return just an enum variant to specify the error, or move the relevant input data into the error value instead of formatting a new string on the spot), which is helpful for performance when the caller’s action might be to try something else rather than aborting the overall operation.