Enum Variant Narrowing in MongoDB


What is a decent way to convert from one structure to another in the following case:

Problem Statement

There is a MongoDB collection with far too many schemas. These schemas should be normalized into a single structure.

There is a mongodb::bson::Document, which the mongodb::bson::Bson enum wraps in its Document variant. The idea is to convert from one Bson structure to another Bson structure.

Ideally, this could be done with something like:

struct User {
  id: Bson::ObjectId,
  email: Bson::String,
  favourite_colours: Bson::Array(Bson::String),
}

fn normalize(document: Document) -> User {
  let id = match document.get("id") {
    Bson::ObjectId(o) => o,
    Bson::String(s) => ObjectId::from(s),
    _ => todo!()
  };

  let email = match document.get("email") {
    ...
  };

  let favourite_colours = ...;

  let user = User {
    id,
    email,
    favourite_colours,
  };
  user
}

However, the enum itself is the type; its variants are not types of their own. So, the actual impl might look more like:

struct User {
  id: Bson,
  email: Bson,
  favourite_colours: Bson,
}

fn normalize(document: Document) -> User {
  let id = match document.get("id") {
    Bson::ObjectId(o) => o,
    ...
  };

  ...
}

The issue with this is that nothing type-checks that the document is actually normalized into the wanted shape.

Options I have considered:


Bson -> Rust -> Bson
impl TryFrom<User> for mongodb::bson::Document {
    type Error = NormalizeError;
    fn try_from(user: User) -> Result<mongodb::bson::Document, Self::Error> {
        let uoc = doc! {
            "email": user.email
        };
        Ok(uoc)
    }
}

impl TryFrom<mongodb::bson::Document> for User {
    type Error = NormalizeError;
    fn try_from(doc: mongodb::bson::Document) -> Result<User, Self::Error> {
        let email = serialize(&doc, "email", String::new(), email_handler)?;

        let user = User {
            email,
            ...
        };
        Ok(user)
    }
}

fn email_handler(bson: &Bson) -> Result<String, NormalizeError> {
    match bson {
        Bson::String(s) => Ok(s.to_owned()),
        _ => Ok(String::new()),
    }
}

fn serialize<T, F>(
    doc: &Document,
    field: &str,
    default: T,
    transformer: F,
) -> Result<T, NormalizeError>
where
    F: Fn(&Bson) -> Result<T, NormalizeError>,
{
    if let Some(bson) = doc.get(field) {
        transformer(bson)
    } else {
        Ok(default)
    }
}

Annoyances:

  • Seems silly to convert from the type wanted to the same type, through another type
  • Custom structs are needed, because there is no native way to represent Bson::Null and Bson::Undefined (Options can only represent one of them)
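To make the Null/Undefined annoyance concrete, here is a minimal sketch of the kind of custom type it seems to call for. Everything in it is an assumption for illustration: `BsonLike` is a std-only stand-in for a few `mongodb::bson::Bson` variants, and `Field` and `classify_string` are hypothetical names. The argument is shaped like the `Option<&Bson>` that `Document::get` returns.

```rust
// Stand-in for a few `mongodb::bson::Bson` variants (assumption: lets the
// sketch compile without external crates).
#[derive(Debug, Clone, PartialEq)]
enum BsonLike {
    Null,
    Undefined,
    String(String),
    Int32(i32),
}

// Hypothetical tri-state field. It keeps "key absent", "undefined" and "null"
// apart, which a plain Option<T> cannot do.
#[derive(Debug, PartialEq)]
enum Field<T> {
    Missing,
    Undefined,
    Null,
    Value(T),
}

// Classify the result of a field lookup that is expected to hold a string.
// Any other variant is reported as an error instead of being defaulted.
fn classify_string(value: Option<&BsonLike>) -> Result<Field<String>, String> {
    match value {
        None => Ok(Field::Missing),
        Some(BsonLike::Undefined) => Ok(Field::Undefined),
        Some(BsonLike::Null) => Ok(Field::Null),
        Some(BsonLike::String(s)) => Ok(Field::Value(s.clone())),
        Some(other) => Err(format!("expected a string, got {:?}", other)),
    }
}
```

With the real crate the same match would presumably be written against `Bson` directly; the point is only that the three "empty" states stay distinguishable after conversion.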

Custom Bson Wrappers

Note: This is not a fully-fledged idea; never tried implementing it.

struct Array(Bson);

impl Array {
  // Returns the wrapped array, panicking if the inner Bson is any other variant.
  fn get_bson(&self) -> &Vec<Bson> {
    match &self.0 {
      Bson::Array(a) => a,
      _ => panic!("Invalid type used"),
    }
  }
}

Annoyances:

  • More code to look at

Should I write a custom deserializer for this into "Rust" types? Is that normal?

Relevant Info

  • mongodb - Rust
  • bson - Rust
  • There are well over 1000 schemas for this one collection - the types are deeply nested
  • Certain Bson variants are made available as structs (e.g. Document, ObjectId), but the others are not

My wishful/ideal API:

fn normalize(document: Document) -> Result<User, Error> {
  let user: User = document.try_into()?;
  // OR, some sort of deserializer
  // Note: This exists, but does not convert - just fails if schema does not match
  // let user: User = bson::from_bson(Bson::Document(document))?;
  Ok(user)
}

You actually want statically-typed domain objects. Not BSON documents or anything else. And that's right, because you are working in a statically and strongly-typed language.

Furthermore, you want your static types to be deserialized from BSON types that don't actually match the variant each type is expecting to be deserialized from. To make this happen, you'll want to write transparent newtype wrappers around these types that deserialize from BSON more leniently than permitted by their default Deserialize impl.
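A rough sketch of what such a lenient newtype could look like follows. To keep it std-only it converts through `TryFrom` from a stand-in enum rather than through a serde `Deserialize` impl, so treat it as the shape of the idea, not as the bson crate's API. `BsonLike`, `Colours`, `NormalizeError` (named after the error type in the question) and `normalize_colours` are all assumptions made for illustration.

```rust
use std::convert::TryFrom;

// Stand-in for a few `mongodb::bson::Bson` variants (assumption: the real
// enum would be matched in the same way).
#[derive(Debug, Clone, PartialEq)]
enum BsonLike {
    Null,
    String(String),
    Int64(i64),
    Array(Vec<BsonLike>),
}

#[derive(Debug, PartialEq)]
struct NormalizeError(String);

// Hypothetical newtype for `favourite_colours`. Once constructed it always
// holds a list of strings, whatever shape the stored value had.
#[derive(Debug, PartialEq)]
struct Colours(Vec<String>);

impl TryFrom<&BsonLike> for Colours {
    type Error = NormalizeError;

    fn try_from(value: &BsonLike) -> Result<Self, Self::Error> {
        match value {
            // Canonical shape: an array whose elements are all strings.
            BsonLike::Array(items) => items
                .iter()
                .map(|item| match item {
                    BsonLike::String(s) => Ok(s.clone()),
                    other => Err(NormalizeError(format!("non-string colour: {:?}", other))),
                })
                .collect::<Result<Vec<_>, _>>()
                .map(Colours),
            // Lenient shape: a single string becomes a one-element list.
            BsonLike::String(s) => Ok(Colours(vec![s.clone()])),
            // Lenient shape: null becomes an empty list.
            BsonLike::Null => Ok(Colours(Vec::new())),
            // Everything else is rejected with an error, never a panic.
            other => Err(NormalizeError(format!("cannot read colours from {:?}", other))),
        }
    }
}

// Convenience wrapper so callers do not need the trait in scope.
fn normalize_colours(value: &BsonLike) -> Result<Colours, NormalizeError> {
    Colours::try_from(value)
}
```

The leniency lives in exactly one place per field type, and the struct that uses `Colours` stays fully statically typed. In a serde-based version the same match would sit inside the newtype's `Deserialize` impl.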

For the record, ObjectId already deserializes from strings, so you won't have to worry about that.

As for the custom Bson wrappers: don't do that at all, it's the antithesis of static typing. Silently assuming that a Bson, which can be any variant, is actually one specific variant, and then panicking if it isn't? Yuck!

Obligatory reading.

