Implementing an object model



I’m trying to emulate an object tree (or a graph, in the general case) with two types of nodes:

  • Leaf nodes can be converted to one of a predefined set of primitive values (String, Number, Boolean, etc.).
  • Non-leaf nodes can be “traversed” downward: given a field name, the traversal operation returns an iterator over the children (as a first approximation, an iterator with a single element can represent a cardinality of 1).
  • Each node has some “type information” associated with it (could be type name, cardinality, etc).

Now, the first plot twist is that I want to build that object tree by combining pre-existing data structures (for example, serde_json::Value, a JSON tree) with “metadata” obtained in some way (possibly loaded dynamically; for example, a JSON schema read from disk).

The reason I need that tree is that I have code that is generic over object trees of different types, and I don’t want to couple it too tightly to the business of tracking meta-information. Also, how the data looks in the object tree could depend on that external metadata (so the same “data tree” could be represented differently in the “object tree”).

One option is to create a trait that provides functions to “traverse” the tree and to parametrize my code over a type T implementing that trait. This is how it has worked so far.
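For comparison, the trait-based (static dispatch) approach could look roughly like the following. It is a minimal sketch, not the post's actual code: `Node`, `ToyValue`, and `sum_field` are hypothetical names, and a small enum stands in for serde_json::Value so the example only needs the standard library.

```rust
/// Hypothetical traversal trait: generic code is written against it,
/// and each concrete tree type implements it (static dispatch).
pub trait Node {
    /// Children reachable through `field`; a single child models cardinality 1.
    fn field<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a Self> + 'a>;
    /// Leaf nodes convert to a primitive value (`u32`, as in the post).
    fn as_primitive(&self) -> Option<u32>;
}

/// Hypothetical stand-in for a JSON-like tree such as `serde_json::Value`.
pub enum ToyValue {
    Number(u32),
    Array(Vec<ToyValue>),
    Object(Vec<(String, ToyValue)>),
}

impl Node for ToyValue {
    fn field<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a Self> + 'a> {
        // Look the field up in an object; other values have no fields.
        let found = match self {
            ToyValue::Object(fields) => fields.iter().find(|(k, _)| k == field).map(|(_, v)| v),
            _ => None,
        };
        match found {
            // Arrays fan out into several children.
            Some(ToyValue::Array(items)) => Box::new(items.iter()),
            // Any other value is exactly one child.
            Some(value) => Box::new(std::iter::once(value)),
            // A missing field yields no children.
            None => Box::new(std::iter::empty()),
        }
    }

    fn as_primitive(&self) -> Option<u32> {
        match self {
            ToyValue::Number(n) => Some(*n),
            _ => None,
        }
    }
}

/// Generic client code: compiled separately for every `N: Node`.
pub fn sum_field<N: Node>(node: &N, field: &str) -> u32 {
    node.field(field).filter_map(|child| child.as_primitive()).sum()
}
```

Generic code like `sum_field` is monomorphized per tree type, so it cannot, for instance, hold one collection of nodes coming from different object models; that is where dynamic dispatch comes in.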

However, the second twist is that I would now like my code to use dynamic dispatch. One reason is that I have a piece of code that (might) need to work with multiple object trees at once (I’m not 100% sure of this requirement, but let’s pretend it’s legit for the sake of this question).

So, essentially, I need a data type similar to a trait object pointer, but “fatter”: it would hold two data references, one to the data and another to the metadata. Something along these lines:

```rust
/// `ObjectPtr` is like a fat pointer, but with two data references.
/// (`'dti` is the lifetime of both the data and the type information.)
pub struct ObjectPtr<'dti> {
  /// Reference to the type information (metadata)
  type_info: &'dti (),
  /// Reference to the datum
  data: &'dti (),
  /// Some virtual table
  vtable: *const (),
}
```
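For reference, a plain `&dyn Trait` is already a two-word pointer (data address plus vtable address), and the proposed ObjectPtr adds one more data word. The check below assumes a mainstream target: Rust currently lays out pointers to trait objects as two `usize`-sized words, but the Rust Reference describes this as current behavior rather than a guarantee. `Describe` and `ObjectPtrShape` are hypothetical names used only for the size comparison.

```rust
use std::mem::size_of;

/// Hypothetical object-safe trait, only used to form a trait object.
pub trait Describe {
    fn type_name(&self) -> &str;
}

/// Same shape as the `ObjectPtr` above: two data words plus one vtable word.
pub struct ObjectPtrShape<'dti> {
    pub type_info: &'dti (),
    pub data: &'dti (),
    pub vtable: *const (),
}

/// Sizes, in machine words, of a thin reference, a trait-object reference,
/// and the ObjectPtr-like struct.
pub fn sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&u8>() / word,
        size_of::<&dyn Describe>() / word,
        size_of::<ObjectPtrShape<'static>>() / word,
    )
}
```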

Then, the API could look something along these lines:

```rust
impl<'dti> ObjectPtr<'dti> {
  /// "Composite" nodes can be "traversed" by field name.
  pub fn field(&'dti self, field: &str) -> Box<dyn Iterator<Item = ObjectPtr<'dti>> + 'dti> {
    unimplemented!()
  }

  /// "Leaf" nodes can be converted into a primitive value (which is `u32` in this case).
  pub fn as_primitive(&self) -> Option<u32> {
    unimplemented!()
  }

  /// Retrieve the type name of this element.
  pub fn type_name(&self) -> String {
    unimplemented!()
  }
}
```
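To illustrate how a `field` method returning a boxed iterator composes on the client side, here is a sketch that follows a path of field names, where each segment may fan out into several children. A plain `&dyn DynNode` trait object stands in for ObjectPtr; `DynNode`, `Toy`, and `follow_path` are hypothetical names. Because the returned iterator is bounded by the lifetime of `self` and not of the `field` string, the field name has to be resolved eagerly:

```rust
/// Hypothetical object-safe stand-in for the `ObjectPtr` API above.
pub trait DynNode {
    fn field<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a dyn DynNode> + 'a>;
    fn as_primitive(&self) -> Option<u32>;
}

/// Hypothetical toy tree: each field name maps to zero or more children.
pub enum Toy {
    Leaf(u32),
    Branch(Vec<(&'static str, Vec<Toy>)>),
}

impl DynNode for Toy {
    fn field<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a dyn DynNode> + 'a> {
        // Resolve the name now, so the returned iterator does not borrow `field`.
        let children: &'a [Toy] = match self {
            Toy::Branch(fields) => fields
                .iter()
                .find(|(name, _)| *name == field)
                .map(|(_, children)| children.as_slice())
                .unwrap_or(&[]),
            Toy::Leaf(_) => &[],
        };
        Box::new(children.iter().map(|child| child as &dyn DynNode))
    }

    fn as_primitive(&self) -> Option<u32> {
        match self {
            Toy::Leaf(n) => Some(*n),
            Toy::Branch(_) => None,
        }
    }
}

/// Follows `path`; every segment may fan out into several children.
/// Returns the primitive values found at the end of the path.
pub fn follow_path<'a>(root: &'a dyn DynNode, path: &[&str]) -> Vec<u32> {
    let mut current: Vec<&'a dyn DynNode> = vec![root];
    for segment in path {
        current = current.into_iter().flat_map(|node| node.field(segment)).collect();
    }
    current.into_iter().filter_map(|node| node.as_primitive()).collect()
}
```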

Finally, I would like the object trees themselves to be implementable in a safe manner (no unsafe code). So the trait to implement would look something along these lines:

```rust
/// SPI (Service Provider Interface) to implement in order to "plug in" a new "object system".
/// (`'dti` is the lifetime of both the data and the type information.)
pub trait ObjectModel {
  type Data: ?Sized;
  type TypeInfo: ?Sized;

  fn field<'dti>(
    &'dti self,
    type_info: &'dti Self::TypeInfo,
    data: &'dti Self::Data,
    segment: &str,
  ) -> Box<dyn Iterator<Item = (&'dti Self::Data, &'dti Self::TypeInfo)> + 'dti>;

  fn type_name<'dti>(
    &'dti self,
    type_info: &'dti Self::TypeInfo,
    data: &'dti Self::Data,
  ) -> &'dti str;

  fn create<'dti>(&'dti self, data: &'dti Self::Data) -> ObjectPtr<'dti>;
}
```

Then, a simple implementation might look like this:

```rust
use std::collections::HashMap;

use serde_json::Value;

/// Simple object model: each object is typed based on the JSON field name used to access it.
struct SimpleObjectModel {
  /// Mapping from object field name to type name.
  types: HashMap<&'static str, &'static str>,
}

/// SPI interface to the type system.
impl ObjectModel for SimpleObjectModel {
  type Data = Value;
  type TypeInfo = str;

  fn field<'dti>(
    &'dti self,
    _type_info: &'dti str,
    data: &'dti Value,
    segment: &str,
  ) -> Box<dyn Iterator<Item = (&'dti Value, &'dti str)> + 'dti> {
    // Note: indexing panics if `segment` has no entry in `types`.
    let type_info = self.types[segment];
    match data.get(segment) {
      // Arrays are iterated
      Some(&Value::Array(ref vec)) => Box::new(vec.iter().map(move |d| (d, type_info))),
      // Explicit 'null' and missing fields produce an empty iterator
      Some(&Value::Null) | None => Box::new(std::iter::empty()),
      // Any other single value produces an iterator returning that value
      Some(value) => Box::new(std::iter::once((value, type_info))),
    }
  }

  /// Type name is just `type_info` itself
  fn type_name<'dti>(&'dti self, type_info: &'dti str, _data: &'dti Value) -> &'dti str {
    type_info
  }

  fn create<'dti>(&'dti self, _data: &'dti Value) -> ObjectPtr<'dti> {
    unimplemented!()
  }
}
```

First, I’m not quite clear on how to bridge the two (ObjectPtr and implementations of ObjectModel). My thought is that I could create an object-safe trait ObjectModelInternal, implement it for every type implementing ObjectModel, and then use it to implement ObjectPtr. So I would store references to both type_info and data inside ObjectPtr and take the vtable from the ObjectModelInternal trait object created for the given type system (or maybe I would need three data references in ObjectPtr: the data, the type info, and the object model itself).
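One way to bridge the two without `unsafe` is a variation of the ObjectModelInternal idea: instead of storing raw pointers plus a hand-built vtable, box a small per-model struct that holds all three references (model, type info, data) and let the compiler generate the vtable for an object-safe trait. This is a hedged sketch, not a tested design: `ErasedNode`, `Bound`, `Json`, and `FieldNameModel` are hypothetical names, the SPI is trimmed to `field` and `type_name`, and a small enum replaces serde_json::Value so the block only needs the standard library.

```rust
use std::collections::HashMap;

/// The post's `ObjectModel` SPI with modern `dyn` syntax (`create` omitted).
pub trait ObjectModel {
    type Data: ?Sized;
    type TypeInfo: ?Sized;
    fn field<'dti>(
        &'dti self,
        type_info: &'dti Self::TypeInfo,
        data: &'dti Self::Data,
        segment: &str,
    ) -> Box<dyn Iterator<Item = (&'dti Self::Data, &'dti Self::TypeInfo)> + 'dti>;
    fn type_name<'dti>(&'dti self, type_info: &'dti Self::TypeInfo, data: &'dti Self::Data) -> &'dti str;
}

/// Hypothetical object-safe "internal" trait: it has no associated types,
/// so the compiler can build its vtable and no `unsafe` is needed.
trait ErasedNode<'dti> {
    fn field(&self, segment: &str) -> Box<dyn Iterator<Item = ObjectPtr<'dti>> + 'dti>;
    fn type_name(&self) -> &'dti str;
}

/// The three references behind one node: object model, type info and data.
struct Bound<'dti, M: ObjectModel> {
    model: &'dti M,
    type_info: &'dti M::TypeInfo,
    data: &'dti M::Data,
}

impl<'dti, M: ObjectModel> ErasedNode<'dti> for Bound<'dti, M> {
    fn field(&self, segment: &str) -> Box<dyn Iterator<Item = ObjectPtr<'dti>> + 'dti> {
        let model = self.model;
        let children = model.field(self.type_info, self.data, segment);
        // Re-wrap every (data, type_info) pair coming from the model.
        Box::new(children.map(move |(data, type_info)| ObjectPtr::new(model, type_info, data)))
    }
    fn type_name(&self) -> &'dti str {
        self.model.type_name(self.type_info, self.data)
    }
}

/// Type-erased node handle (one heap allocation per node in this sketch).
pub struct ObjectPtr<'dti>(Box<dyn ErasedNode<'dti> + 'dti>);

impl<'dti> ObjectPtr<'dti> {
    pub fn new<M: ObjectModel>(model: &'dti M, type_info: &'dti M::TypeInfo, data: &'dti M::Data) -> Self {
        ObjectPtr(Box::new(Bound { model, type_info, data }))
    }
    pub fn field(&self, segment: &str) -> Box<dyn Iterator<Item = ObjectPtr<'dti>> + 'dti> {
        self.0.field(segment)
    }
    pub fn type_name(&self) -> &'dti str {
        self.0.type_name()
    }
}

/// Hypothetical JSON-like data standing in for `serde_json::Value`.
pub enum Json {
    Num(u32),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Mirrors `SimpleObjectModel`: the type name comes from the field name.
pub struct FieldNameModel {
    pub types: HashMap<&'static str, &'static str>,
}

impl ObjectModel for FieldNameModel {
    type Data = Json;
    type TypeInfo = str;
    fn field<'dti>(
        &'dti self,
        _type_info: &'dti str,
        data: &'dti Json,
        segment: &str,
    ) -> Box<dyn Iterator<Item = (&'dti Json, &'dti str)> + 'dti> {
        // Unknown field names get a placeholder type instead of panicking.
        let type_info: &'dti str = self.types.get(segment).copied().unwrap_or("Unknown");
        let value = match data {
            Json::Obj(fields) => fields.iter().find(|(k, _)| k == segment).map(|(_, v)| v),
            _ => None,
        };
        match value {
            Some(Json::Arr(items)) => Box::new(items.iter().map(move |d| (d, type_info))),
            Some(v) => Box::new(std::iter::once((v, type_info))),
            None => Box::new(std::iter::empty()),
        }
    }
    fn type_name<'dti>(&'dti self, type_info: &'dti str, _data: &'dti Json) -> &'dti str {
        type_info
    }
}
```

The trade-off is one heap allocation per visited node. In exchange, ObjectPtr only stores a `Box<dyn ...>`, so its size no longer depends on the Data or TypeInfo types of any particular model; allocating the erased nodes from an arena could reduce the allocation cost.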

Now, for added complexity, here are some extra wishes:

  • It would be nice to have a mutable model. I actually need shared-mutable-reference semantics in one place. Since things in Rust tend to move all the time (and I cannot really control all the underlying data structures), I’m not sure this can even be solved in a nice way. I have some thoughts around either handles (stable identifiers that never change once an object is created) or pinned objects (allocating objects “inside” the object model’s data structure in a way that they don’t move).
  • It would be nice to abstract over the type_info/data types somehow (for example, for some object models, type_info could be a numeric value acting as a handle). However, given that ObjectPtr has to be a sized struct, I’m not sure much can be done here (maybe custom DSTs will help one day…).
  • It would be nice to have a parallel hierarchy for static dispatch. It should be possible to have an Object trait with an API similar to ObjectPtr, but built for static dispatch and minimal allocation (ideally, no allocations for traversals). I would also need conversions between the two: for example, a system that starts with “dynamic” objects should be able to invoke a system that can specialize for the type. This could probably be achieved by making ObjectPtr itself implement the Object trait, so that ObjectPtr could be passed wherever T: Object is expected (that would not remove the dynamic dispatch, though).
  • It should be friendly to data structures that have “built-in” type info (very likely, simply using () as TypeInfo would achieve that).
  • It shouldn’t require a PhD in Rust to use or troubleshoot.
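On the handles idea in the first bullet: one common safe pattern is to keep nodes in an arena owned by the object model and hand out small Copy indices instead of references. Handles then stay valid for the arena's lifetime because nodes are only appended, and shared mutation goes through `&mut Arena` rather than through aliased references. A minimal sketch; `Arena`, `NodeId`, `NodeValue`, and the method names are hypothetical, and node removal is deliberately left out:

```rust
/// Stable handle: an index into the arena. It is never invalidated,
/// because nodes are only ever appended.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

pub enum NodeValue {
    Number(u32),
    /// Repeated field names model a cardinality greater than 1.
    Composite(Vec<(String, NodeId)>),
}

#[derive(Default)]
pub struct Arena {
    nodes: Vec<NodeValue>,
}

impl Arena {
    pub fn add(&mut self, value: NodeValue) -> NodeId {
        self.nodes.push(value);
        NodeId(self.nodes.len() - 1)
    }

    /// Children reachable through `field`; handles are Copy, so no borrow escapes.
    pub fn field(&self, id: NodeId, field: &str) -> Vec<NodeId> {
        match &self.nodes[id.0] {
            NodeValue::Composite(fields) => fields
                .iter()
                .filter(|(name, _)| name == field)
                .map(|(_, child)| *child)
                .collect(),
            NodeValue::Number(_) => Vec::new(),
        }
    }

    pub fn as_primitive(&self, id: NodeId) -> Option<u32> {
        match self.nodes[id.0] {
            NodeValue::Number(n) => Some(n),
            NodeValue::Composite(_) => None,
        }
    }

    /// Mutation through a handle: every holder of `id` observes the change.
    pub fn set_number(&mut self, id: NodeId, n: u32) {
        self.nodes[id.0] = NodeValue::Number(n);
    }
}
```

The cost is that the compiler no longer ties a handle to its arena: using a NodeId from one arena with another is a logic bug the type system does not catch (at best it panics on an out-of-range index).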
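For the parallel static/dynamic hierarchy in the third bullet, one possible shape is a trait whose node handle is Self (a cheap Copy value such as a reference) and whose child iterator is an associated type. A concrete tree can then return a borrowed slice iterator with no allocation, while a type-erased handle implements the same trait with a boxed iterator, so generic code accepts both. All names here (`Object`, `Tree`, `DynObject`, `total`) are hypothetical:

```rust
/// Static-dispatch API: `Self` is a cheap Copy handle and
/// `Children` is a concrete iterator type chosen by each implementation.
pub trait Object: Copy {
    type Children: Iterator<Item = Self>;
    fn field(self, field: &str) -> Self::Children;
    fn as_primitive(self) -> Option<u32>;
}

/// Hypothetical concrete tree: each field name maps to zero or more children.
pub enum Tree {
    Leaf(u32),
    Node(Vec<(&'static str, Vec<Tree>)>),
}

impl<'a> Object for &'a Tree {
    // A borrowed slice iterator: traversal allocates nothing.
    type Children = std::slice::Iter<'a, Tree>;

    fn field(self, field: &str) -> Self::Children {
        let children: &'a [Tree] = match self {
            Tree::Node(fields) => fields
                .iter()
                .find(|(name, _)| *name == field)
                .map(|(_, children)| children.as_slice())
                .unwrap_or(&[]),
            Tree::Leaf(_) => &[],
        };
        children.iter()
    }

    fn as_primitive(self) -> Option<u32> {
        match self {
            Tree::Leaf(n) => Some(*n),
            Tree::Node(_) => None,
        }
    }
}

/// Object-safe mirror of `Object`, used for dynamic dispatch.
pub trait DynObject {
    fn field_dyn<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a dyn DynObject> + 'a>;
    fn as_primitive_dyn(&self) -> Option<u32>;
}

impl DynObject for Tree {
    fn field_dyn<'a>(&'a self, field: &str) -> Box<dyn Iterator<Item = &'a dyn DynObject> + 'a> {
        Box::new(Object::field(self, field).map(|child| child as &dyn DynObject))
    }

    fn as_primitive_dyn(&self) -> Option<u32> {
        Object::as_primitive(self)
    }
}

/// The dynamic handle implements the static trait too, so it can be passed
/// wherever `O: Object` is expected (dispatch stays dynamic inside).
impl<'a> Object for &'a dyn DynObject {
    type Children = Box<dyn Iterator<Item = &'a dyn DynObject> + 'a>;

    fn field(self, field: &str) -> Self::Children {
        self.field_dyn(field)
    }

    fn as_primitive(self) -> Option<u32> {
        self.as_primitive_dyn()
    }
}

/// Generic client code, written once against `Object`.
pub fn total<O: Object>(root: O, field: &str) -> u32 {
    root.field(field).filter_map(|child| child.as_primitive()).sum()
}
```

Passing a `&dyn DynObject` into `total` still dispatches dynamically inside; only callers holding a concrete handle such as `&Tree` get the allocation-free path.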
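On the second and fourth bullets: because Data and TypeInfo in the SPI are plain associated types, a model can already pick `()` when the data carries its own type, or a small integer handle into a table it owns. A hedged sketch with hypothetical names (`SelfTyped`, `BuiltIn`, `Registry`), with the trait reduced to `type_name` for brevity:

```rust
/// The post's `ObjectModel` SPI, reduced to `type_name`.
pub trait ObjectModel {
    type Data: ?Sized;
    type TypeInfo: ?Sized;
    fn type_name<'dti>(&'dti self, type_info: &'dti Self::TypeInfo, data: &'dti Self::Data) -> &'dti str;
}

/// Hypothetical data that already knows its own type.
pub enum SelfTyped {
    Text(String),
    Flag(bool),
}

/// "Built-in" type info: the metadata slot is simply `()`.
pub struct BuiltIn;

impl ObjectModel for BuiltIn {
    type Data = SelfTyped;
    type TypeInfo = ();

    fn type_name<'dti>(&'dti self, _type_info: &'dti (), data: &'dti SelfTyped) -> &'dti str {
        // The type is derived from the data itself.
        match data {
            SelfTyped::Text(_) => "Text",
            SelfTyped::Flag(_) => "Flag",
        }
    }
}

/// Type info as a numeric handle into a table owned by the model.
pub struct Registry {
    pub names: Vec<String>,
}

impl ObjectModel for Registry {
    type Data = SelfTyped;
    type TypeInfo = u16;

    fn type_name<'dti>(&'dti self, type_info: &'dti u16, _data: &'dti SelfTyped) -> &'dti str {
        // Panics on an unknown handle; a real model might validate handles.
        &self.names[usize::from(*type_info)]
    }
}
```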

This looks like a tough problem to solve in Rust. I don’t really have full control over the problem I’m solving, and it seems to call for a more “object-orientish” approach. Until now I was able to do things in a more “Rust” way; I do understand that you shouldn’t necessarily bring your habits from other languages (though I may still be overlooking some simple “Rust-way” solution).

But, hey, maybe this is just a bad idea and I need to do it some other way. I do like the idea of having type information independent from the data structures, though, as it would allow me to evolve both semi-independently.

tl;dr

  • I need a way to “zip” pre-existing data structure with “object model” in a way that I can traverse the object tree as pairs of (data, typeinfo).
  • It should be possible to use dynamic dispatch and a concrete struct (as opposed to being generic over some type T implementing a trait).

Any comments, recommendations? Projects to look at? Is it a crazy thing to want?

P.S. I’ve just realized I’ve seen a similar idea of data + type info in the juniper crate; I’ll check it later.