Interface the BTreeMap and HashMap

I am implementing a tree data structure. My idea to do this is to have a struct Tree that have children:SomeMapType<Tree> field to keep track of its children nodes, where SomeMapType may be BTreeMap, HashMap or something else. So instead, I would like it be polymorphic on the map type I choose to use.

Code as follows:

trait Map {
    type Key;
    type Value;
    fn new() -> Self;
}

struct Tree<Symbol, MapType>
where
    MapType: Map<Key = Symbol, Value = Box<Self>>,
{
    children: MapType,
}

impl<Symbol, MapType> Tree<Symbol, MapType>
where
    MapType: Map<Key = Symbol, Value = Box<Self>>,
{
    fn new() -> Self {
        Self {
            children: Map::new(),
        }
    }
}
impl<Key,Value> Map for std::collections::BTreeMap<Key,Value> {
    type Key = Key;
    type Value = Value;
    fn new() -> Self {
        Self::new()
    }
}

So far, so good. However, when I am to use this data type, e.g., calling the new function, by instantiating with a specific map, say BTreeMap, like follows :

fn main() {
    let tree = Tree::<u8,std::collections::BTreeMap<u8,_>>::new();
}

it compiles to an error message :

error[E0271]: type mismatch resolving `<BTreeMap<u8, _> as Map>::Value == Box<Tree<u8, BTreeMap<u8, _>>>`
   --> src/zips.rs:162:16
    |
162 |     let tree = Tree::<u8,std::collections::BTreeMap<u8,_>>::new();
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type mismatch resolving `<BTreeMap<u8, _> as Map>::Value == Box<Tree<u8, BTreeMap<u8, _>>>`
    |
note: cyclic type of infinite size
   --> src/zips.rs:155:18
    |
155 |     type Value = Value;
    |                  ^^^^^
note: required by a bound in `Tree`
   --> src/zips.rs:136:32
    |
134 | struct Tree<Symbol, MapType>
    |        ---- required by a bound in this struct
135 | where
136 |     MapType: Map<Key = Symbol, Value = Box<Self>>,
    |                                ^^^^^^^^^^^^^^^^^ required by this bound in `Tree`

My guess is that there is some trouble for the rust compiler to reason at that location about recursive types. My question is then how to actually achieve putting an interface that masks the implementation (BTreeMap or HashMap) details in my case.

This is the real problem:

Try writing down what the type actually is, expanding any instances of Self or Value with the actual type. You will find that the type is infinitely deep:

BTreeMap<u8, Box<BTreeMap<u8, Box<BTreeMap<u8, Box<BTreeMap<u8, Box<...>>>>>>>>

If you want to do something like this, then wrap the map in a struct. This definition does not involve an infinitely large type:

struct MyMap {
    inner: BTreeMap<u8, Box<MyMap>>,
}

Thank you. To make sure I understand, you suggest that I impl Map for MyMap instead of for std::collections::BTreeMap. If this is the case, then it is not the same as my use case.

I would need something like follows :

struct MyMapButWithTree {
    inner: BTreeMap<u8,Box<Tree<?,?>>>,
}

The Map is not recursive in itself, but only when used in Tree as the trait bound to MapType. Otherwise, MyMap could itself serve as my tree, but this would defeat my purpose of trying to be agnostic about which map implementations I am using.

The way to be agnostic over the map implementation would look like this:

struct RecursiveBTreeMap {
    inner: BTreeMap<u8, Box<RecursiveBTreeMap>>,
}

struct RecursiveHashMap {
    inner: HashMap<u8, Box<RecursiveHashMap>>,
}

You can implement your Map trait for both structs.

1 Like

But these maps are only capable of storing u8 as at each tree node. If Tree has extra data, this does not work? In my original code, the Value of MapType is a Tree itself instead of MapType. Is there some way to do this?

Right now the type is

Tree<u8, BTreeMap<u8, Box<Tree<u8, BTreeMap<u8, Box<Tree<…, …>>>>>>>

and thus infinitely nested, you cannot ever write down the type completely, and this is why the compiler ultimately complains.

If the map wasn’t abstracted, one could define

struct Tree<Symbol> {
    children: BTreeMap<Symbol, Box<Tree<Symbol>>>>,
}

without a problem because there’s no infinitely deeply nested name involved anymore. The recursive structure itself is fine.

A way to address this is to restructure the generalization of the map type so that the Value type is no longer required to be mentioned. For example, you can require the map to support all value types generically. The Self type of the Map trait can then no longer be the actual map type in question – you would need to resort to using either a specific instantiation as a stand-in, e.g. BTreeMap<Value, ()> – or you could use some new unit-struct as a marker type like

struct BTreeMapMarker<Value>(PhantomData<fn() -> Value>);

Let’s go with the latter, so it’s easier to distinguish the roles:

use core::marker::PhantomData;

trait Map {
    type Key;
    type MapType<Value>;
    fn new<Value>() -> Self::MapType<Value>;
}

struct Tree<Symbol, MapTypeMarker>
where
    MapTypeMarker: Map<Key = Symbol>,
{
    children: MapTypeMarker::MapType<Box<Self>>,
}

impl<Symbol, MapTypeMarker> Tree<Symbol, MapTypeMarker>
where
    MapTypeMarker: Map<Key = Symbol>,
{
    fn new() -> Self {
        Self {
            children: MapTypeMarker::new(),
        }
    }
}

struct BTreeMapMarker<Value>(PhantomData<fn() -> Value>);

impl<Key> Map for BTreeMapMarker<Key> {
    type Key = Key;
    type MapType<Value> = std::collections::BTreeMap<Key, Value>;
    fn new<Value>() -> Self::MapType<Value> {
        Self::MapType::<Value>::new()
    }
}

fn main() {
    let tree = Tree::<u8,BTreeMapMarker<u8>>::new();
}

Doing the same with the Key would not work so well once you introduce further methods that might require different constraints like Hash+Eq or Ord on the key type.

I’m also noticing this approach is suboptimal, since now, the map is no longer the self-type for future additional methods such as insert which one would like to be as simple as insert(&mut self, key: Key, value: Value)

I think I can think of another approach that could work, too, without this limitation, by using two traits… I’ll write on that shortly.

3 Likes

With an additional trait, the refactor turns out quite straightforward, and we can keep your original Map trait completely unmodified. I did add an insert method for illustration / as a sanity check.

trait MapKind<Key, Value> {
    type MapType: Map<Key = Key, Value = Value>;
}

trait Map {
    type Key;
    type Value;
    fn new() -> Self;
    fn insert(&mut self, key: Self::Key, value: Self::Value);
}

struct Tree<Symbol, MK>
where
    MK: MapKind<Symbol, Box<Self>>,
{
    children: MK::MapType,
}

impl<Symbol, MK> Tree<Symbol, MK>
where
    MK: MapKind<Symbol, Box<Self>>,
{
    fn new() -> Self {
        Self {
            children: Map::new(),
        }
    }
}

impl<Key: Ord, Value> Map for std::collections::BTreeMap<Key, Value> {
    type Key = Key;
    type Value = Value;
    fn new() -> Self {
        Self::new()
    }
    fn insert(&mut self, key: Key, value: Value) {
        self.insert(key, value);
    }
}

struct BTreeMap_;

impl<Key: Ord, Value> MapKind<Key, Value> for BTreeMap_ {
    type MapType = std::collections::BTreeMap<Key, Value>;
}

fn main() {
    let tree = Tree::<u8, BTreeMap_>::new();
}

As an unrelated remark: you won’t need the Box<Self> for Tree, as the map will already provide the indirection necessary, so Self works fine, too (MK: MapKind<Symbol, Self>).

3 Likes

I am amazed by your solution:). Thank you. Very clear and on point. Almost exactly what I want, except maybe an extra marker type which would make the code a bit obscure, which is due to the limit of the current Rust.

So, my understanding of your solution is this:

The key is to avoid the infinitely nested name. The way to achieve this is to use a marker type which hides our std::collections::BTreeMap via implementing a trait that associates std::collections::BTreeMap to the marker type by GAT.

This seems like a general approach to deal with such situation. Beautiful, beautiful!

1 Like

As you see in the second solution, using GATs here is not strictly necessary. But it would be a way to disallow the map type to decide which value types it can or cannot support. E. g. for data structures containing multiple maps with different value types, it could thus simplify/shorten some trait bounds that otherwise list all used value types - additionally it could aid better encapsulating implementation details. E. g. with the second solution, a GAT could easily be introduced by working with something like

trait MapKind<Key> {
    type MapType<Value>: Map<Key = Key, Value = Value>;
}

Regarding the marker type, you may even further generalize and use it as a kind of factory type where the creation of maps requires some additional context or state. (Maybe this could be useful for certain custom hashers for hash maps?) The Tree::new method could receive a MapFactory argument; with a trait something like

trait MapFactory<Key> {
    type MapType<Value>: Map<Key = Key, Value = Value>;
    fn create<Value>(&mut self) -> Self::MapType<Value>;
}

the struct could keep around the factory if creating new maps is necessary at a later point. What used to be a marker type now is a stateless factory. (But also, the API got even more verbose.)

2 Likes