Partial key for a hashmap

Hey Guys, first post so be easy on me.
I have the following problem. i want to go over a hashmap and mutate the values based on a key, but also based on a "partial" key. My key looks something like

pub struct MyKey {
     id: Uuid,
     type: MyEnum,
     polygon_id: Uuid,
}

I want to be able to get mutable references in 2: ways by id (which is just a hashmap), but also just by getting all values by a "partial" key

pub struct PartialKey {
    id: Uuid,
    type: MyEnum
}

I thought maybe @alice gave me the solution in Multi keys for same struct. Is HashMap with Rc the best solution?
but it came out a lot more difficult because I need to return a vec when I index by partial key.

I'm open to any solutions, maybe this is totally the wrong way of looking at it. Also Efficiency matters.

Thanks guys!

Just to clarify, do you have a collection of elements and do you want to query the elements matching id: Uuid, or the element matching id: Uuid, r#type: MyEnum?

Could you use HashMap<Uuid, HashMap<MyEnum, T>> where T is your value type?

Isn't that what you'd use some sort of prefix tree for? Maybe storing your data in a HashMap<PartialKey, HashMap<Uuid, V>> or HashMap<PartialKey, Vec<(Uuid, V)>> is efficient enough in practice? Efficiency matters kind of depends on the regular workload you are expecting. I.e. will there be a lot of insertions and deletions or far more MyKey lookups compared to PartialKey lookups? Optimizing data structures for certain operations often means that you have to make trade-offs.

5 Likes

Yeah you are spot on, the problem is that under normal workloads we have a lot of mutable lookups based on both type of keys (more of MyKey than PartialKey) and less insertions/deletions. which is why i thought HashMap<PartialKey, Vec<(Uuid, V)>> does not fit the trade-offs. also usually i think the vec would be of length 1, but i need to make sure of that. I feel like HashMap<PartialKey, HashMap<Uuid, V>> will be good enough, but I wanted to make sure there isn't another option which is used in these situations. The other issue with the nested hashmap is that it requires a lot more branching when working with MyKey because both your first lookup can "fail" and then also the second lookup can also "fail". Where in theory if i can find a value for MyKey i will find atleast 1 value for PartialKey.

matching the entire/partial key so id: Uuid, r#type: MyEnum, polygon_id: Uuid

As part of the solution, you can consider hoisting the r#type: MyEnum out by creating a collection/lookup for every variant of MyEnum.

use std::collections::HashMap;

struct MyValue;

enum MyEnum { A, B }

#[derive(Hash, Eq, PartialEq)]
struct Uuid([u8; 16]);

struct Data {
    a: HashMap<Uuid, MyValue>,
    b: HashMap<Uuid, MyValue>,
}

impl Data {
    fn get(&self, id: Uuid, r#type: MyEnum) -> Option<&MyValue> {
        match r#type {
            MyEnum::A => self.a.get(&id),
            MyEnum::B => self.b.get(&id),
        }
    }

    fn get_any(&self, id: Uuid) -> impl Iterator + '_ {
        [
            &self.a,
            &self.b,
        ].into_iter().flat_map(move |map| map.get(&id))
    }
}

There are probably crates that can help with this.

yes definitely but this is an extra optimization, i would still need to be able to use the partial key lookup

An alternative solution, if you can ensure everything implements Ord (Uuid does, assuming it's the one from the uuid crate), would be to use a BTreeMap instead. Then your partial key lookups would correspond to looking up a range using range() or range_mut(). Which solution is better may depend on exactly how you are using the data structure.

4 Likes

it does but what would the min,max for the range? Uuids are random so it would have to compare with each one

It looks like the upper and lower bounds for Uuid would be Uuid::nil() and Uuid::max(), since the ordering is just the default ordering for an array of u8s.

It would be doing a lookup over a tree, ordered by the ordering, so it just needs to find the right one in the order (which is O(log(N)) complexity). For the partial lookups it would be using a lexicographic ordering, where it would look up the things at the start of the key type first (which correspond to the partial keys), and then it can go through the Uuids at the end when it has found the right one (for a range lookup, this just corresponds to returning them one by one in an iterator).

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.