`Vec` vs `HashMap` vs `HashSet` for unique named items

I have several unique named items, simple as:

struct Item {
    name: String
    ...
}

Now, this items should be gathered in a named collection and I'd like to easily probe this collection for the existence of the item by name and eventually add and remove the items from this collection.

Another complication is that this collection will be serialized in a TOML file so non-owned types are out of the picture.

So I have three way of doing this: (1) a Vec<Item>, (2) an HashMap<String, Item> and (3) an HashSet<Item>

The problem with Vec is that I have to manually manage the duplicates and I have to scan the whole Vec to match the name when I have to retrieve the named Item.

The problem about uniqueness would be solved by the HashSet but again I could not use .get() using the name of the item and I would be forced to use .iter again and fall-back on the previous case.

HashMap is handy because I could retrieve the Item by directly using the name as key but that information would be duplicated in the HashMap and in the struct Item (and I cannot use a reference because of serde).

Anything I'm missing here or any suggestion?

Can you implement Eq and Hash in terms of the name? If you also implement Borrow. You should be able to use get with a string

5 Likes

If you don't want to take that approach on your main struct, you can do it with a wrapper.

3 Likes

Out of curiosity, why using a wrapper here? I see that approach used a lot but I'm missing some background on why.

because Eq is usually taken to mean structural equality, for example if you have two versions of the same item with different contents you may want to make them unequal, for e.g caching and modification checking.

Another approach would be to remove the name from the item, then use a hashmap that returns struct NamedItem(Name, Item).

If the name is immutable, it's fine to have two copies of it, unless your goal right now is to optimize storage (or this is hypothetical). Otherwise, the decision should be based on your requirements. For instance, do you need O(log n) lookup time? I like the above suggestion to return a pair of name and item.

Got it, that makes totally sense, thanks.

The problem with this is that I need the name inside the item because sometimes I use that for the struct methods so I would prefer to have that in the struct item itself.