Let say I have a simple struct which contains only owned and/or copiable values:
#[derive(Debug)]
struct Person {
name: String,
id: u32,
// Other fields...
}
I want to index many of such elements using one of its (hashable) field, for instance with the name
field. For that purpose, I'm currently using a HashMap<String, Person>
, which works fine:
let mut index = HashMap::new();
let person = Person { name: "Alice".into(), id: 42 };
index.insert(person.name.clone(), person);
However, this design incurs an unnecessary allocation since the person name is a duplicate of the index key. I cannot use a HashMap<&str, String>
to avoid the allocation since it would not work with borrowing rules. I think I could make it work using Rc
and RecCell
but it's still suboptimal.
I tried another design which is this one:
use std::{
collections::{HashMap, hash_map::DefaultHasher},
hash::{Hash, Hasher},
mem::replace,
};
struct Index {
values: Vec<Person>,
index: HashMap<u64, usize>,
}
fn hash<T: Hash>(value: T) -> u64 {
let mut hasher = DefaultHasher::new();
value.hash(&mut hasher);
hasher.finish()
}
impl Index {
fn insert(&mut self, value: Person) -> Option<Person> {
let h = hash(&value.name);
if let Some(idx) = self.index.get(&h) {
let previous = replace(&mut self.values[*idx], value);
return Some(previous)
}
let idx = self.values.len();
self.values.push(value);
self.index.insert(h, idx);
None
}
fn get(&self, key: &str) -> Option<&Person> {
let h = hash(key);
if let Some(idx) = self.index.get(&h) {
return Some(&self.values[*idx])
}
None
}
}
impl Index {
fn new() -> Self {
Index { values: vec![], index: HashMap::new() }
}
}
This data structure internally stores a hash map mapping name
's hashes to their index in a list storing the person instances.
I identified several issues with this design:
- It requires two hashing, one for the
name
and another for the name's hash, - Removing items is not supported since it would invalidate the indices,
- I do not fully understand the guarantees of hashing so I do not know if this design has flaws.
Is this design valid and can it be improved? Do robust implementation which I could reuse exist?