Share String between key and value of HashMap


#1

I have a struct with a String id and some other fields like

struct Record {
    id: String,
    count: u32
}

For quick lookup, I put records in a HashMap:

struct RecordAnalyzer {    
    records: HashMap
}

impl RecordAnalyzer {
    fn analyze(&mut self, id: &String) {
        if !self.records.contains_key(id) {
            self.records.insert(id.clone(), Record{id: id.clone(), count: 0});
        }
        self.records.get_mut(id).unwrap().count += 1;
    }
}    

I have to clone the id for both key and value of HashMap. Another way is to use a HashSet, but HashSet didn’t allow to mutate the values in it. Can I clone only once or how to do this better? Thanks!


#2

I think you can do this if you really wanted to if your keys and values were Arc<&str> instead of a String. But I would probably only store a HashMap<String, u32> and have a nice user interface to construct a Record when I need it. If you go that route, you could do something like this:

struct RecordAnalyzer {    
    records: HashMap<String, u32>,
}

impl RecordAnalyzer {
    fn analyze(&mut self, id: &String) {
        let count = self.records.entry(id.clone()).or_insert(0);
        *count += 1;
    }
}

#3

Actually there are many other fields in Record, so I can’t just put a u32 in HashMap. Make key and id an Arc<&str> is a good idea. They can share a String. Thanks @cbreeden.


#4

Why Arc though? An Rc is better if you don’t need to share it across threads.


#5

You can also use a HashSet, a few wrapper types, and abuse Borrow to make this work. Basically, you can fool the HashSet into only seeing the id field and then use the Borrow trait to allow querying by id. I could have sworn I’ve seen a crate that does this but I can’t find it so I’ve implemented a bare-bones version below. If you do happen to find such a crate, tell me. Otherwise, I’ll cleanup this implementation and publish it.

Usage

Given:

pub trait Id {
    type Id: ?Sized;
    fn id(&self) -> &Self::Id;
}

Usage:

#[derive(Debug)]
struct Record {
    name: String,
    value: u32,
}

impl Id for Record {
    type Id = str;
    fn id(&self) -> &str {
        &self.name
    }
}

fn main() {
    let mut map = KeyedHashMap::new();
    map.insert(Record {
        name: String::from("thing"),
        value: 42,
    });
    println!("{:?}", map.get("thing"));
}

Implementation

use std::collections::HashSet;
use std::cmp::{Eq, PartialEq, Ord, PartialOrd, Ordering};
use std::hash::{Hasher, Hash};
use std::borrow::Borrow;

// This is the wrapper that only exposes the ID to the HashSet
struct IdAdapter<T>(T);

// We need this adapter to work around coherence issues. We can't implement the blanket borrow
// we need if we don't wrap the borrowed type in a local type.
#[derive(Eq, Hash, PartialEq, Ord, PartialOrd)]
struct BId<T: ?Sized>(T);
impl<T: ?Sized> BId<T> {
    fn wrap<'a>(value: &'a T) -> &'a BId<T> {
        // Here be dragons... this is technically not guaranteed by rust but it works... To be completely safe,
        // we'd need to wait for the `#[repr(transparent)` RFC. 
        unsafe {
            &*(value as *const T as *const BId<T>)
        }
    }
}
pub struct KeyedHashMap<T>(HashSet<IdAdapter<T>>);

impl<T: Id> KeyedHashMap<T>
    where T::Id: Eq + Hash
{
    pub fn new() -> Self {
        KeyedHashMap(HashSet::new())
    }
    pub fn get<Q: ?Sized>(&self, id: &Q) -> Option<&T> where
        T::Id: Borrow<Q>,
        Q: Hash + Eq
    {
        self.0.get(BId::wrap(id.borrow())).map(|wrapped|&wrapped.0)
    }

    pub fn insert(&mut self, v: T) -> Option<T> {
        self.0.replace(IdAdapter(v)).map(|wrapped| wrapped.0)
    }
    pub fn remove<Q: ?Sized>(&mut self, id: &Q) -> Option<T> where
        T::Id: Borrow<Q>,
        Q: Hash + Eq {
        self.0.take(BId::wrap(id.borrow())).map(|wrapped|wrapped.0)
    }
}

pub trait Id {
    type Id: ?Sized;
    fn id(&self) -> &Self::Id;
}

impl<T: Id, B: ?Sized> Borrow<BId<B>> for IdAdapter<T>
    where T::Id: Borrow<B>,
{
    fn borrow(&self) -> &BId<B> {
        BId::wrap(self.0.id().borrow())
    }
}

impl<T: Id> Hash for IdAdapter<T>
    where T::Id: Hash
{
    fn hash<H>(&self, state: &mut H)
        where H: Hasher
    {
        self.0.id().hash(state)
    }
}


impl<T: Id> PartialEq<Self> for IdAdapter<T>
    where T::Id: PartialEq<T::Id>
{
    fn eq(&self, other: &Self) -> bool {
        self.0.id().eq(other.0.id())
    }

    fn ne(&self, other: &Self) -> bool {
        self.0.id().ne(other.0.id())
    }
}


impl<T: Id> Eq for IdAdapter<T>
    where T::Id: Eq + PartialEq<T::Id>
{}


impl<T: Id> PartialOrd<Self> for IdAdapter<T>
    where T::Id: PartialOrd<T::Id> + PartialEq<T::Id>
{
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.0.id().partial_cmp(other.0.id())
    }

    fn lt(&self, other: &Self) -> bool { self.0.id().lt(other.0.id()) }
    fn le(&self, other: &Self) -> bool { self.0.id().le(other.0.id()) }
    fn gt(&self, other: &Self) -> bool { self.0.id().gt(other.0.id()) }
    fn ge(&self, other: &Self) -> bool { self.0.id().ge(other.0.id()) }
}

impl<T: Id> Ord for IdAdapter<T>
    where T::Id: Ord + Eq
{
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.id().cmp(other.0.id())
    }
}


#6

I had a similar question. Here is my thread.
My solution, in the end, was to use a Ordermap<String, ()> and to use indexes to keep track of the name of the object


#7

Note: one issue I didn’t consider is that there’s no way to get a mutable pointer to a value in a HashSet. One could use a RefCell (or even an UnsafeCell) to work around this but this issue is rather annoying…


#8

If you do use cells or other interior mutability, make sure it doesn’t modify Hash or Eq!