Reduce the number of Hash searches


#1

I’m using a HashMap and I can’t seem to get around 2 hash lookups in the common (item exists) case. My issue seems to be that in the miss case I want to be able to fail, and this seems to cause issues with the entry() method…

    use std::collections::HashMap;
    use std::collections::hash_map::Entry;
    
    struct Value {
        data : String
    }
    
    fn expensive_gen(name : &str) -> Result<Value, ()> {
        Ok(Value { data : String::from(name) })
    }
    
    struct Cache  {
        data : HashMap<String, Value>
    }
    
    impl Cache {
        pub fn easy(&mut self, name : &str) -> Result<&str, ()> {
    
            if self.data.contains_key(name) == false {
                let value = try!(expensive_gen(name));
                self.data.insert(String::from(name), value);
            }
    
            let result = self.data.get(name).unwrap().data.as_str();
    
            Ok(result)
        }
    
        // Doesn't work because self.data is already borrowed.
        pub fn better(&mut self, name : &str) -> Result<&str, ()> {
    
            if let Some(value) = self.data.get(name) {
                return Ok(value.data.as_str());
            }
    
            let value = try!(expensive_gen(name));
    
            self.data.insert(String::from(name), value);
    
            let result = self.data.get(name).unwrap().data.as_str();
    
            Ok(result)
        }
    
        // Doesn't work, 'entry' does not live long enough
        pub fn best(&mut self, name : &str) -> Result<&str, ()> {
    
            match self.data.entry(String::from(name)) {
                Entry::Occupied(entry) => {
                    Ok(entry.get().data.as_str())
                },
                Entry::Vacant(entry) => {
                    let value  = try!(expensive_gen(name));
                    let result = entry.insert(value).data.as_str();
    
                    Ok(result)
                }
            }
        }
    }
    
    fn main() {
        let mut cache = Cache { data : HashMap::new() };
    
        cache.easy("easy").unwrap();
        cache.better("better").unwrap();
        cache.best("best").unwrap();
    
    }

This is basically the code. The expensive_gen() function may fail, in which case the chain needs to fail.

easy() works, but in the common case (item exists) there are 2 hash lookups, and 3 on a miss.

better() would get me one hash lookup on the common case, and 3 on miss, but even with the return self.data is still held by the borrower.

best() would give me only 1 hash lookup in all cases, but ‘entry’ doesn’t live long enough. I’ve tried the “or_insert_with” method on entry, but doing a try! inside a closure just returns from the closure not the whole function.

In the grand scheme of things, I’m nitpicking over a few CPU cycles but I’m curious if there is a way?


#2

Thinking about this, the 2 hash searches when the item exists will probably outperform the construction of the String when it’s not needed, so I’ll probably just continue with what I have.


#3

You could change the entry.get() to entry.into_mut() in fn best to convert it into a mutable that has the lifetime of the HashMap to get it to live long enough. Then you’ll end up with something like this that’ll only call the expensive function on missing values.

pub fn best(&mut self, name: &str) -> Result<&str, ()> {
    let entry = match self.data.entry(String::from(name)) {
        Entry::Occupied(entry) => entry.into_mut(),
        Entry::Vacant(entry) => entry.insert(try!(expensive_gen(name))),
    };

    Ok(entry.data.as_str())
}

#4

Ahh, okay, that’s makes sense, thanks.