Understanding the HashMap::entry() method

I've been trying to get a handle on how the HashMap::entry() method works. The documentation is here.

I find the way the usage example, copied below, is written to be confusing (mainly because I'm still a novice at using Rust).

use std::collections::HashMap;

let mut letters = HashMap::new();

for ch in "a short treatise on fungi".chars() {
    letters.entry(ch).and_modify(|counter| *counter += 1).or_insert(1);
}

assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);

This line, in particular, is confusing to me:

for ch in "a short treatise on fungi".chars() {
    letters.entry(ch).and_modify(|counter| *counter += 1).or_insert(1);
  1. I'm not sure what function the pipe symbol | is serving.

  2. I could not find either the and_modify() method or the or_insert() method in the documentation to help me follow how to apply entry().

If anyone could explain this to me I would appreciate it. Thanks.

1 Like

It’s a clousure, so an anonymous function, between the pipe symbols are the arguments, following them is the return value. See this chapter in the Rust book.


Looking at the HashMap::entry method, you need to look at the type signature.

pub fn entry(&mut self, key: K) -> Entry<'_, K, V>

and see the return type called std::collections::hash_map::Entry. That’s where you can find the methods in question.

Connecting those to the previous point, the Entry::and_modify method is generic and takes an argument of some type F that implements the FnOnce trait, one of the 3 traits in Rust that closures, functions, and function pointers can implement (and actually the one that all closures will implement).

More precisely, the signature reads

pub fn and_modify(self, f: F) -> Self
where
    F: FnOnce(&mut V),

(on the Entry<'_, K, V> type, returned from calling entry on HashMap<K, V>) so the closure argument type is a mutable reference to the (type of) the value found in the HashMap at the requested position.

The HashMap’s value type in the code at hand seems to be somewhat cleverly inferred by the compiler based on the usage of or_insert, and defaulting rules, to be i32. The exact trait bound “FnOnce(&mut V)” then gives us the expected type signature of the closure, essentially it’s like an fn foo(…: &mut i32) { … }.[1] So the |counter| part of the closure introduces the closure argument “counter” of type &mut i32, and the code inside the closure is expected to return nothing. Once you know it’s a (mutable) reference, it also makes sense what the dereferencing operation means (and in particular why there’s dereferencing in the first place) in the closure body *counter += 1.


  1. With a return value, it would read “FnOnce(…) -> ReturnType”, in analogy to function signatures fn foo(…) -> ReturnType { … }. ↩︎

2 Likes

The pipe symbol is used to create a closure. It is short-hand for this:

fn add_one(counter: &mut usize) {
    *counter += 1;
}

letters.entry(ch).and_modify(add_one).or_insert(1);

This code does the following:

  • Look up ch in the map.
  • If the value already exists, add one to the current value.
  • If he value doesn't exist, then insert a one.

In the documentation for HashMap::entry, you can click the Entry<'_, K, V> return type to get to the page where and_modify and or_insert are documented. You can find links here:

Another approach to find them is to use the search bar on the top of the page on the standard library documentation. Searching for and_modify using that search bar should find it for you.

4 Likes

HashMap::entry(ch) gives you the hash map entry for key ch, which may be "vacant" (does not have a value yet) or "occupied" (already has a value). So, the following code:

...is pretty much just a shorthand for:

match letters.entry(ch) {
    // If value already exists, then increment it by one
    Entry::Occupied(occupied) => {
        let value = occupied.into_mut(); // <-- get mut reference to existing value
        *value += 1;
        value
    },
    // If value doesn't exist yet, then insert a new value of 1
    Entry::Vacant(vacant) => vacant.insert(1),
};
1 Like

To be a precise analogue, you would need to use occupied.into_mut() to avoid borrowing from the local variable occupied whose scope is limited to be within the match statement.

1 Like

Corrected :slight_smile:

Thanks for your comments. They are helping me get a handle on this. @steffahn, your comment helped me look up the and_modify() method and I have a question about its usage example, copied below:

use std::collections::HashMap;

let mut map: HashMap<&str, u32> = HashMap::new();

map.entry("poneyland")
   .and_modify(|e| { *e += 1 })
   .or_insert(42);
assert_eq!(map["poneyland"], 42);

map.entry("poneyland")
   .and_modify(|e| { *e += 1 })
   .or_insert(42);
assert_eq!(map["poneyland"], 43);

The hashmap isn't initialized anywhere, yet and_modify accesses the "poneyland" key. I'd say the or_insert creates the entry, except and_modify is called first. Yet, somehow "poneyland" gets inserted into the hashmap. How does that work?

Entry is pretty similar to Option, just with added conveniences to make the API better for its more specific use case.

and_modify will modify the value if there is one, and do nothing otherwise. It's similar to calling option.map(|e| { *e + 1 }) (though usually an Option contains an owned value, so it would probably look like option.map(|e| e + 1)).

or_insert will insert the value if it doesn't already exist there, a bit like Option::get_or_insert.

So with added comments

use std::collections::HashMap;

let mut map: HashMap<&str, u32> = HashMap::new();

map.entry("poneyland")
    // doesn't do anything because the entry is unoccupied
   .and_modify(|e| { *e += 1 })
    // inserts "poneyland" => 42
   .or_insert(42);
assert_eq!(map["poneyland"], 42);

map.entry("poneyland")
    // fetches the value associated with "poneyland" and increments it by one
   .and_modify(|e| { *e += 1 })
    // doesn't do anything because "poneyland" is already in the map
   .or_insert(42);
assert_eq!(map["poneyland"], 43);

Thank you. Your annotations really help clarify things for me. However, I still want to know how "poneyland" gets placed in the map. If and_modify() only modifies the value, how does the key -- "poneyland" -- get inserted?

Thank you. Your annotations really help clarify things for me. However, I still want to know how "poneyland" gets placed in the map. If and_modify() only modifies the value, how does the key -- "poneyland" -- get inserted?

Look here:

The entry(key) for a specific key can be "vacant" (no value yet), or "occupied" (does have a value).

and_modify() and or_insert() are just shorthands to deal with the two possible situations.

In this sense, the entry for every possible key always "exists" in the map, but simply the entries for most keys are "vacant" – the entry for a certain key is "vacant" until you insert a value into it.

Of course, all those "vacant" entries are not really stored in memory, which would be wasteful or even impossible. It's just how the interface of HashMap works; a certain way to think about the map :sunglasses:

(I think, technically, the Entry::Vacant for the absent key is created "on-the-fly" when you request it)

Here's a sketch of how it could be implemented. N.b. I may have gotten an API detail wrong here or there as I did it mostly by memory.

However, it's not really implemented that way -- part of the idea is that you don't have to rehash the key all the time and unwrap and so on. I don't know the exact mechanics off the top of my head -- implementation details -- but it's probably holding onto some pointers or similar under the hood so it can hand out references or fill new slots without recalculating the cache and iterating a bucket, etc. (I had to "cheat" a little and use map_try_insert to emulate this from the outside.)

But hopefully it's still useful in understanding the particular methods in question.

1 Like

So, I did some playing around in the playground, experimenting with these methods. Here's what I came up with:

use std::collections::HashMap;

fn main() {
    let mut feeder: HashMap<&str, u32> = HashMap::new();
    
    feeder.entry("george").or_insert(1);
    feeder.entry("betty").and_modify(|counter| *counter += 1).or_insert(1);
    feeder.entry("sally").and_modify(|counter| *counter += 1).or_insert(1);
    feeder.entry("george").and_modify(|counter| *counter += 1).or_insert(1);
    feeder.entry("betty").and_modify(|counter| *counter += 1).or_insert(1);
    feeder.entry("betty").and_modify(|counter| *counter += 1).or_insert(1);


    prnt_map(&feeder);
}

fn prnt_map(hog: &HashMap<&str, u32>) {
    for (key, val) in hog.iter() {
       println!("Key:   {}       Value:   {}", key, val);
    }
}

Here's what I came up with:

  1. If the entry passed in entry() doesn't exist, or_insert() inserts the new key along with the value you pass.

  2. and_modify() simply gives you access to the value associated with the passed key, allowing you to modify it however you want. It doesn't seem to have any effect on the key portion of the map entry in question.

Do I have it straight?

Here's the output of the above code:

Key:   betty       Value:   3
Key:   sally       Value:   1
Key:   george       Value:   2

Again, entry(key) gives you the Entry for the specified key. If a value was already assigned to that key, then the returned entry will be an "occupied" entry; otherwise it will be a "vacant" entry.

Now, and_modify() allows you to modify/update the value of an "occupied" entry; it does nothing for "vacant" entries. Conversely, or_insert() inserts a value into a "vacant" entry; it does nothing for "occupied" entries. Those functions are just shorthands to deal with the two possible kinds of entries.

Note: and_modify() passes through the given entry, so that you can "chain" it with or_insert() :sunglasses:

2 Likes
Aside:

Actually, the current implementation doesn't even not do a second hash lookup in or_insert; VacantEntry is currently just {hash: u64, key: K, table: &mut HashMap<K, V, S>}. IIRC the entry API may have avoided a double lookup with the old robin-hood hash table, but with the current simd-accellerated swiss-table hash map from hashbrown, IIUC the "double lookup" is even done on normal inserts anyway, because the insert position and the lookup position can actually differ due to how tombstones work. The entry API does still save a lookup from if you were to write if !contains() { insert() }, though, which does three, IIUC. The insert from VacantEntry also gets to use the path that knows it doesn't need to allocate.

But mostly, the entry API isn't about improving any sort of performance, it's about the "monadic" manipulation of the entry as an API convenience.

The raw_entry_mut even IIUC isn't really a performance benefit over entry with the hashbrown impl, IIUC, at least in terms of operations on the hashmap, despite originally being introduced as an optimization path over plain entry. It's for that reason it's been floated for removal a couple times, although I continue to argue that the full version of the raw_entry API provided by hashbrown but not std, providing total dependency injection for hash and compare, is still meaningfully worth it as API surface.

Anyway, that's a mini tour of what I know about entry, and probably more specifics than anyone really needs. And if someone beats the Swiss-Table hashmap, there's reasonable probably that the specifics will change again with the new implementation. (The (raw) entry API will always be at least no worse than the simpler API surface, though; at worst it could literally do the same thing as the simpler API.)

1 Like

Efficiency was a motivation, and avoiding rehashes the explanation I recall from whatever I learned the entry API from during edition 2015 (but I'm not going to do the leg work to find where that might have been :slight_smile:).

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.