How to avoid redundant cloning on HashMap insertion

Please consider the following code:

let hm: HashMap<String, Vec<String>> = HashMap::new();
let key = String::from("key");
let value = String::from("value");

hm.entry(key)
  .and_modify(|vv| vv.push(value.clone())) // Redundant `clone`
  .or_insert(vec![value]);

How can I avoid redundant cloning of value?

The core problem is that you're trying to use the same String multiple times here. In and_modify and in or_insert. String cannot be freely shared, however, due to the borrow rules. There are two methods to solve this, depending on what this code snippet is supposed to do exactly:

Possibility 1:
Change hm so that the values are Vec<&str> instead of Vec<String>. &str is an immutable reference so it can be used multiple times, anywhere, contrary to String, which is an owned object. That means that value will also need to be a &str instead of String, so I changed let value = String::from("value"); to just let value = "value";
Removing the .clone() will make the code work: Now the &str is shared at multiple places which is allowed.

Possibility 2:
If you need the HashMap to only contain Strings, no &str anywhere, you will have to restructure that code a bit.
Without the clone, value would have to be moved two times: once into |vv| vv.push(value) because there value is moved into vv. And also once in vec![value]. Rust's borrow checker will not allow this code. However, we now that this code doesn't really need value twice; When the first line (.and_modify()) is executed, the second won't do anything and when the second line will be executed (when the entry is empty), the first one will not have been executed.
We need some way to guarantee Rust that only one of these two things will run: a match statement (an ìf let else statement would also work):

let mut hm: HashMap<String, Vec<String>> = HashMap::new();
let key = String::from("key");
let value = String::from("value");

let entry = hm.entry(key);
match entry {
	Entry::Occupied(_) => { entry.and_modify(|vv| vv.push(value)); },
	Entry::Vacant(_) => { entry.or_insert(vec![value]); }
}

It is clear that either the line with Entry::Occupied or the one with Entry::Vacant will run, but definitely not both.

4 Likes

only one of them is executed

Functional programming often does more than you expect, so your statement always does;

  • call entry function
  • construct closure (taking reference to value)
  • calls and_modify function, (maybe call the closure inside it.)
  • constructs the vector (using value. Compiler knows value is no longer borrowed at this time as closure no longer exists; was dropped in and_modify)
  • calls or_insert function (maybe just dropping the vector it has been given)

(Note the compiler can do some inlining optimization)
The only slight performance can sometime be gained with replacing or_insert with or_insert_with but this doesn't help with the clone.

I probably would not recommend this but some times can be used.
Possibility 3:
Make use of Option as a secondary container. (The _with and closure are required or you have a code execution path where it tries to call unwrap on None.)

    let mut value = Some(value);
    
    hm.entry(key)
      .and_modify(|vv| vv.push(value.take().unwrap()))
      .or_insert_with(|| vec![value.unwrap()]);

Possibility 4:
Instead of Option make use of the vector. (This includes heap allocation constructing vector that 2 or 3 does not always cause.)

    let mut v = vec![value];
    
    hm.entry(key)
      .and_modify(|vv| vv.push(v.pop().unwrap()))
      .or_insert(v);
2 Likes

and_modify and or_insert are for when you don't know what kind of entry you have. Each does a match internally and dispatches to either VacantEntry or OccupiedEntry as appropriate. Since you're doing the match externally, there's no reason not to just dispatch it yourself:

let entry = hm.entry(key);
match entry {
    Entry::Occupied(mut e) => {
        e.get_mut().push(value);
    }
    Entry::Vacant(e) => {
        e.insert(vec![value]);
    }
}

However, there is a more concise way to write this by doing .or_insert first with an empty Vec:

let entry = hm.entry(key);
entry.or_insert_with(Vec::new).push(value);

I used or_insert_with to avoid creating a Vec unnecessarily, but since Vec::new does not allocate, it's probably irrelevant to performance.

5 Likes

also or_default could be used:

entry(key).or_default().push(value);
1 Like

also or_default could be used:

entry(key).or_default().push(value);

I think that will be two insertion, one for the default followed by the actual "value"

Functional programming often does more than you expect, so your statement always does;

  • call entry function
  • construct closure (taking reference to value )
  • calls and_modify function, (maybe call the closure inside it.)
  • constructs the vector (using value . Compiler knows value is no longer borrowed at this time as closure no longer exists; was dropped in and_modify )
  • calls or_insert function (maybe just dropping the vector it has been given)

So does it imply the compiler sees the calling chains as in one scope, so the borrow rules reject the code?
Whereas in the manual match case, the compiler knows two arms can not be executed at the same time.

Those are different insertions:

  • .or_default() inserts an empty Vec into the HashMap
  • .push(value) inserts value into the Vec

Only the first one can be avoided; the second one always has to be done (note that vec![value] just creates a new empty Vec and inserts value into it). So this is as efficient as any other option (before compiler optimizations).

There are invisible instructions (maybe gets optimised out by compiler) in constructing closures, so the closure takes the variables. Inside a call to closure acts as a different scope, the same as arms of if/else but outside arguments are evaluated in the same scope.
Without the clone value would be moved the error is;

error[E0382]: use of moved value: `value`
 --> src/main.rs:9:23
  |
8 |       .and_modify(|vv| vv.push(value))
  |                   ----         ----- variable moved due to use in closure
  |                   |
  |                   value moved into closure here
9 |       .or_insert(vec![value]);
  |                       ^^^^^ value used here after move

@trentj has posted best solution. (Not first time I've overlooked such use of library.)
Using or_default() does essentially the same but is not as easy to read.

There are invisible instructions (maybe gets optimised out by compiler) in constructing closures, so the closure takes the variables. Inside a call to closure acts as a different scope, the same as arms of if/else but outside arguments are evaluated in the same scope.

I had look into the MIR (too bad it refuses to show MIR if code doesn't compile) for
playground code:

    hm.entry(key)
        .and_modify(|vv| vv.push(value.clone())) // Redundant `clone`
        .or_insert(vec![value]);

and playground code:

    let entry = hm.entry(key);
    match entry {
        Entry::Occupied(_) => {
            entry.and_modify(|vv| vv.push(value));
        }
        Entry::Vacant(_) => {
            entry.or_insert(vec![value]);
        }
    }

The location where the closure is live are the same.
The difference is that the control flow for the clone case is and_modify()->or_insert()

    bb7: {
        ...                  
        // bb8 is the or_insert() block            
        _5 = const <std::collections::hash_map::Entry<'a, K, V>>::and_modify(move _6, move _9) -> [return: bb8, unwind: bb6];
        ...

while for the match case is switch->and_modify() or switch->or_insert().

    bb7: {                              
        ...
        // bb8 is and_modify, bb9 will init the Vec and goes to bb11, the or_insert() block
        switchInt(move _7) -> [0isize: bb8, 1isize: bb9, otherwise: bb10];
        ...  
    }

Hence the latter will borrow check.