How to properly `entry().or_insert()` on a HashMap<String, _>, without instanciating a String if it is already in the HashMap

Hi everyone!

I am struggling a bit getting my head over the lifetime/references system.
Here is my take on the last exercise on common collections from the rust book.
https://doc.rust-lang.org/book/ch08-03-hash-maps.html#summary

I am using an OOP approach, and I am focusing on the add_employee method.

1st approach

use std::collections::HashMap;

pub struct Directory {
  departments: HashMap<String, Vec<String>>,
}

impl Directory {
  fn new() -> Directory {
    Directory {
      departments: HashMap::new(),
    }
  }

  fn add_employee(&mut self, employee_name: &str, department_name: &str) {
    let department = self
      .departments
      .entry(department_name.to_string())
      .or_insert(Vec::new());
    department.push(employee_name.to_string());
  }
}

It compiles.
What I don't like with it, is that we always create new string: department_name.to_string(), even if the entry already exist.

2nd approach

use std::collections::HashMap;

pub struct Directory {
  departments: HashMap<String, Vec<String>>,
}

impl Directory {
  fn new() -> Directory {
    Directory {
      departments: HashMap::new(),
    }
  }

  fn add_employee(&mut self, employee_name: &str, department_name: &str) {
    let department = self.get_department(department_name);
    department.push(employee_name.to_string());
  }

  fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
    if let Some(department) = self.departments.get_mut(department_name) {
      return department;
    }

    self.departments.insert(department_name.to_string(), Vec::new());
    match self.departments.get_mut(department_name) {
      Some(department) => department,
      None => panic!("Should never happen"),
    }
  }
}

It doesn't compile, I get the following errors:

error[E0499]: cannot borrow `self.departments` as mutable more than once at a time
  --> src/test2.rs:24:5
   |
19 |   fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
   |                     - let's call the lifetime of this reference `'1`
20 |     if let Some(department) = self.departments.get_mut(department_name) {
   |                               ---------------- first mutable borrow occurs here
21 |       return department;
   |              ---------- returning this value requires that `self.departments` is borrowed for `'1`
...
24 |     self.departments.insert(department_name.to_string(), Vec::new());
   |     ^^^^^^^^^^^^^^^^ second mutable borrow occurs here

error[E0499]: cannot borrow `self.departments` as mutable more than once at a time
  --> src/test2.rs:25:11
   |
19 |   fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
   |                     - let's call the lifetime of this reference `'1`
20 |     if let Some(department) = self.departments.get_mut(department_name) {
   |                               ---------------- first mutable borrow occurs here
21 |       return department;
   |              ---------- returning this value requires that `self.departments` is borrowed for `'1`
...
25 |     match self.departments.get_mut(department_name) {
   |           ^^^^^^^^^^^^^^^^ second mutable borrow occurs here

I don't understand why it doesn't compile, as after the lines:

    if let Some(department) = self.departments.get_mut(department_name) {
      return department;
    }

the mutable reference should be dropped, thus it should be able to get mutably borrowed again.

In this approach, we only build a String if we have to insert it in the HashMap. So it's better.
What I don't like is that we check twice if the key is in the HashMap, while we know that it is there, since we just added it.

3rd approach

use std::collections::HashMap;

pub struct Directory {
  departments: HashMap<String, Vec<String>>,
}

impl Directory {
  fn new() -> Directory {
    Directory {
      departments: HashMap::new(),
    }
  }

  fn add_employee(&mut self, employee_name: &str, department_name: &str) {
    let department = self.get_department(department_name);
    department.push(employee_name.to_string());
  }

  fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
    if let Some(department) = self.departments.get_mut(department_name) {
      return department;
    }

    let mut department = Vec::new();
    let department_ref = &mut department;
    let department_name = department_name.to_string();
    self.departments.insert(department_name, department);
    department_ref
  }
}

It doesn't compile, I get the following errors:

error[E0499]: cannot borrow `self.departments` as mutable more than once at a time
  --> src/test3.rs:27:5
   |
19 |   fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
   |                     - let's call the lifetime of this reference `'1`
20 |     if let Some(department) = self.departments.get_mut(department_name) {
   |                               ---------------- first mutable borrow occurs here
21 |       return department;
   |              ---------- returning this value requires that `self.departments` is borrowed for `'1`
...
27 |     self.departments.insert(department_name, department);
   |     ^^^^^^^^^^^^^^^^ second mutable borrow occurs here

error[E0505]: cannot move out of `department` because it is borrowed
  --> src/test3.rs:27:46
   |
19 |   fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
   |                     - let's call the lifetime of this reference `'1`
...
25 |     let department_ref = &mut department;
   |                          --------------- borrow of `department` occurs here
26 |     let department_name = department_name.to_string();
27 |     self.departments.insert(department_name, department);
   |                                              ^^^^^^^^^^ move out of `department` occurs here
28 |     department_ref
   |     -------------- returning this value requires that `department` is borrowed for `'1`

error[E0515]: cannot return value referencing local variable `department`
  --> src/test3.rs:28:5
   |
25 |     let department_ref = &mut department;
   |                          --------------- `department` is borrowed here
...
28 |     department_ref
   |     ^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

As I am moving the department inside a structure that outlive the method, the mutable reference department_ref should still be valid. Is there a way to express that in Rust?

This is the kind of implementation I am looking for, as we only check once for the key inside the HashMap, and we only build the String for the key if needed.
I would like to know if it is actually possible to do.
Thank you in advance for the help you can give!

I haven't taken a close look at your code, but to answer the question in your title: The only way to do that is to combine a call to get with a call to insert.

You will need to use the raw_entry api once that stabilizes (or if you're fine with nightly). Right now you must use get and insert as @alice point's out. You can rewrite get_department to:

fn get_department(&mut self, department_name: &str) -> &mut Vec<String> {
    if !self.departments.contains_key(department_name) {
        self.departments.insert(department_name.to_string(), Vec::new());
    }

    self.departments.get_mut(department_name).unwrap()
}

In order for it to compile. Right now Rust's lifetime analysis is rather conservative when you are returning borrows, and this will be improved on in the future with polonious. You may also consider using a Cow as input, that way if you already have a String, you don't need to double hash.

fn get_department(&mut self, department_name: impl Into<Cow<str>>) -> &mut Vec<String> {
    match department_name.into() {
        Cow::Owned(department_name) => {
            self.departments.entry(department_name).or_insert_with(Vec::new)
        }
        Cow::Borrowed(department_name) => {
            if !self.departments.contains_key(department_name) {
                self.departments.insert(department_name.to_string(), Vec::new());
            }

            self.departments.get_mut(department_name).unwrap()
        }
    }
}

with the raw_entry api, you can do

fn get_department(&mut self, department_name: impl Into<Cow<str>>) -> &mut Vec<String> {
    let department_name = department_name.into();
    self.departments.raw_entry_mut()
        .from_key(department_name.borrow())
        .or_insert_with(|| (department_name.into_owned(), Vec::new()))
}

The last one is the cheapest way possible, and requires only 1 hash and maybe 1 allocation. It only allocates if you provide it a &str and the key is not already in the departments. Otherwise it won't allocate.

1 Like

Hi Krishna, thanks for your details answer!

It seems that with the impl Into<Cow<str>>, we need to specify lifetime. (I am using the stable version, in case it matters).

Without lifetime

error[E0106]: missing lifetime specifier
  --> src/test4.rs:15:82
   |
15 |   pub fn add_employee(&mut self, employee_name: &str, department_name: impl Into<Cow<str>>) {
   |                                                                                  ^^^^^^^^ expected lifetime parameter

error[E0106]: missing lifetime specifier
  --> src/test4.rs:20:59
   |
20 |   fn get_department(&mut self, department_name: impl Into<Cow<str>>) -> &mut Vec<String> {
   |                                                           ^^^^^^^^ expected lifetime parameter

With lifetime

Here is the version with added lifetimes:

  fn get_department<'a>(&mut self, department_name: impl Into<Cow<'a, str>>) -> &mut Vec<String> {
    // Code
  }

It find it a bit verbose. I tried to reduce it with an aliase.

With aliase

type CowStr<'a> = Cow<'a, str>;

  fn get_department<'a>(&mut self, department_name: impl Into<CowStr<'a>>) -> &mut Vec<String> {
    // Code
  }

Which is a bit better, but not much, as I cannot use an aliased trait.

With aliased trait

It seems that it is possible with an unstable feature though.

#![feature(trait_alias)]

trait CowStr<'a> = Into<Cow<'a, str>>;

  fn get_department<'a>(&mut self, department_name: impl CowStr<'a>) -> &mut Vec<String> {
    // Code
  }
1 Like

Yes, sorry about that. I forgot lifetime elision doesn't work there.

No worries, thank you for the Cow tip!

One thing about Into Cow<'_, str> that I forgot to mention, both Cow<'_, str>: From<&str> + From<STring>, so you can pass either a &str or a String directly to something that expects impl Into<Cow<'_, str>>!

Here is the final version, using the 2 nightly features:

#![feature(hash_raw_entry)]
#![feature(trait_alias)]

use std::borrow::Borrow;
use std::borrow::Cow;
use std::collections::HashMap;

trait CowStr<'a> = Into<Cow<'a, str>>;

pub struct Directory {
  departments: HashMap<String, Vec<String>>,
}

impl Directory {
  fn new() -> Directory {
    Directory {
      departments: HashMap::new(),
    }
  }

  pub fn add_employee<'a, 'b, T1: CowStr<'a>, T2: CowStr<'b>>(
    &mut self,
    employee_name: T1,
    department_name: T2,
  ) {
    let department = self.get_department(department_name);
    department.push(employee_name.into().into_owned());
  }

  fn get_department<'a, T: CowStr<'a>>(&mut self, department_name: T) -> &mut Vec<String> {
    let department_name = department_name.into();
    self
      .departments
      .raw_entry_mut()
      .from_key::<str>(department_name.borrow())
      .or_insert(department_name.into_owned(), Vec::new()).1
  }
}

pub fn main() {
  let mut directory = Directory::new();
  directory.add_employee("Sally", "Engineering".to_string());
  directory.add_employee("Amir".to_string(), "Sales");
}

The or_insert returns a tuple, so I extract the value with .1.
It seems that I needed to add a type annotation for from_key, otherwise I was getting this error:

error[E0283]: type annotations needed
  --> src/lib.rs:35:8
   |
35 |       .from_key(department_name.borrow())
   |        ^^^^^^^^ cannot infer type for struct `std::string::String`
36 |       .or_insert(department_name.into_owned(), Vec::new()).1
   |                  ---------------------------- this method call resolves to `<B as std::borrow::ToOwned>::Owned`
   |
   = note: cannot satisfy `std::string::String: std::borrow::Borrow<_>`

I felt I have been down the rabbit hole :smiley:

1 Like

You want RawEntryMut::or_insert_with, otherwise you will be unconditionally calling String::into_owned. Otherwise it looks good!

Yes, makes sense! Thanks!

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.