Am I doing &str copying into a long-lived map correctly?

Hello good and helpful people,

For a personal learning project I am in a need of a long-lived DashMap cache with string keys. I wondered: do I actually have to use the memory heap for the keys? Those keys will never change after all. So I settled for cloning a given &str or String because (a) the cache is going to live for the entirety of the program and (b) I can't find another way to reason with the borrow checker without using Box et.al.: Rust Playground

I also tried using Into but: the trait From<String> is not implemented for &str.

This is a personal learning project and I am trying to educate myself on what's the absolute most efficient way of having a long-lived map with immutable string keys, while using as little -- or no -- indirection (pointers) as possible.

Does my Rust Playground sample achieve this goal? Is the code idiomatic? Are there better ways?

Thank you.

1 Like

Simplified it a bit more.

When using strings in a collection like a HashMap, you'll almost always want to use String (i.e. an owned string) as the key type rather than &str (a borrowed string slice). Using the latter is an advanced usage due to the lifetime reasoning involved, and should not be done if you don't know what it means as it will needlessly complicate the code.

Instead, pass the String instances directly into the insert fn, don't borrow there.

As a side note, you cannot use .clone() to go from &str to String, use .to_string() instead.

1 Like

(Written pre-simplification, so you figured some of this out.)

First, let's look at insert_string:

fn insert_string<'a>(map: &mut HashMap<&'a str, usize>, key: &'a String) {
    insert_str(map, key.as_str());
}

Due to Deref coercion, you don't actually need this function to convert a &String to a &str. This works:

    // Changed `insert_string` to `insert_str`
    insert_str(&mut map, &key);

Next, let's look at insert_str:

fn insert_str<'a>(map: &mut HashMap<&'a str, usize>, key: &'a str) {
    map.insert(key.clone(), 0);
}

You're actually cloning the reference key, which is a &str, here. You're not creating a new String. References are Copy, too, so this is the same as:

fn insert_str<'a>(map: &mut HashMap<&'a str, usize>, key: &'a str) {
    map.insert(key, 0);
}

However, this still isn't doing what you want (if I understand what you want correctly). I think what you want is something like:

use std::collections::HashMap;

// Both `String` and `&str` implement `Into<String>`.
// The former will just return itself.
// The latter will allocate a new `String`.
fn insert<T: Into<String>>(map: &mut HashMap<String, usize>, key: T) {
    map.insert(key.into(), 0);
}

fn main() {
    let mut map = HashMap::<String, usize>::new();
    let key = String::from("3");
    insert(&mut map, "1");
    insert(&mut map, "2");
    insert(&mut map, key);

    println!("{:#?}", map);
}

(Playground)

There is also some oddities going on with lifetimes and types that deserves it's own reply.

3 Likes

Yep, the one thing that rubs me the wrong way in my both examples is the need for lifetime annotations. @quinedot's answer sheds light on this, as does your recommendation to just use String.

As for me using &str, think of it as a potentially misguided premature optimization. You are telling me that it's unnecessary, I take it.

Are there any runtime performance concerns linked to using String as a map key? I'll be using this cache a lot in my program.

Yes, when the map is instantiated the map will take ownership of the keys. If you provide your insert fn with borrowed values, that implies that the value will be converted to an owned string somehow, which involves one allocation per string. But that's a lot like your original example, so I assume that's acceptable.

1 Like

Yes. The cache will be 99% reads, 0.5% inserts and 0.5% deletes.

Thank you. I am considering composing a benchmark to shed light on potential runtime performance concerns -- just to have cold hard facts.

Here's what I predict they'll say:

Once the maps are instantiated, all else being equal, they should be equally fast to read from, insert in, and remove from.

1 Like

Appreciate the feedback.

Please don't take my curiosity about benchmarks the wrong way: I am simply looking to populate a grimoire of my own about what are the best ways to approach various scenarios in Rust. And since those are all personal projects, I can afford to waste some time on painfully obvious things. :slight_smile:

I'll mark one of the answers as the accepted solution when I make up my mind -- to increase visibility for future readers. Thank you!

1 Like

If you are willing to donate a little more of your time to elaborate, I'll be very curious to read the extra info.

In your original code,

    let mut map = HashMap::<&'static str, usize>::new();

is not the same as

    let mut map: HashMap::<&'static str, usize> = HashMap::new();

and what you ended up with was a HashMap::<&'a str, usize> where 'a was not 'static. This would have bitten you when you started trying to pass the map around or similar. For example this:

fn insert_somewhere_else(map: &mut HashMap::<&'static str, usize>, s: String) {
    insert_str(map, &s);
}

will fail to compile because there is no way the String can last long enough to support the &'static reference. (The String gets deallocated at the end of the function, so if you were allowed to store a reference to it in the HashMap, the reference would immediately dangle. Hence it's not allowed.)

The fact that the two versions are different is unintuitive, and surprised me. I didn't want to derail this thread, so I posted a new one to make sure I'm understanding the details myself.

2 Likes

Oh! That was a refactoring mistake of mine. I started off with &'static str everywhere but quickly deduced that I'm doing it wrong so changed it. Apparently I rushed it and made a mistake that could have been hard to decipher down the road.

Thanks for pointing it out.

Going to the other thread.

Just to address one last point:

Well, how could it? When you get an error like this, you should expand the definition of the trait and substitute the concrete types. In this manner it will be obvious to you why what you want isn't possible. From looks like this:

trait From<T> {
    fn from(value: T) -> Self;
}

Therefore, when you call Into::into(): String -> &str, which is equivalent with From::from(): String -> &str, the specialized (substituted) types would look like this (not actually valid syntax, I write it like this just for the sake of explanation):

trait From<String> {
    fn from(value: String) -> &str;
}

Now, how can you get a &str from a String? You want to borrow its internal buffer, which either &value (via an implicit coercion) or value.as_ref() will do. However, that would be wrong. Both Into and From take the value to be converted by value. This means that if the value is a String, it is moved into a local variable (argument) of the function, so returning a reference to it is not allowed. This is why it is impossible to use either From or Into for the String -> &str conversion.

The trait you are probably looking for is AsRef<str>, or maybe Borrow<str>, or Deref<Target=str>. These all have a signature of &Self -> &Output (i.e. &String -> &str when specialized), which makes it possible to convert a reference to another one without taking ownership, and thus, without trying to return a reference to a locally-owned variable.

5 Likes

Truthfully, I didn't think of that -- I was simply like "hey, let's check if using it like that would incur an implicit copy of the String's contents (namely the &str)".

Your explanation helped, especially this part which I wasn't aware of:

The main point is that &str is not the contents of String - it's a reference to them, which can't exist without a String itself (or without some static data, like embedded in binary).

Yep, sorry, bad wording. I meant "I was wondering if using Into will automatically just copy the referenced String slice".

I made a very barebones GitHub repo benchmarking inserting in &str-keyed HashMap and a String-keyed HashMap here: GitHub - dimitarvp/rust-strmap-vs-stringmap

Initial results on an iMac Pro with a Xeon-2150B CPU and ECC RAM:

insert_str              time:   [628.88 ns 632.26 ns 635.95 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

insert_string           time:   [993.15 ns 996.75 ns 1.0004 us]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

I am planning to add benchmarks for deletions and lookups as well.

Any and all input and criticism is welcome. (There is code duplication but I can't use static since the String version moves ownership.)

5 key-value pairs result in very small maps; it's likely that (re)allocation of the buffer of the hash map will dominate everything else. It would be nice to profile these benchmarks in order to be sure that this is not the case, and if it is, modify them so that this problem is fixed.

Furthermore, I don't understand why you are mixing the types, i.e. 3 keys are str and 2 keys are String. Finally, b.iter(|| test_insert_str()) is unnecessarily complicated, you could just write b.iter(test_insert_str).

I forgot. :smiley:

I should rework that a bit to reflect my targeted real usage in a library later on.

What amount of keys inserted would you recommend for a more objective measurement?