Am I doing &str copying into a long-lived map correctly?

Yes. The map takes ownership of the keys as they are inserted. If you provide your insert fn with borrowed values, that implies each value will be converted to an owned String somehow, which involves one allocation per string. But that's a lot like your original example, so I assume that's acceptable.
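For reference, here's a minimal sketch of what such an insert helper could look like, assuming a String-keyed map with usize values (the name and signature are illustrative, not your actual code):

    use std::collections::HashMap;

    // The caller passes a borrowed key; the map stores its own copy,
    // so there is one allocation per inserted key.
    fn insert_borrowed(map: &mut HashMap<String, usize>, key: &str, value: usize) {
        map.insert(key.to_owned(), value);
    }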

Yes. The cache will be 99% reads, 0.5% inserts and 0.5% deletes.

Thank you. I am considering composing a benchmark to shed light on potential runtime performance concerns -- just to have cold hard facts.

Here's what I predict the benchmarks will say:

Once the maps are instantiated, all else being equal, they should be equally fast to read from, insert into, and remove from.

Appreciate the feedback.

Please don't take my curiosity about benchmarks the wrong way: I am simply looking to populate a grimoire of my own on the best ways to approach various scenarios in Rust. And since those are all personal projects, I can afford to waste some time on painfully obvious things. :slight_smile:

I'll mark one of the answers as the accepted solution when I make up my mind -- to increase visibility for future readers. Thank you!

If you are willing to donate a little more of your time to elaborate, I'll be very curious to read the extra info.

In your original code,

    let mut map = HashMap::<&'static str, usize>::new();

is not the same as

    let mut map: HashMap::<&'static str, usize> = HashMap::new();

and what you ended up with was a HashMap::<&'a str, usize> where 'a was not 'static. This would have bitten you when you started trying to pass the map around or similar. For example this:

fn insert_somewhere_else(map: &mut HashMap::<&'static str, usize>, s: String) {
    insert_str(map, &s);
}

will fail to compile because there is no way the String can last long enough to support the &'static reference. (The String gets deallocated at the end of the function, so if you were allowed to store a reference to it in the HashMap, the reference would immediately dangle. Hence it's not allowed.)
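One way to make a function like that compile, assuming owned keys are acceptable for your use case, is to let the map own the key, so nothing borrows from the short-lived String (a sketch, not your original code):

    use std::collections::HashMap;

    // The map takes ownership of `s`, so there is no reference
    // that could outlive the String.
    fn insert_somewhere_else(map: &mut HashMap<String, usize>, s: String) {
        map.insert(s, 0); // the value is just a placeholder
    }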

The fact that the two versions are different is unintuitive, and surprised me. I didn't want to derail this thread, so I posted a new one to make sure I'm understanding the details myself.

Oh! That was a refactoring mistake of mine. I started off with &'static str everywhere but quickly deduced I was doing it wrong, so I changed it. Apparently I rushed it and made a mistake that could have been hard to decipher down the road.

Thanks for pointing it out.

Going to the other thread.

Just to address one last point:

Well, how could it? When you get an error like this, you should expand the definition of the trait and substitute the concrete types; that way it becomes obvious why what you want isn't possible. From looks like this:

trait From<T> {
    fn from(value: T) -> Self;
}

Therefore, when you call Into::into(): String -> &str, which is equivalent to From::from(): String -> &str, the specialized (substituted) types would look like this (not actually valid syntax; I'm writing it like this just for the sake of explanation):

trait From<String> {
    fn from(value: String) -> &str;
}

Now, how can you get a &str from a String? You want to borrow its internal buffer, which either &value (via an implicit coercion) or value.as_ref() will do. However, that would be wrong. Both Into and From take the value to be converted by value. This means that if the value is a String, it is moved into a local variable (argument) of the function, so returning a reference to it is not allowed. This is why it is impossible to use either From or Into for the String -> &str conversion.
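To see it concretely, here's the conversion written out as a plain function; it deliberately does not compile, for exactly the reason described above:

    // `value` is moved into the function and dropped when it returns,
    // so the returned reference would dangle.
    fn string_to_str(value: String) -> &'static str {
        &value // error: returns a reference to data owned by the current function
    }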

The trait you are probably looking for is AsRef<str>, or maybe Borrow<str>, or Deref<Target=str>. These all have a signature of &Self -> &Output (i.e. &String -> &str when specialized), which makes it possible to convert a reference to another one without taking ownership, and thus, without trying to return a reference to a locally-owned variable.
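As a sketch of what that buys you in practice (a hypothetical helper, not code from this thread): with AsRef<str> a function can accept a String, a &String, or a &str, and only ever borrows a &str out of it:

    use std::collections::HashMap;

    // Accepts anything that can be viewed as a &str, without taking ownership of it.
    fn lookup<K: AsRef<str>>(map: &HashMap<String, usize>, key: K) -> Option<usize> {
        map.get(key.as_ref()).copied()
    }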

Truthfully, I didn't think of that -- I was simply like "hey, let's check if using it like that would incur an implicit copy of the String's contents (namely the &str)".

Your explanation helped, especially this part which I wasn't aware of:

The main point is that &str is not the contents of String - it's a reference to them, which can't exist without a String itself (or without some static data, like embedded in binary).

Yep, sorry, bad wording. I meant "I was wondering if using Into will automatically just copy the referenced String slice".

I made a very barebones GitHub repo benchmarking inserts into a &str-keyed HashMap and a String-keyed HashMap here: GitHub - dimitarvp/rust-strmap-vs-stringmap

Initial results on an iMac Pro with a Xeon W-2150B CPU and ECC RAM:

insert_str              time:   [628.88 ns 632.26 ns 635.95 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

insert_string           time:   [993.15 ns 996.75 ns 1.0004 us]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

I am planning to add benchmarks for deletions and lookups as well.

Any and all input and criticism is welcome. (There is code duplication but I can't use static since the String version moves ownership.)

5 key-value pairs result in very small maps; (re)allocation of the hash map's buffer will likely dominate everything else. It would be worth profiling these benchmarks to make sure that isn't the case, and adjusting them if it is.

Furthermore, I don't understand why you are mixing the types, i.e. 3 keys are &str and 2 are String. Finally, b.iter(|| test_insert_str()) is unnecessarily complicated; you could just write b.iter(test_insert_str).

I forgot. :smiley:

I should rework that a bit to reflect my targeted real usage in a library later on.

What number of inserted keys would you recommend for a more objective measurement?

I think a couple thousand would be a safe bet. Also, depending on what you want to measure, you could pull the allocation of the HashMap out of the function being benchmarked and .reserve() sufficient capacity upfront, so the map never reallocates.
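If it helps, here's a rough sketch of that idea using Criterion's iter_batched, where the map allocation happens in the setup closure and only the inserts are timed (the key set and sizes are made up for illustration):

    use std::collections::HashMap;
    use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

    // Made-up keys, just for the sketch.
    const KEYS: [&str; 3] = ["/home/user/a.txt", "/home/user/b.txt", "/home/user/c.txt"];

    fn bench_insert_str(c: &mut Criterion) {
        c.bench_function("insert_str_prealloc", |b| {
            b.iter_batched(
                // Setup (not timed): allocate the map with enough capacity upfront.
                || HashMap::<&str, usize>::with_capacity(KEYS.len()),
                // Routine (timed): only the inserts.
                |mut map| {
                    for (i, k) in KEYS.iter().enumerate() {
                        map.insert(*k, i);
                    }
                    map
                },
                BatchSize::SmallInput,
            )
        });
    }

    criterion_group!(benches, bench_insert_str);
    criterion_main!(benches);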

My real usage of this cache won't likely ever go beyond 100 entries so I think I'll go with that. I didn't mention more details but it's going to be an extremely hot and well-warmed-up cache. But as mentioned earlier, it's going to be 99% reads (hence the current insert benchmarks are mostly irrelevant; still, I wanted them in out of curiosity).

With the rest I agree. I managed to completely forget about HashMap::with_capacity. Oops.

(EDIT: The keys will be normalized / shell-expanded file names btw. So the key lengths can be anywhere from 10 to 255 characters.)

Switching from new to with_capacity improved performance by 42.5% for the &str-keyed HashMap and by 21.6% for the String-keyed HashMap.

It seems you've already figured out that a HashMap<&str, T> can only store references to variables from the same (or an enclosing) scope, and that you can't even insert into the map from within a loop. Is that the map you want?

Not exactly. I want to be able to copy the &str instances so they are fully owned by the map and not shared with anyone. But I am reconsidering my approach currently because preliminary read/lookup benchmarks don't seem to show much difference.

Well, a &str, or more generally any &T, is a reference, and a reference doesn't own its contents. String is the owned version of the dynamic string content.
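A tiny illustration of that distinction (made-up values):

    fn main() {
        let owned: String = String::from("normalized/path.txt"); // owns its bytes on the heap
        let borrowed: &str = &owned;                             // borrows from `owned`, owns nothing
        let copied: String = borrowed.to_owned();                // an independent, fully owned copy
        println!("{owned} {borrowed} {copied}");
    }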
