Better ideas for a `HashMap` that accepts both `&'static str` and `String`?

Recently I decided to write a network program that uses string dictionaries heavily. Based on experience with other languages, I expected to be able to insert in both of the following ways:

a.insert("key", "value");
let v2 = (1+2).to_string();
a.insert("key2", v2);

This is very natural in GC-based languages, where string variables and string literals have the same type. In Rust, however, we have &str and String; more specifically, string literals have type &'static str.
In C++, global/static const char* and std::string are different types too, but there a function can use overloading to accept both.

After some research, I think the best way to approach it is something like this:

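use std::borrow::Cow;
use std::collections::HashMap;

trait HashMapHelper<'a> {
    fn ins(&mut self, k: &'a str, v: &'a str);  // borrowed key, borrowed value
    fn ins2(&mut self, k: &'a str, v: String);  // borrowed key, owned value
}

impl<'a> HashMapHelper<'a> for HashMap<Cow<'a, str>, Cow<'a, str>> {
    fn ins(&mut self, k: &'a str, v: &'a str) {
        self.insert(Cow::Borrowed(k), v.into()); // both stored as Cow::Borrowed
    }
    fn ins2(&mut self, k: &'a str, v: String) {
        self.insert(k.into(), v.into()); // the String is moved in as Cow::Owned
    }
}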

But one still has to use either L1 or L2 to insert into the HashMap:

let mut d1: HashMap<String, String> = HashMap::new();
let mut d2: HashMap<Cow<str>, Cow<str>> = HashMap::new();

d1.insert("k2".into(), v2.clone()); //L1 (clone only so both lines compile together)
d2.ins2("k2", v2);                  //L2

The former requires adding .into() to every string literal (plus the overhead of copying the literal), while the latter can't be reduced to a single .ins() method.

This is usually just a minor annoyance, but in a program that uses dictionaries heavily it becomes more noticeable. So I'm posting here in case anyone has improvements to suggest. Any suggestion would be appreciated!

What about making your helper trait generic?

use std::borrow::Cow;
use std::collections::HashMap;

trait HashMapHelper<'a> {
    fn ins<K, V>(&mut self, key: K, value: V)
    where
        K: Into<Cow<'a, str>>,
        V: Into<Cow<'a, str>>;
}

impl<'a> HashMapHelper<'a> for HashMap<Cow<'a, str>, Cow<'a, str>> {
  ...
}
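One possible way to fill in the elided impl body, as a sketch: the standard library already implements `From<&'a str>` and `From<String>` for `Cow<'a, str>`, so the method only needs to call `.into()`. The `build` function and the sample keys are hypothetical, added only for illustration.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

trait HashMapHelper<'a> {
    fn ins<K, V>(&mut self, key: K, value: V)
    where
        K: Into<Cow<'a, str>>,
        V: Into<Cow<'a, str>>;
}

impl<'a> HashMapHelper<'a> for HashMap<Cow<'a, str>, Cow<'a, str>> {
    fn ins<K, V>(&mut self, key: K, value: V)
    where
        K: Into<Cow<'a, str>>,
        V: Into<Cow<'a, str>>,
    {
        // &str becomes Cow::Borrowed, String becomes Cow::Owned; nothing is copied.
        self.insert(key.into(), value.into());
    }
}

// Hypothetical helper: builds a map through the single generic `ins` method.
fn build() -> HashMap<Cow<'static, str>, Cow<'static, str>> {
    let mut map: HashMap<Cow<'static, str>, Cow<'static, str>> = HashMap::new();
    map.ins("key", "value");              // literal key, literal value
    map.ins("key2", (1 + 2).to_string()); // literal key, runtime String value
    map
}

fn main() {
    let map = build();
    println!("{} {}", map["key"], map["key2"]);
}
```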

I hadn't thought of that. I'll try generalizing it once I'm sure it's the best practice. Thank you for the code snippet!


(Reposted to add the reply quote.)

Technically HashMap<Cow<'_, str>, Cow<'_, str>> is the right type for it, but if you're just trying to abstract away the difference between "different types of strings", that doesn't seem right IMHO.

Rust's &str vs String is not like char* vs std::string. In C++ both types could be thought of as alternatives and need to be mixed only for historical/compatibility reasons, so you'd abstract over that.

In Rust, &str exists for a good reason. It is specifically not storing any string data. It's more like a temporary state of a String object, or a method of temporarily accessing other string-like types.

Therefore, abstracting over these two doesn't let you use any string type universally. It's an abstraction over ownership: it lets you either run the strings' destructors when the HashMap is destroyed, or leave that to some other object/scope.
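To make the ownership point concrete, here is a small sketch (the `describe` helper and the keys are hypothetical): a `Cow` value records whether the map owns the string or only borrows it, and only the owned strings are freed together with the map.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Reports whether a value owns its string data or borrows it from elsewhere.
fn describe(v: &Cow<'_, str>) -> &'static str {
    match v {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let outer = String::from("lives outside the map");
    let mut map: HashMap<&str, Cow<'_, str>> = HashMap::new();
    // Borrowed: `outer` owns the data and must outlive `map`.
    map.insert("borrowed", Cow::Borrowed(outer.as_str()));
    // Owned: the String is moved into the map and freed when `map` is dropped.
    map.insert("owned", Cow::Owned(String::from("lives inside the map")));

    for key in ["borrowed", "owned"] {
        println!("{key}: {}", describe(&map[key]));
    }
}
```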

In Rust you mostly decide whether you need to own the strings or whether borrowing is enough, and then use just one of them. If borrowed keys or values work for you, then code like map.insert(&key, &value) is idiomatic Rust. It's quite common to borrow objects when they're passed as arguments.
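A sketch of the borrowing case, using a hypothetical `borrowed_map` helper: the map stores only `&str`, and the lifetime `'a` ties it to the Strings that the caller keeps owning.

```rust
use std::collections::HashMap;

// The map holds borrowed &str; the caller's Strings keep ownership,
// and the returned map cannot outlive them (enforced by 'a).
fn borrowed_map<'a>(key: &'a str, value: &'a str) -> HashMap<&'a str, &'a str> {
    let mut map = HashMap::new();
    map.insert(key, value);
    map
}

fn main() {
    let key = String::from("Connection");
    let value = String::from("close");
    let map = borrowed_map(&key, &value); // &String coerces to &str
    println!("{:?}", map.get("Connection"));
}
```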

If you have a hashmap that owns the keys, then map.insert(key, value) or map.insert(key.into(), value.into()) is also idiomatic Rust. That extra call (which might also be .clone() or .to_owned()) highlights that a conversion is taking place, and isn't too bad syntactically.
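And the owning case, as a sketch (the header names are only sample data): each literal needs an explicit conversion that allocates a new String, while an existing String is moved in without copying.

```rust
use std::collections::HashMap;

// Builds a map that owns every key and value it contains.
fn owned_map(content_length: String) -> HashMap<String, String> {
    let mut map: HashMap<String, String> = HashMap::new();
    // Literals need an explicit conversion; each call allocates a new String.
    map.insert("Connection".into(), "keep-alive".to_owned());
    // An existing String is moved into the map without copying its contents.
    map.insert("Content-Length".to_string(), content_length);
    map
}

fn main() {
    let len = (1 + 2).to_string();
    let map = owned_map(len);
    println!("{:?}", map.get("Content-Length"));
}
```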

99% of the time you have to own the data in a collection; otherwise it would anchor everything to the scope it was created in, and be impossible to use outside of it (lots of borrow checker despair is caused by overusing temporary references in collections). Collections with borrowed items are useful only in rare cases, e.g. to count the number of unique items in a Vec without modifying the Vec: you can create a HashSet of temporary borrows, get its length, and throw the HashSet away.
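The unique-count case could look roughly like this; `count_unique` is a hypothetical name. The HashSet only borrows the Vec's strings and is discarded when the function returns.

```rust
use std::collections::HashSet;

// Counts distinct strings by borrowing them; `items` is neither modified nor cloned.
fn count_unique(items: &[String]) -> usize {
    // The set holds &str pointing into `items` and is dropped on return.
    let set: HashSet<&str> = items.iter().map(String::as_str).collect();
    set.len()
}

fn main() {
    let words: Vec<String> = ["close", "keep-alive", "close"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{}", count_unique(&words)); // prints 2
}
```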


Why do I lose the reply relation on every first reply? I confirmed that the reply tip above the edit box was there just before I clicked "Post".

To me, std::string is responsible for freeing the underlying memory, and thus behaves more like an owned value.

But that's not the real point. I know that HashMap<String,String> with copying on insert(&str) is the more standard approach, but what if my program has many predefined key names, so that I use many string literals (thus &'static str), and I still don't want to give up the flexibility of a HashMap by switching to struct fields?

I think Cow<str> comes very close to an overloaded .insert() method, except that a single method can never accept both &'a str and String.

I guess that would be useful for something niche like a HeaderMap type that stores the headers sent with every request.

Although, for something so specialised I'd probably just reach for a string pool/interning crate. That way you can reuse allocations for things which are duplicated at runtime and not just values known at compile time.
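For reference, the core idea of interning fits in a few lines. This is a minimal hand-rolled sketch, not the API of any particular crate; `Interner` and `intern` are hypothetical names, and it is single-threaded because it uses `Rc<str>`.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical minimal interner: each distinct string is allocated once
// and then shared through cheap Rc<str> clones.
#[derive(Default)]
struct Interner {
    pool: HashSet<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.pool.get(s) {
            return Rc::clone(existing); // reuse the existing allocation
        }
        let rc: Rc<str> = Rc::from(s); // first occurrence: allocate once
        self.pool.insert(Rc::clone(&rc));
        rc
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("close");
    let b = interner.intern(&String::from("close")); // a runtime duplicate
    println!("{}", Rc::ptr_eq(&a, &b)); // prints true: same allocation
}
```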

Agreed, although I should also consider that my limited skills may not be enough to handle such complex systems. :smile:

For simple uses like HeaderMap, it's mostly enough to just pool the static strings, e.g. "Connection", "close", "keep-alive" for an HTTP header map. rustc already stores those literals in the binary at compile time, so an API like .insert(&str) plus .insert(String) would be good enough.

I actually disagree with that. A &str is an immutable view into 'static data, i.e. data that exists in the executable. There's nothing temporary about it.

You're talking about the special case where the &str is an &'static str and points at a string literal hard-coded in the source code, but most &str are not like that. Most would point into the data owned by a String, and such a &str really is temporary.

The important point is that a &str is not owned. Something else owns the data, and the &str cannot outlive that something else.
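A small sketch of the difference (both function names are hypothetical): the first function returns a string literal, so its `&'static str` stays valid forever, while the second returns a `&str` that points into the caller's String and is only valid while that String is alive.

```rust
// Returns a string literal: a &'static str pointing into the binary.
fn default_connection() -> &'static str {
    "keep-alive"
}

// Returns a &str pointing into `line`; it is only valid while `line` is alive.
fn header_name(line: &str) -> &str {
    // split always yields at least one piece; unwrap_or is only a safeguard
    line.split(':').next().unwrap_or("").trim()
}

fn main() {
    let stored: &'static str = default_connection(); // can be kept anywhere
    let line = String::from("Connection: close");
    let name = header_name(&line); // borrows from `line`
    // drop(line); // would not compile here: `name` is still used below
    println!("{stored} {name}");
}
```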


Maybe I should have compared global const char* with &'static str; that would have been clearer.

Ah, it seems that's really necessary... I'll edit my post.

In that case it's more like a view "into" some part, possibly the whole, of a String.

Yes, exactly, it's a view into something else.

Ah, if you're taking advantage of the special case of Cow<'static, str> then it's not too bad. The static lifetime allows it to be used like a regular owned hashmap, so it's not as limited as collections that temporarily borrow items.
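A sketch of why `Cow<'static, str>` behaves like an owned map, with a hypothetical `Headers` alias and `make_headers` function: since no non-`'static` borrow is involved, the map can be returned from a function and even moved into another thread.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// No borrowed lifetime tied to a local scope: literals are 'static,
// and runtime strings are owned.
type Headers = HashMap<Cow<'static, str>, Cow<'static, str>>;

fn make_headers(content_length: usize) -> Headers {
    let mut h = Headers::new();
    h.insert(Cow::Borrowed("Connection"), Cow::Borrowed("close"));
    h.insert(
        Cow::Borrowed("Content-Length"),
        Cow::Owned(content_length.to_string()),
    );
    h
}

fn main() {
    let headers = make_headers(42);
    // 'static + Send, so the whole map can be moved into another thread.
    let handle = std::thread::spawn(move || headers.len());
    println!("{}", handle.join().unwrap()); // prints 2
}
```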

But if you know all your keys in advance, phf may be useful.

Thank you, now I think I might fall back to something like HashMap<&'a str, String>. I think there would be only a few cases that need a dynamic key. And I'll definitely take a close look at phf.

However, the values are still a mixture of dynamic and static. Fortunately, .insert(&str, &str) and .insert_move(&str, String) feel a little more comfortable than .ins1(), .ins2(), .ins3(), etc. :smile:

Discourse only displays the little tooltip of "replying to user" if you're replying to a non-latest post. If the tooltip doesn't show, it's assumed you're replying to the most recent post (or the OP).

(I personally don't agree with this behavior and would always display the tooltip, and it seems you agree. Using deleted posts to get the tooltip back, though, is an abuse of the system and shouldn't be done.

My usual workflow is that I (almost) always respond with a quote of the specific claim I'm responding to. This is especially necessary when replying to multiple users; it's considered better to put it all in one post rather than in several just to get the little "replying to" icon.)


Regarding method naming, _owned is usually used instead of _move. Alternatively, the owned variant gets the normal name, and the special &'static str case gets a _static method name.
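A sketch of the second naming convention on a hypothetical `Dict` wrapper (not an existing type): the owned variant gets the plain `insert` name and the literal-only variant gets `insert_static`.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical wrapper: keys are literals, values are literals or owned Strings.
#[derive(Default)]
struct Dict {
    inner: HashMap<&'static str, Cow<'static, str>>,
}

impl Dict {
    // Owned value: the normal name.
    fn insert(&mut self, key: &'static str, value: String) {
        self.inner.insert(key, Cow::Owned(value));
    }

    // &'static str value: the special case gets the `_static` suffix.
    fn insert_static(&mut self, key: &'static str, value: &'static str) {
        self.inner.insert(key, Cow::Borrowed(value));
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.inner.get(key).map(|v| &**v) // deref Cow<str> to &str
    }
}

fn main() {
    let mut d = Dict::default();
    d.insert_static("Connection", "close");
    d.insert("Content-Length", (1 + 2).to_string());
    println!("{:?} {:?}", d.get("Connection"), d.get("Content-Length"));
}
```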

You can also make it generic for both with

fn insert<K: Into<Cow<'static, str>>>(key: K) { let key = key.into(); … }
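Applying the same generic `Into` bound to both key and value gives a single `insert` method. This is a sketch on a hypothetical `CowMap` wrapper; because of the `'static` bound, borrows of non-literal strings are rejected at compile time.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical wrapper around a Cow<'static, str> map.
#[derive(Default)]
struct CowMap {
    inner: HashMap<Cow<'static, str>, Cow<'static, str>>,
}

impl CowMap {
    // &'static str is stored as Cow::Borrowed (no allocation);
    // String is stored as Cow::Owned (moved, not copied).
    fn insert<K, V>(&mut self, key: K, value: V)
    where
        K: Into<Cow<'static, str>>,
        V: Into<Cow<'static, str>>,
    {
        self.inner.insert(key.into(), value.into());
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.inner.get(key).map(|v| &**v)
    }
}

fn main() {
    let mut map = CowMap::default();
    map.insert("key", "value");
    map.insert("key2", (1 + 2).to_string());
    // let local = String::from("x"); map.insert("key3", &local);
    // ^ would not compile: &local is not 'static
    println!("{:?} {:?}", map.get("key"), map.get("key2"));
}
```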

That seems to be exactly what I want! I never thought of using Into like that; I should have dug more into the conversion traits. And I still strongly agree that this shouldn't be recommended beyond my specific case, in which the user (the .insert() caller) can still easily recognize the error if they pass an argument with an improper lifetime, such as:

let a = String::from("value");
let mut b = MyCowStrHashMap::new();
b.insert("key", &a);
drop(a);
let c = b.get("key");
println!("{}", c);
error[E0505]: cannot move out of `a` because it is borrowed
  --> src\test.rs:20:10
   |
94 |     b.insert("key", &a);
   |                     -- borrow of `a` occurs here
95 |     drop(a);
   |          ^ move out of `a` occurs here
96 |     let c = b.get("key");
   |             - borrow later used here

This error should be clear enough for the user to spot the mistake here.

And again, thanks to everyone who helped me with this post!

