&str vs String for HashMap key

New to Rust, coming from a Java and C++ background.

I'm reading the Rust book, and section 8.3 introduces the HashMap collection. The code example creates a map of team names to scores, defined as a HashMap<String,i32>.

From what I've learned about String and &str, it seems that a HashMap<&str,i32> would be more efficient for this example, since team names are hardcoded literal strings and won't be changing.

Trying to understand the advantage of using String, I Googled and found a StackOverflow answer which agreed that &str is more efficient, but noted that using &str

imposes significant restrictions on how the HashMap can be used/where it can be passed around

What exactly are the "significant restrictions" if &str is used?

I've read lots of information about the difference between String and &str, and think I mostly get it, but would appreciate some specific pros and cons for this particular example.

Remember that &str is not a type — it's a family of types. The lifetime component hasn't been specified. The restrictions they're talking about are that it has to be possible to write a suitable lifetime.

… since team names are hardcoded literal strings and won't be changing.

In that case, &'static str is an appropriate choice of lifetime, and you will not experience any particular restrictions. You would if you tried to make use of a map containing run-time constructed strings, which would have to use some lifetime shorter than 'static.
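Here is a minimal sketch of that restriction. The function `count_names` and its input are hypothetical and not from the thread. When the keys are borrowed from `String`s built at run time, the map's type carries their lifetime `'a`, so the map cannot outlive the data it borrows from.

```rust
use std::collections::HashMap;

// The key type is &'a str: the returned map borrows from `names`
// and therefore cannot outlive it.
fn count_names<'a>(names: &'a [String]) -> HashMap<&'a str, i32> {
    let mut counts = HashMap::new();
    for name in names {
        *counts.entry(name.as_str()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Strings built at run time, so &'static str is not available here.
    let names: Vec<String> = vec!["red".to_string(), "blue".to_string(), "red".to_string()];
    let counts = count_names(&names);
    println!("{:?}", counts.get("red"));
    // drop(names); // would not compile while `counts` is still used below
    println!("{}", counts.len());
}
```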

However, &'static str is not the most efficient option for a compile-time set of choices, because it still requires reading the entire string to confirm that items are equal or unequal. The best choice here is an enum.

    #[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
    enum Team {
        Red,
        Blue,
    }

    impl Team {
        fn name(self) -> &'static str {
            match self {
                Team::Red => "red",
                Team::Blue => "blue",
            }
        }
    }

(You can also use strum to generate stringification code like that for you.)
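Below is a sketch of using the enum above as a `HashMap` key. The `tally` function and the list of goals are hypothetical. `name()` is called only when a human-readable label is needed.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
enum Team {
    Red,
    Blue,
}

impl Team {
    fn name(self) -> &'static str {
        match self {
            Team::Red => "red",
            Team::Blue => "blue",
        }
    }
}

// Hashing and comparing a Team key only looks at the enum discriminant,
// not at the characters of a string.
fn tally(goals: &[Team]) -> HashMap<Team, i32> {
    let mut scores = HashMap::new();
    for &team in goals {
        *scores.entry(team).or_insert(0) += 1;
    }
    scores
}

fn main() {
    let scores = tally(&[Team::Red, Team::Blue, Team::Red]);
    for team in [Team::Red, Team::Blue] {
        println!("{}: {}", team.name(), scores.get(&team).copied().unwrap_or(0));
    }
}
```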

Using an enum also enables you to use more efficient structures than HashMap for some applications, because if you know there are only two (or however many) choices then [i32; 2] is much more compact and faster to access than HashMap<Team, i32>. But that's a further performance optimization, not something you should do automatically when you're just getting started.
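Here is a rough sketch of the array idea, assuming the two-variant `Team` from above. The `Scores` wrapper and the `ALL` constant are hypothetical names. Each variant's discriminant doubles as an array index, so no hashing happens at all.

```rust
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum Team {
    Red,
    Blue,
}

impl Team {
    const ALL: [Team; 2] = [Team::Red, Team::Blue];

    // Fieldless enums are numbered from 0 by default, so this is a valid index.
    fn index(self) -> usize {
        self as usize
    }
}

// One i32 slot per team instead of a HashMap<Team, i32>.
#[derive(Debug, Default)]
struct Scores([i32; 2]);

impl Scores {
    fn add(&mut self, team: Team, points: i32) {
        self.0[team.index()] += points;
    }

    fn get(&self, team: Team) -> i32 {
        self.0[team.index()]
    }
}

fn main() {
    let mut scores = Scores::default();
    scores.add(Team::Red, 10);
    scores.add(Team::Blue, 50);
    for team in Team::ALL {
        println!("{:?}: {}", team, scores.get(team));
    }
}
```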


&str isn't just some faster string, it's for restricting the scope where the values can be used. If you want a smaller non-growable string, use Box<str>. Box<str> and &str are essentially the same type, only with different ownership. They have identical representation in memory (pointer, length), but &str can be used only in the scope of the variable it borrows from, and Box<str> can be used anywhere.
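This small sketch contrasts the two. The `shout` function is a hypothetical example. A `Box<str>` can be returned and stored freely, but a `&str` borrowed from it is tied to its scope. The exact byte sizes printed depend on the target. On any target, though, `Box<str>` is two machine words and `String` is three.

```rust
// Box<str>: an owned, non-growable string (pointer + length, no capacity).
fn shout(name: &str) -> Box<str> {
    // Build a String, then drop its spare capacity by converting to Box<str>.
    name.to_uppercase().into_boxed_str()
}

fn main() {
    let owned: Box<str> = shout("blue");
    // Borrowing a &str from the Box<str> is free; the &str cannot outlive `owned`.
    let view: &str = &owned;
    println!("{view}");
    // Box<str> is two words (pointer, length); String is three (pointer, length, capacity).
    println!("{} {}", std::mem::size_of::<Box<str>>(), std::mem::size_of::<String>());
}
```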

In C++ terms, &str is a std::string_view. It is not a string itself but a view into a String, Box<str>, or similar stringy type, which must already exist somewhere before a &str view of it can be created. A &str can only be used temporarily, within the scope where the String it borrows from exists. It cannot outlive that String, and the borrowed data cannot be mutated while the view is in use.

It's also impossible to create a new &str and return it from a function. You can create and return String, Box<str>, or Cow<str>, but returning a new &str is a use-after-free bug (that the borrow checker will prevent). In C++:

    std::string_view make_view() {
        std::string data = "hello";         // let data = String::from("hello");
        std::string_view reference = data;  // let reference: &str = &data;
        return reference;                   // Rust: "does not live long enough" error; C++ accepts this, but the view dangles
    }
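For comparison, here is a sketch of the Rust alternatives mentioned above. The function names are hypothetical. Each one returns owned data. In the `Cow` case, it may instead return a borrow of the *caller's* input rather than of a local, and the borrow checker accepts that.

```rust
use std::borrow::Cow;

// Returning a new &str to a local would dangle; return owned data instead.
fn greeting_string(name: &str) -> String {
    format!("hello, {name}")
}

fn greeting_boxed(name: &str) -> Box<str> {
    format!("hello, {name}").into_boxed_str()
}

// Cow<str> borrows from the input when nothing needs to change,
// and allocates a new String only when it must.
fn trimmed_lowercase(input: &str) -> Cow<'_, str> {
    let trimmed = input.trim();
    if trimmed.chars().any(|c| c.is_uppercase()) {
        Cow::Owned(trimmed.to_lowercase())
    } else {
        Cow::Borrowed(trimmed)
    }
}

fn main() {
    println!("{}", greeting_string("blue"));
    println!("{}", greeting_boxed("red"));
    println!("{}", trimmed_lowercase("  Yellow "));
}
```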

I'm also new, coming from a Java background. I've struggled a lot trying to solve the chapter 8 exercises by reading only chapter 8, especially the one about associating people with departments, which I'm still working on.

Without thinking about this Box thing (which is only explained in chapter 15, "Smart Pointers", which I haven't read yet), what I ended up doing was reading chapters 8, 9, 10 and 11 to understand generics and lifetimes, and to be able to do test-driven development.

https://doc.rust-lang.org/book/ch08-00-common-collections.html
https://doc.rust-lang.org/book/ch09-00-error-handling.html
https://doc.rust-lang.org/book/ch10-00-generics.html
https://doc.rust-lang.org/book/ch11-00-testing.html

The journal file

In this repository, the git log messages record the struggles I faced (and still face), so it should be possible to see the progress (or lack thereof) at each step.

I would recommend that both of you read "How not to learn Rust".

It has been said before, but it bears repeating: surprisingly enough, Rust is extra hard to learn for people with some experience.

People with lots of experience and knowledge of differently typed languages (like Haskell, OCaml or Scheme) usually pick it up easily, as do people without any experience.

But people with limited experience often need not to learn something but to unlearn something… and that often leads to incredible frustration and a peculiar learning curve. In particular:

You would find Rust quite unfriendly to these attempts, and most of the time they are not needed.

The problem with test-driven development in Rust lies precisely in what many consider its most awesome advantage: incorrect code tends to just fail to compile! Of course, that blows the plan to use test-driven development to smithereens: if you simply can't write code that doesn't work, how do you write a failing test so you can then start fixing it?

Instead of test-driven development, in Rust you usually have compiler-driven development: you design your data structures, write code, and try to convince the compiler to accept it. Once that's done… that's it. Time to add some high-level integration tests and ship it.

That doesn't mean that unit tests are entirely impossible or that mocks are never used, but these are “tools of last resort”, used when everything else fails. They come either from very unusual requirements (e.g. when you develop software for a self-driving car and each test run would otherwise need an actual trip with a human supervisor… mocks are definitely preferable) or, more often, from a design too strange for normal compiler checks to catch its errors.


Yep. One of the things that bothers me most, at least as far as I've gotten in the Rust book, is that references, with the borrow checker, more often than not simply "do not work". For example, in the chapter 8.3 exercise I just tried to make a HashMap with &str as a value, which fails miserably due to lifetime problems.

The new version of my database.rs has &str replaced with String and "BAM!" all those pesky lifetime annotations vanish and everything "fits".
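Here is a minimal sketch of that owned-`String` approach. It assumes the exercise's "Add <person> to <department>" command format. The `Company` type and its methods are hypothetical and are not taken from the linked database.rs. Because the map owns its strings, the input line can be dropped right after the command is processed.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the chapter 8 "Add Sally to Engineering" exercise,
// using owned Strings so the map does not borrow from any caller.
#[derive(Debug, Default)]
struct Company {
    departments: HashMap<String, Vec<String>>,
}

impl Company {
    // Takes &str and stores owned copies of the names.
    fn add(&mut self, person: &str, department: &str) {
        self.departments
            .entry(department.to_string())
            .or_default()
            .push(person.to_string());
    }

    // Returns the people in a department, sorted and borrowed from the map.
    fn people_in(&self, department: &str) -> Vec<&str> {
        let mut people: Vec<&str> = self
            .departments
            .get(department)
            .map(|names| names.iter().map(String::as_str).collect())
            .unwrap_or_default();
        people.sort_unstable();
        people
    }
}

fn main() {
    let mut company = Company::default();
    // Simulates a command read at run time.
    let line = String::from("Add Sally to Engineering");
    let words: Vec<&str> = line.split_whitespace().collect();
    if let ["Add", person, "to", department] = words.as_slice() {
        company.add(person, department);
    }
    drop(line); // fine: the map owns its own Strings
    company.add("Amir", "Engineering");
    println!("{:?}", company.people_in("Engineering"));
}
```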

But it annoys me to a great extent that in the end I just have to copy the objects everywhere... instead of simply sending references to them.

Are you sure this in the end is not a big memory waster and a performance chugger?

But I have no complaints about unit testing. I like it. Even when the tests don't compile, they provide a clue as to why.


Show the code that fails.

This works and doesn't copy the strings:

    use std::collections::HashMap;

    fn main() {
        // Keys are &'static str literals, so nothing is copied into the map.
        let mut scores = HashMap::new();

        scores.insert("Blue", 10);
        scores.insert("Yellow", 50);

        for (key, value) in &scores {
            println!("{key}: {value}");
        }
    }

I'm really wondering what kind of code you write. For me, the errors I try to fix with tests are logical errors. I don't test whether the return type of a function is correct; of course it is, I'm using a statically typed language. I test whether the value that comes out makes sense.


I just try to use typestates where feasible.

Yeah, but with typestates, if a value can be returned at all then it makes sense, so there is nothing to check.
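For readers unfamiliar with the term, here is a minimal typestate sketch. The `Connection`, `Closed`, and `Open` types are hypothetical. The state lives in the type parameter, so calling a method in the wrong state is a compile error, not something a unit test has to catch.

```rust
use std::marker::PhantomData;

// Marker types for the two states; they are never constructed.
struct Closed;
struct Open;

struct Connection<State> {
    address: String,
    _state: PhantomData<State>,
}

impl Connection<Closed> {
    fn new(address: &str) -> Self {
        Connection { address: address.to_string(), _state: PhantomData }
    }

    // Consumes the closed connection and returns an open one.
    fn open(self) -> Connection<Open> {
        Connection { address: self.address, _state: PhantomData }
    }
}

impl Connection<Open> {
    // Exists only on Connection<Open>; calling it on a closed one does not compile.
    fn send(&self, message: &str) -> String {
        format!("{} <- {}", self.address, message)
    }
}

fn main() {
    let conn = Connection::new("localhost").open();
    println!("{}", conn.send("hello"));
    // Connection::new("localhost").send("hello"); // error: no method `send` on Connection<Closed>
}
```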

Indexes don't work, of course: one needs dependent typing to exclude these kinds of errors. But they are also usually not complicated enough to warrant unit tests of their own.

Sure, but why would you need unit-tests for that? I verify my public API and if something can not be observed via public API… should I even care?

Sometimes it's not possible to verify some tricky-yet-important property with just the public API, but most of the time the answer to the previous question is “no, I really shouldn't”.

Ahhhh, that's really cool 🙂

Because I often write numerical algorithms, for which it is easy to write test cases but hard to spot errors in formulas. I also often want to optimize these algorithms later without changing their behavior. Tests are really useful here, especially for corner and edge cases.
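Here is a sketch of the kind of test described here, using a hypothetical sum-of-1..=n routine. It pairs a slow but obviously correct reference version with an optimized version, plus a `#[test]` (run with `cargo test`) that checks they agree on edge cases.

```rust
// Reference implementation: slow but obviously correct.
fn sum_to_naive(n: u64) -> u64 {
    (1..=n).sum()
}

// Optimized closed form n(n+1)/2. One of n and n+1 is even, so divide
// that one first to keep the intermediate product small.
fn sum_to_fast(n: u64) -> u64 {
    if n % 2 == 0 { (n / 2) * (n + 1) } else { n * ((n + 1) / 2) }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fast_matches_naive_including_edge_cases() {
        for n in [0, 1, 2, 3, 10, 1_000, 65_535] {
            assert_eq!(sum_to_fast(n), sum_to_naive(n), "mismatch at n = {n}");
        }
    }
}

fn main() {
    println!("{}", sum_to_fast(100));
}
```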

Thanks for explaining your use cases. Your lack of need for unit tests now makes much more sense.


In most cases there is not a lot of copying, because once the String has the right owner, it can be borrowed as &str from that owner.
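This small sketch shows that pattern. The `score_of` helper is hypothetical. The map owns `String` keys, yet lookups and iteration work with `&str` borrowed from it, because `HashMap::get` accepts any borrowed form of the key (`String: Borrow<str>`).

```rust
use std::collections::HashMap;

// Lookup with a plain &str: no temporary String is allocated.
fn score_of(scores: &HashMap<String, i32>, team: &str) -> Option<i32> {
    scores.get(team).copied()
}

fn main() {
    let mut scores: HashMap<String, i32> = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    // Iterating by reference borrows each key as &String, usable as &str.
    for (team, score) in &scores {
        let name: &str = team;
        println!("{name}: {score}");
    }
    println!("{:?}", score_of(&scores, "Blue"));
}
```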

In a situation where excessive copying seems inevitable, you can use reference counted Arc<str> instead of String. Do consider running a benchmark to see what version is actually faster in your use case, though.
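Here is a sketch of the `Arc<str>` approach, using a hypothetical person/department list. The department name is allocated once and shared by every tuple and by the map key. Cloning an `Arc` copies a pointer and bumps a reference count instead of copying the bytes. Whether this actually beats plain `String` is exactly what the suggested benchmark would tell you.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Arc<str>: one shared heap allocation; cloning only bumps a reference count.
fn index_by_department(
    people: &[(Arc<str>, Arc<str>)],
) -> HashMap<Arc<str>, Vec<Arc<str>>> {
    let mut by_dept: HashMap<Arc<str>, Vec<Arc<str>>> = HashMap::new();
    for (person, dept) in people {
        // Arc::clone copies a pointer, not the string bytes.
        by_dept.entry(Arc::clone(dept)).or_default().push(Arc::clone(person));
    }
    by_dept
}

fn main() {
    let engineering: Arc<str> = Arc::from("Engineering");
    let people: Vec<(Arc<str>, Arc<str>)> = vec![
        (Arc::from("Sally"), Arc::clone(&engineering)),
        (Arc::from("Amir"), Arc::clone(&engineering)),
    ];
    let by_dept = index_by_department(&people);
    // Lookups still work with a plain &str (Arc<str>: Borrow<str>).
    println!("{:?}", by_dept.get("Engineering"));
    // The name is stored once, shared by `engineering`, `people`, and the map key.
    println!("{}", Arc::strong_count(&engineering));
}
```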


Yes, it works, as long as you are using &'static str values defined at compile time. So this works for a HashMap that maps between constants, for example. But the moment you try to load these constants from a configuration file at application startup, it starts to cause problems.

The code I'm writing right now is public. I'm working on the third exercise of chapter 8. So I'm creating a "database" containing the people and the department they belong to.

The module is in https://github.com/abmpicoli/my_rust_journal/blob/master/ch8_3_human_resources/src/database.rs .

I'm following the idea of Test Driven Development: write code that fails, write a test that checks whether the code works, fix the code. Rinse and repeat.


Thanks much for the pointer to "How Not to Learn Rust" -- just read it, was very helpful.

In short: baby steps and finish reading the book, in my case 🙂

So I'll keep building my solution using mostly features from the Rust book, from the chapters I mentioned in my previous post.

For the moment I will take the leap of faith that "Ok, my code is quite ugly right now, but it will get better once I learn more". 🙂

If your configuration file is loaded once, at startup, you can use String::leak to get a &'static str.
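Here is a minimal sketch of the leaking approach. It assumes the config consists of simple `name=score` lines. The file format and the `load_config_text` stand-in are hypothetical; a real program might use `std::fs::read_to_string` instead. `String::leak` requires Rust 1.72 or newer. The leaked memory is never freed, which is acceptable for data that lives for the whole program.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for reading the config file at startup.
fn load_config_text() -> String {
    String::from("red=10\nblue=50\n")
}

// Leaks the config text once, so every key borrowed from it is &'static str.
fn load_scores() -> HashMap<&'static str, i32> {
    let text: &'static str = load_config_text().leak();
    let mut scores = HashMap::new();
    for line in text.lines() {
        if let Some((name, value)) = line.split_once('=') {
            if let Ok(v) = value.trim().parse::<i32>() {
                scores.insert(name.trim(), v);
            }
        }
    }
    scores
}

fn main() {
    let scores = load_scores();
    println!("{:?}", scores.get("red"));
}
```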


You were probably taught by well-meaning teachers, but it's time to accept the truth: the world has changed.

Decades ago, when languages like Lisp ruled and Java was dreamed up, every byte counted and references were cheap. Today… it's the opposite. If you look at the 10-year-old table, you'll see that copying 1K of data takes approximately the same time as following one single reference to its source, if that reference happens to be not in cache but in main RAM!

And the situation is not going to get better any time soon.

That's one of the reasons Rust is [slowly] taking over: its design, which leads to the creation of many copies (temporary and non-temporary), nonetheless produces very efficient programs.

So don't worry too much about needless copies: they cost much less than your teachers told you.

Of course at some point excessive copies would hurt, but don't fret about them when you are just learning Rust.


In this scenario you want a copy from the file to main memory (you could also memory-map the file to avoid that copy, but let's ignore that possibility for now).

So you read the string into a String and move that String into the HashMap<String, i32>.

There are no superfluous string copies in this scenario, there is just one copy in RAM.
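Here is a sketch of that flow. It assumes hypothetical `name=score` lines, with an in-memory byte slice standing in for the file. Each line arrives as an owned `String` and is cut down to the name in place with `truncate`. The `String` is then moved into the `HashMap<String, i32>`, so no second copy of the name is made.

```rust
use std::collections::HashMap;
use std::io::{BufRead, BufReader};

// Reads "name=score" lines; each key String is moved (not copied) into the map.
fn read_scores<R: BufRead>(reader: R) -> std::io::Result<HashMap<String, i32>> {
    let mut scores = HashMap::new();
    for line in reader.lines() {
        let mut line = line?; // an owned String, freshly read from the input
        if let Some(idx) = line.find('=') {
            if let Ok(score) = line[idx + 1..].trim().parse::<i32>() {
                line.truncate(idx); // keep only the name, reusing the same buffer
                scores.insert(line, score); // move the String into the map
            }
        }
    }
    Ok(scores)
}

fn main() -> std::io::Result<()> {
    // Hypothetical in-memory stand-in for a file opened with std::fs::File::open.
    let input = "Blue=10\nYellow=50\n";
    let scores = read_scores(BufReader::new(input.as_bytes()))?;
    println!("{scores:?}");
    Ok(())
}
```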


Exactly. Not a &str. A String. So, as a rough rule of thumb, a HashMap with &str keys or values, even static ones, is generally not a good choice compared to String, except in very, very specific cases.

In my specific case, in the database.rs code, the data isn't even loaded at startup; it really has an arbitrary lifetime.

So, I may be sounding silly, because I'm a total noob in the language, but this is my "initial approach" to the model. Oooga booga! String works! str no good ! Oooga booga!

And yet, Rust is not used only on servers and multi-core computers with tons of RAM and lots of L2 cache, but also in small devices and toys that are part of a Happy Meal.

But I hear you: different programming paradigms for different environments. Most likely I won't even use strings of any kind on those types of devices.