Is it unsafe to use &String or &str as a HashMap key?

Hi, I'm new to Rust, and I'm confused about the mutability of strings.
I have experience with Java and Golang, in which strings are basically immutable.
But Rust makes String mutable, which may lead to inconsistencies between key insertion and lookup in a HashMap.

An overview of the following code:

  1. Create a HashMap
  2. Insert a string reference (&"hello") as a key into it
  3. Modify the string (to "hollo") in a less elegant way (I know it's not elegant and not recommended, but it's possible)
  4. Query the HashMap (by "hello" or "hollo"), but the key-value pair cannot be found

I know why this happens; I just want to figure out whether &String or &str is safe enough to be used as a HashMap key, especially in the presence of "malicious code".
By comparison, this situation will not happen (or is less likely to happen) in Java or Golang.

Using &String as a HashMap key

// Using &String as a HashMap key may be unsafe because String is mutable.
// Using String itself as the key has no such problem.
fn unsafe_use_string_ref_key() {
    println!("==unsafe_use_string_ref_key==");
    use std::collections::HashMap;
    let mut scores: HashMap<&String, i32> = HashMap::new();
    let str1 = String::from("hello");
    let ref1 = &str1;
    let ref2 = &str1;
    scores.insert(ref1, 5);
    scores = dbg!(scores);

    // Forcibly convert the immutable reference into a mutable reference
    let ref3 = ref_to_mut(ref1);
    ref3.replace_range(1..2, "o");
    println!("ref2 is {}", ref2);
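    // After the modification, the key in the HashMap has also become "hollo", but the key-value pair can't be found with either "hello" or "hollo".
    // The reason: insert placed the pair in the bucket for hash("hello"); the key then changed to "hollo" but stayed in the original hash("hello") bucket.
    // Looking up "hello": the hash("hello") bucket contains no key equal to "hello";
    // looking up "hollo": the hash("hollo") bucket contains no data at all.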
    scores = dbg!(scores);
    println!(
        "{}, {}",
        scores.get(&String::from("hello")).unwrap_or(&0),
        scores.get(&String::from("hollo")).unwrap_or(&0)
    );
}

// Forcibly converts a shared reference (&T) into a mutable one (&mut T).
#[allow(mutable_transmutes)]
fn ref_to_mut<T: ?Sized>(ref1: &T) -> &mut T {
    unsafe { std::mem::transmute(ref1) }
}

Using &str as a HashMap key

fn unsafe_use_string_ref_key2() {
    println!("==unsafe_use_string_ref_key2==");
    use std::collections::HashMap;
    let mut scores: HashMap<&str, i32> = HashMap::new();
    let str1 = String::from("hello");
    let ref1 = str1.as_str();
    let ref2 = str1.as_str();
    scores.insert(ref1, 5);
    scores = dbg!(scores);

    // Forcibly convert the immutable reference into a mutable reference
    let ref3 = ref_to_mut(ref1);
    let bytes = unsafe { ref3.as_bytes_mut() };
    bytes[1] = b'o';
    println!("ref2 is {}", ref2);
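    // After the modification, the key in the HashMap has also become "hollo", but the key-value pair can't be found with either "hello" or "hollo".
    // The reason: insert placed the pair in the bucket for hash("hello"); the key then changed to "hollo" but stayed in the original hash("hello") bucket.
    // Looking up "hello": the hash("hello") bucket contains no key equal to "hello";
    // looking up "hollo": the hash("hollo") bucket contains no data at all.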
    scores = dbg!(scores);
    println!(
        "{}, {}",
        scores.get("hello").unwrap_or(&0),
        scores.get("hollo").unwrap_or(&0)
    );
}

You are ignoring the whole raison d'être of the language. And no, String and &str are perfectly fine to use as HashMap keys.

Your ref_to_mut() function is wildly unsound and causes immediate Undefined Behavior. It is the very point of Rust's aliasing/borrowck model that you are not allowed to mutate stuff through shared references by default, and as a consequence, your code is completely, hopelessly broken.
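To see the borrow checker enforce this in safe code, here is a minimal sketch (the function name `borrow_then_mutate` is made up for illustration). While a `HashMap<&str, i32>` holds a shared borrow of a `String`, the `String` cannot be mutated; once the map is dropped, it can.

```rust
use std::collections::HashMap;

/// Hypothetical demo: borrows a `String` as a `&str` key, then mutates the
/// `String` only after the map is gone.
/// Returns (lookup result while the map existed, final string).
fn borrow_then_mutate() -> (Option<i32>, String) {
    let mut owner = String::from("hello");

    let mut scores: HashMap<&str, i32> = HashMap::new();
    scores.insert(owner.as_str(), 5);
    // owner.replace_range(1..2, "o"); // rejected here: `owner` is still borrowed by `scores`
    let found = scores.get("hello").copied();
    drop(scores); // ends the map's shared borrow of `owner`

    owner.replace_range(1..2, "o"); // accepted: no shared borrows remain
    (found, owner)
}

fn main() {
    let (found, owner) = borrow_then_mutate();
    println!("found = {:?}, owner = {}", found, owner);
}
```

Uncommenting the marked line should make the program fail to compile, which is exactly the guarantee the transmute bypasses.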

Map types (including HashMap and BTreeMap) intentionally hand out only shared (i.e., immutable) references to keys. So you can't really do what you want "by accident".
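A small illustration of that API shape, assuming a plain `HashMap<String, i32>` (`double_values` is a hypothetical helper): `iter_mut()` gives `&mut` access to the values but only `&` access to the keys, and std offers `values_mut()` but no `keys_mut()`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: doubles every score in place.
fn double_values(scores: &mut HashMap<String, i32>) {
    // `iter_mut` yields `(&String, &mut i32)`: the value is mutable, the key is not.
    for (_key, value) in scores.iter_mut() {
        *value *= 2;
    }
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("hello"), 5);
    double_values(&mut scores);
    println!("{:?}", scores.get("hello"));
}
```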

The only "correct" and memory-safe way to do this is to employ an explicit shared mutability primitive (such as RefCell or Mutex), which grants you the special ability to mutate through a shared reference. But then that's very explicit in the type of the map (because it's going to be something like HashMap<RefCell<String>, Value> or BTreeMap<Mutex<String>, Value>, etc.), and people will instantly see the anti-pattern.
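For completeness, here is a sketch of what that explicit anti-pattern looks like in entirely safe code. It assumes a hypothetical wrapper `Key(RefCell<String>)` with hand-written `Hash`/`Eq`, since the map needs those traits on the key type. The map ends up logically broken, but there is no Undefined Behavior. (Clippy's `mutable_key_type` lint is meant to flag key types like this.)

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type: a `String` behind a `RefCell`, so it can be
/// changed through a shared `&Key`.
struct Key(RefCell<String>);

impl Key {
    fn new(s: &str) -> Self {
        Key(RefCell::new(s.to_string()))
    }
}

impl PartialEq for Key {
    fn eq(&self, other: &Self) -> bool {
        *self.0.borrow() == *other.0.borrow()
    }
}
impl Eq for Key {}

impl Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hashes the *current* contents, so the hash changes whenever the string does.
        self.0.borrow().hash(state);
    }
}

/// Mutates the stored key through `&Key`, entirely in safe code.
/// Returns (lookup by "hello", number of entries, current text of the stored key).
fn mutate_key_through_shared_ref() -> (Option<i32>, usize, Option<String>) {
    let mut scores: HashMap<Key, i32> = HashMap::new();
    scores.insert(Key::new("hello"), 5);

    // `keys()` only yields `&Key`, but the RefCell permits mutation through it.
    for k in scores.keys() {
        k.0.borrow_mut().replace_range(1..2, "o");
    }

    let by_old = scores.get(&Key::new("hello")).copied();
    let stored = scores.keys().next().map(|k| k.0.borrow().clone());
    (by_old, scores.len(), stored)
}

fn main() {
    let (by_old, len, stored) = mutate_key_through_shared_ref();
    println!("lookup by \"hello\": {by_old:?}, entries: {len}, stored key: {stored:?}");
}
```

A lookup by "hollo" will usually fail as well, because the entry still sits where hash("hello") put it, but that outcome depends on the randomized hasher, so the sketch doesn't rely on it.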

If you don't use a shared mutability primitive, i.e. literally have HashMap<String, Value>, then it is not allowed and not safely possible to ever mutate the keys.
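If a key really has to change, the safe route is to take it out of the map, change it, and put it back. A sketch, with `rename_key` as a hypothetical helper: `HashMap::remove_entry` returns the owned key, which can be mutated freely while it is outside the map and then re-inserted under its new hash.

```rust
use std::collections::HashMap;

/// Hypothetical helper: moves the value stored under `old` to the key `new`.
/// Returns `false` (and leaves the map unchanged) if `old` is not present.
fn rename_key(map: &mut HashMap<String, i32>, old: &str, new: &str) -> bool {
    // `remove_entry` gives back the owned key; while it is outside the map,
    // mutating it cannot desynchronize any stored hash.
    match map.remove_entry(old) {
        Some((mut key, value)) => {
            key.clear();
            key.push_str(new);
            // Re-inserting hashes the key again, so it lands in the right place.
            map.insert(key, value);
            true
        }
        None => false,
    }
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("hello"), 5);
    rename_key(&mut scores, "hello", "hollo");
    println!("{:?} {:?}", scores.get("hello"), scores.get("hollo"));
}
```

Note that if `new` is already a key, `insert` overwrites its value; a real helper might want to check for that first.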

It's safe to use String as a key to HashMap.

What is not always safe is std::mem::transmute. Hence, the function is marked unsafe, and the language will prevent you from using it unless you tell the compiler, by using the keyword unsafe as you did, that you want to use unsafe tools.

Rust doesn't protect you against malicious code. If you allow malicious code into your binary, there are no guarantees. This has nothing to do with HashMap in particular.

In addition to @H2CO3’s very correct response, it can be interesting to look at the HashMap API provided by the hashbrown crate. This crate offers a set of “raw” APIs that can produce results similar to using the standard library’s HashMap with interior-mutability key types such as RefCell<String>, or with custom key types whose Eq or Hash implementations are inconsistent: the HashMap can be brought into an inconsistent state where entries become impossible or hard to look up, resulting in a variety of (logically) broken behaviors of the map.

The kinds of capabilities these raw APIs can offer include:

  • the ability to provide the hash value manually which will of course result in a broken map if the hash value is wrong
  • the ability to get mutable access to the key after all, which can result in the same kind of behavior you observed here, but without the undefined behavior problems others pointed out

This raw API is also interesting because it is mirrored by a (still unstable) API in the standard library, so it may become available there at some point in the future.
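The “inconsistent Eq or Hash” case can be reproduced with the standard library alone. The sketch below uses a hypothetical key type `NeverEqual` whose `eq` always returns `false`, violating the reflexivity that `Eq` promises (similar to `f64::NAN`). With the current std implementation, inserting the “same” key twice yields two entries and lookups never succeed. As I understand the std docs, such contract violations are logic errors with unspecified results, but they do not cause Undefined Behavior.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type whose `PartialEq` never returns `true`, not even for
/// itself, so it breaks the contract `Eq` promises. Its `Hash` is ordinary.
#[derive(Clone)]
struct NeverEqual(String);

impl PartialEq for NeverEqual {
    fn eq(&self, _other: &Self) -> bool {
        false // violates reflexivity (a == a), which `Eq` requires
    }
}
impl Eq for NeverEqual {}

impl Hash for NeverEqual {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}

/// Inserts the "same" key twice, then looks it up.
/// Returns (number of entries, result of the lookup).
fn broken_eq_demo() -> (usize, Option<i32>) {
    let key = NeverEqual(String::from("hello"));
    let mut scores = HashMap::new();
    scores.insert(key.clone(), 1);
    // No stored key ever compares equal, so this adds a second entry
    // instead of replacing the first.
    scores.insert(key.clone(), 2);
    (scores.len(), scores.get(&key).copied())
}

fn main() {
    let (len, found) = broken_eq_demo();
    println!("entries: {len}, lookup: {found:?}");
}
```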

That's only because, in general, there's much less reason to use unsafe in either of them, and therefore far fewer things that could go as obviously wrong as this transmute.

I have seen this situation plenty of times in both Java and Golang. You just have to keep in mind that Rust with unsafe cannot be compared with plain Java or Golang. You have to compare it with “Java + malicious code” or “Golang + malicious code”. You don't even need that “malicious code” to be written in C or C++ (although that's usually the case): modern OSes let you use WriteProcessMemory on Windows, vm_write on macOS, or /proc/<pid>/mem on Linux…

There is no protection against what you are doing, and there couldn't be any in a low-level language. If everything else fails, you can just call read and write directly via a pipe and change your strings that way.

Rust is just a bit more honest about what it does, that's all.

Could you give me an example in Java?
String in Java is strictly immutable; I can't think of a way this could happen in Java.

You can use JNI to forcibly overwrite String data, just like you did with unsafe code in Rust. Both are serious bugs and may, of course, cause security issues.
