HashMap with a field as key

Hi.

I have a struct with, amongst other fields, id : String, and I want to put it in a HashMap based on this field. And I want to do it the “efficient” way: the key in the HashMap must be the actual string in the structure, not a copy. Not because it matters for this project, just to learn.

Obviously, the borrow checker is not happy with the naive version, and rightly so: I could mutate the string in the struct, that would leave me with a dangling pointer. I have managed to get the following almost-minimal example working, but I wonder if I am making idiomatic mistakes.

Any comments would be appreciated, thanks in advance.

#![feature(convert)]

use std::collections::HashMap;

struct Test {
    id : String,
    val : u32,
}

fn test <'a> () -> HashMap <&'a str, Test> {
    let t = Test { id : "foo".to_owned(), val : 42 };
    let mut h : HashMap <&str, Test> = HashMap::new();
    // h.insert(t.id.clone(), t); // works
    let id = t.id.as_str() as *const str;
    unsafe {
        h.insert(&*id, t);
    }
    return h;
}

fn main() {
    let h = test();
    match h.get("foo") {
        Some(x) => print!("found {}\n", x.val),
        None => print!("not found\n"),
    };
}
1 Like

This just bypasses the rules. You seem to want shared ownership of the string and there's Rc for that:

#![feature(convert)]

use std::collections::HashMap;
use std::rc::Rc;

struct Test {
    id : Rc<String>,
    val : u32,
}

fn test() -> HashMap<Rc<String>, Test> {
    let t = Test { id : Rc::new("foo".to_owned()), val : 42 };
    let mut h = HashMap::new();
    h.insert(t.id.clone(), t);
    h
}

fn main() {
    let h = test();
    match h.get("foo") {
        Some(x) => print!("found {}\n", x.val),
        None => print!("not found\n"),
    };
}

There's only a slight problem: h.get("foo") doesn't compile because

the trait collections::borrow::Borrow<str> is not implemented for the type alloc::rc::Rc<collections::string::String>

Unexpectedly Borrow isn't transitive. This seems unfortunate. :confused:

Thanks for the reply, but I do not need reference counting, and therefore, for the sake of learning, I do not want it. I know that the rest of the program is not going to change or delete the string in the id field, and therefore, unless I have misunderstood something very basic about Rust, the pointer to its contents will stay valid.

In C, that works perfectly:

struct test {
    char *id;
};
struct test *t = calloc(1, sizeof(*t)); // FIXME check allocations
t.id = strdup("foo"); // FIXME ditto
hashmap_insert(h, t.id, t);

What pushed me to learn Rust and not one of the countless other languages that flourish nowadays is that it is supposed to give almost as much control as C, and that is what I want to learn.

I don't think there's any idiomatic way of purposely breaking the safety contract. Making the whole test function an unsafe fn would make it more clear that there're some hacks involved.

Absent a specific use case with real trade-offs there doesn't seem to be a lot to learn from this.

What would be the benefit of marking the whole function unsafe? My intuition is that unsafe blocks should be as small as possible, to keep the safety checks for the rest of the code.

The test function returns a tricky HashMap -- we can do certain operations on it that will violate memory safety. Thus test is not encapsulating an unchecked operation inside a safe interface, instead the abstraction is “leaky”. In this situation, it's customary to mark the function unsafe.

1 Like

Another approach is just to split Test into two parts, the key and the value, and insert that pair.

h.insert(t.id, t.val); // works

This explains to Rust that you are allowed to change the value half, but not the key part. On extraction, you can recreate the struct.

There aren't nice automatic methods for doing this, that I know of (you could make a trait, KeyVal with conversions to and from (Key, Val) pairs). But, I do it plenty, works great, etc.

Personal opinion: if you really, really want the key to have a pointer to a field in the value, you'll probably not end up happy with the experience. Rust does give you a bunch of control, but the exchange is that it does insist on some structure. Some of that structure is tedious (often your datastructures need to be laid out per mutability needs rather than by logical function), and ... I suspect that will be iterated on by future versions.

1 Like

Hmmm. I think I mostly get it now.

Assuming I want to keep the key shared and part of the structure itself, and disregarding the safety checks entirely, the program is correct as long as it refrain from doing these two invalid operations:

  1. mutate the id field;

  2. steal the key and value from the HashMap (looks possible with drain, but not for a single key/value pair for now), drop the value and use the key.

I will focus my discourse on 1 for the sake of clarity.

The compiler is not smart enough to see that, it would require global analysis and the safety checks are local. The compiler offers the unsafe keyword to allow the developers to take the burden of the check for themselves. There is a wide range of ways to use unsafe, with a different balance between safety and nimbleness. In other words, I can ask the compiler for as much rope as I want, if I ask for too little I will not be able to tie any know but if I ask for too much I will likely make a tangle and hang myself. I like that, TIMTOWTDI!

For that particular project, not mutating the id field is very easy, it will come naturally because there is absolutely no need to mutate it. I just need to find the correct length of rope to make it clear to the compiler.

  • I can mark everything unsafe, basically turning Rust into a kind of C with strange syntax and implicit free()s. That would not be that bad, but the benefit over directly using C are thin. Well, the macro system is way above the C preprocessor, and a standard library that takes pointer+length for strings instead of pointer-to-0-terminated-buffer is a significant progress, but still.

  • I can mark all places that access the structure containing the HashMap or the member structure mutably unsafe. For this project, that means almost everywhere, so back to the previous point.

  • I can mark the bare minimum unsafe to make rustc happy. That leavers dangerous aliased pointers, and rustc will not warn me if I do something wrong with them. I can live with that, and since it is my project, I can decide if I want to take that risk.

  • I can mark the id field #[unsafe(mut)], telling the compiler that I can only access it mutably in unsafe blocks. No, I can not, it does not exist. But it could make sense.

  • I can seal the structure entirely, and only access it through accessors. That is what a java developer would do. I do not like sealed structures, I do not like to write t.set_x(42) when I could write t.x = 42, I find it more readable. This is a matter of taste, but once again, this is my project, my tastes rule it.

  • I can seal only the idfield, making it private while all the other fields are tagged with pub.

I gather the two last options would be the preferred one for almost everybody, with flame wars between each choice.

That still leaves the question of the extents of the unsafe blocks inside the module implementing the structure. For that, I think I will choose the “make the unsafe blocks as tight as possible” option.

Thanks for the feedback.

So collections::HashMap isn't a good fit for your requirements. You might be able to implement your own hash map that would know how to use a hashable field as a key.

1 Like

HashSet with the key retrieval API (not implemented yet) could be useful.

1 Like

I would be careful reasoning too casually about unsafe code. You should double check many things, most specifically that you are not violating LLVM's aliasing rules by handing out references to data the HashMap thinks it uniquely owns.

1 Like

Yes, RFC: Add item recovery collection APIs does seem relevant.

1 Like