Usage of unsafe to create a function that get or insert into a Hashmap

:wave: Hey there
I am struggling to find out what is the right solution for the following problem:

I have a Container struct that hold a HashMap of Element.
I want to implement a get_or_make_element method that will either return an existing reference to the map or ask the container to create a new Element, insert it, and return a reference to it.

I can do that in safe rust by using a HashMap<String, Rc>, but if I want to avoid RC, the only way I could make it work is by using raw pointer and unsafe deferencing it.

Should I stick to RC here, is there a more Rust idiomatic version I could use or what kind of guarantee do I need to make on my struct to guarantee that the unsafe part is sound?

fn get_or_make_element(&mut self, k: &str) -> &Element {
    // return a reference to the Element if it exist

    // This does not compile because we are borrowing `self.items` later
    // if let Some(v) = self.items.get(k) {
    //     return v;
    // }

    // unsafe workaround
    let this = self as *const Container;
    if let Some(v) = unsafe { (*this).items.get(k) } {
        return v;
    }

    // ask the container to make a new Element, insert it and return a reference
    let v = self.make_element();
    let entry = self.items.entry(k.to_string()).insert(v);
    return entry.into_mut();
}

playground link: Rust Playground

Yes, your unsafe code is correct.

The safe version fails because of a limitation in the current borrow checker. This will be fixed in some future version of Rust. Your safe code compiles successfully in the nightly toolchain when the experimental Polonius borrow checker is enabled.

You could also use this alternate safe version, but it requires an extra lookup in the case where the key is already in the map:

    fn get_or_make_element(&mut self, k: &str) -> &Element {
        if self.items.contains_key(k) {
            return &self.items[k];
        }

        let v = self.make_element();
        let entry = self.items.entry(k.to_string()).insert(v);
        return entry.into_mut();
    }
``
1 Like

Thanks a lot for your quick reply @mbrubeck
will keep an eye on the Polonius feature

If the make_element logic doesn't depend on the items then you split the borrow and avoid the double lookup. But the entry api requires an owned value so there's still some cost with cloning the string (probably worse than a double lookup). I think the nightly hash_raw_entry feature can work around that. You could also switch from String to Rc or Arc for the hash_map key so an advanced user can potentially use cheap clones of those. When using reference counts the API can be made a little nicer to call with the Into trait, but the function signature may be a bit confusing if the users are new to Rust, and if they never clone the (A)Rc it'll just be a needless complication.

Here's an example of both ideas together. @mbrubeck's solution is probably more practical, but this allows for more optimization at the cost of a clunkier API.

struct Container2 {
    items: HashMap<Rc<str>, Element>,
    maker: Maker,
}

struct Maker;

impl Maker {
    fn make_element(&mut self) -> Element {
        Element::from("hello")
    }
}

impl Container2 {
    fn new() -> Self {
        Self {
            items: HashMap::new(),
            maker: Maker,
        }
    }

    fn get_or_make_element(&mut self, k: impl Into<Rc<str>>) -> &Element {
        // The .. covers any new fields that get added to Self in the future
        let Self {items, maker, ..} = self;
        items.entry(k.into()).or_insert_with(|| maker.make_element())
    }
}
1 Like

Thanks :pray:t2: for all these helpful info

What’s the advantage of this syntax vs using self.items and self.maker here?

That turns the one borrow against self into two separate ones at the field level. Using self.* becomes a problem in the closure for or_insert_with since the entry is holding a &mut against self (via self.items), you can't have another & via self.maker at the same time.

In the future 2021 edition you should be able to use self.* because of RFC 2229 that RFC has to be in an edition change since it can change drop order.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.