How to prevent closure from "moving" my value?

I'm following an example from a Rust course:

It has the following function, which works fine as long as key and val are of type &str:

fn from(s: &'buf str) -> Self {
    let mut data = HashMap::new();

    for sub_str in s.split('&') {
        let mut key = sub_str;
        let mut val = "";
        /* ... */

        data.entry(key)
            .and_modify(|existing: &mut Value| match existing {
                Value::Single(prev_val) => {
                    *existing = Value::Multiple(vec![prev_val, val]);
                }
                Value::Multiple(vec) => vec.push(val),
            })
            .or_insert(Value::Single(val));
    }

    QueryString { data }
}

Now, I have improved that function to URL-decode key and val. But this means that they are of type String now. The HashMap and Value were changed to store String objects instead of &str slices.

Now I have a problem:

fn from(s: &'buf str) -> Self {
    let mut data = HashMap::new();

    for sub_str in s.split('&') {
        let key: String = create_key();
        let key: String = create_val();

        data.entry(key)
            .and_modify(|existing: &mut Value| match existing {
                Value::Single(prev_val) => {
                    *existing = Value::Multiple(vec![prev_val, val]);
                }
                Value::Multiple(vec) => vec.push(val),
            })
            .or_insert(Value::Single(val)); // <-- ERROR: use of moved value: `val`
    }

    QueryString { data }
}

Why is the value moved, when the closure doesn't use the "move" keyword ???

And, what would be a good way to fix this ???

Using clone() everywhere fixes the issue, but I have a feeling the clone()'s are redundant:

fn from(s: &'buf str) -> Self {
    let mut data = HashMap::new();

    for sub_str in s.split('&') {
        let key: String = create_key();
        let key: String = create_val();

        data.entry(key)
            .and_modify(|existing: &mut Value| match existing {
                Value::Single(prev_val) => {
                    *existing = Value::Multiple(vec![prev_val, val.clone()]); // <-- NOTE: clone() here!
                }
                Value::Multiple(vec) => vec.push(val.clone()), // <-- NOTE: clone() here!
            })
            .or_insert(Value::Single(val));
    }

    QueryString { data }
}

After all, we do either and_modify() or or_insert(), but never both. So "moving" the value into the branch that we actually take should be fine. And a clone() should not be required. But how do this?

Thanks!

The problem here is that the compiler cannot know this by the type signatures of and_modify and or_insert alone. In this case, a straightforward solution is to use a match in the entry enum instead:

impl From<&str> for QueryString {
    fn from(s: &str) -> Self {
        let mut data = HashMap::new();

        for sub_str in s.split('&') {
            let key: String = create_key();
            let val: String = create_val();

            use std::collections::hash_map::Entry::*;
            match data.entry(key) {
                Occupied(mut entry) => {
                    let existing = entry.get_mut();
                    match existing {
                        Value::Single(prev_val) => {
                            *existing = Value::Multiple(vec![mem::take(prev_val), val]);
                        }
                        Value::Multiple(vec) => vec.push(val),
                    }
                }
                Vacant(entry) => {
                    entry.insert(Value::Single(val));
                }
            }
        }

        QueryString { data }
    }
}

Rust Playground

In situations where such a thing isn’t possible, you can still avoid cloning the value by wrapping it into an Option and using .take().unwrap(), effectively turning the fact that only either of the two closures will ever be executed into a run-time checked assertion.

impl From<&str> for QueryString {
    fn from(s: &str) -> Self {
        let mut data = HashMap::new();

        for sub_str in s.split('&') {
            let key: String = create_key();
            let val: String = create_val();
            let mut val = Some(val);

            data.entry(key)
                .and_modify(|existing: &mut Value| match existing {
                    Value::Single(prev_val) => {
                        *existing = Value::Multiple(vec![mem::take(prev_val), val.take().unwrap()]);
                    }
                    Value::Multiple(vec) => vec.push(val.take().unwrap()),
                })
                .or_insert_with(|| Value::Single(val.take().unwrap()));
        }

        QueryString { data }
    }
}
1 Like

This is a property that can't be known solely from those functions' signatures. You need to tell the compiler they're mutually exclusive. How?

use std::collections::hash_map::Entry;

match data.entry(key) {
    Entry::Occupied(mut entry) => match entry.get_mut() {
        Value::SIngle(...) => ...
    }
    Entry::Vacant(entry) => {
        entry.insert(Value::SIngle(val));
    }
}
1 Like

Thanks for the explanation!

Only "doubt" I have is that we are effectively re-implementing the logic that and_modify() and or_insert() already do. Also we are relying on some internals of how HashMap is implemented.

No, this is not relying on any internals. The fact that Entry is an enum is deliberately part of the public API of HashMap. The privacy story of Rust really is to (usually) make it impossible to inadvertently rely on any “internals” that could be subject to change, unless such instability is explicitly mentioned in the documentation.[1]


  1. Examples: Debug or Hash. ↩︎

2 Likes

This seems like a good solution to me :sunglasses:

Just two more questions:

  1. If Option<T>::take() takes the value out of the option, why is unwrap() still needed?
    Yes, I see that the signature is take(&mut self) -> Option<T>
    But how does mapping from Option<T> to Option<T> actually solve anything here? :exploding_head:

  2. What does mem::take() exactly do here? It's description is a bit confusing to me:
    “Replaces dest with the default value of T, returning the previous dest value.”

Okay, I see.

But does the "custom" implementation provide the same "atomicity" guarantees as and_modify()?

pub fn and_modify<F>(self, f: F) -> Selfwhere
Provides in-place mutable access to an occupied entry before any potential inserts into the map.

By the way, just to be clear: I am suggesting that matching on the enum is the superior approach in this case. I am merely showing the alternative solution to be used in other comparable cases without a clean solution.


Option::take is really just a convenience method for std::mem::take on an Option, so by explaining what mem::take(…) does, it should also become clear what .take() does.

mem::take<T: Default>(place: &mut T) is just a convenience shorthand for mem::replace, so mem:.take(r) does the same as mem::replace(r, Default::default()).

Finally, let old_val = mem::replace(place, value); can be thought of as the equivalent as

let old_value = *place;
*place = new_value;

except that this code would not compile.


The practical use of mem::replace is the ability to claim ownership of a value behind a mutable references. You are in principle only borrowing it, you have a &mut T, but you can take the ownership anyways if you – Indiana Jones style – provide an immediate replacement at the same time. In Rust, completely replacing a value is still only considered a form of mutation.

From Push - Learning Rust With Entirely Too Many Linked Lists :

[…] We need some way to get the head without Rust noticing that it's gone. For advice, we turn to infamous Rust Hacker Indiana Jones:

Ah yes, Indy suggests the mem::replace maneuver. This incredibly useful function lets us steal a value out of a borrow by replacing it with another value.


Now this helps avoid the need to proactively move the value into a closure that might never be called. By calling val.take().unwrap() instead of val.unwrap(), we call a (&mut self) method on the val variable, not a (self) method: mutable-reference access in a closure means the closure also only captures the variable by mutable-reference. If the closure is never called, the closure will be dropped, the lifetime of the mutable reference it captured ends, and the original value still exists unharmed and usable afterwards. OTOH, if the closure was called, it leaves the default value of Option<T>, i.e. a None in val.


Similarly, the use of mem::replace on prev_val: we only have &mut … access to prev_val from the entry.get_mut(); we cannot simply move out this owned String from the Value::Single that’s already in the map, at least without providing (temporary) replacement. For String, the empty string is a cheap replacement value. String::default() (equivalently String::new) creates an empty string without any heap allocation yet.

In that use-case, the very next step is the assignment to *existing which replaces the value of the entry a second time, this time the whole enum. The empty string was very short lived, since this assignment will drop it once again, but this approach did make the compiler happy and the overhead is minimal, most importantly we avoided the need for prev_val.clone().

3 Likes

Click the little “source” button in the documentation (to the right of the function signature) and see for yourself:

    pub fn and_modify<F>(self, f: F) -> Self
    where
        F: FnOnce(&mut V),
    {
        match self {
            Occupied(mut entry) => {
                f(entry.get_mut());
                Occupied(entry)
            }
            Vacant(entry) => Vacant(entry),
        }
    }

it does nothing more than a match on the enum itself!

I don’t know what you mean by

"atomicity" guarantees

maybe you’re somehow reading the documentation differently… but anyways, it really simply does the same thing :slight_smile:

Maybe "atomicity" was the wrong word :sweat_smile:

From the description, I assumed that and_modify() would update the value "in-place" in a single self-contained operation, ensuring that it can't "coincide" with potential insert operations.

Kind of what you'd call a "transaction" when it comes to databases.

So I wasn't sure whether manually fetching Entry and manually modifying it would be "safe" here...

(Also, the exact implementation of and_modify() may change in the future?)

In a sense this is trivially true since the entry holds mutable access to the HashMap. So during a call to and_modify, nothing else can happen. In other words, I’m not sure what you would imagine under “insert operations that coincide”.

Yes, such concepts only make sense if there’s any shared access, but we have exclusive access already anyways due to the usage of &mut self in HashMap::entry. The returned Entry in the signature

pub fn entry(&mut self, key: K) -> Entry<'_, K, V>
// equivalently without lifetime elision:
pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>

gets its lifetime parameter from the lifetime of the &mut self borrow. So as long as the entry is still around, the original &mut self borrow is kept alive, and exclusive access to the HashMap is guaranteed.

2 Likes

The returned Entry in the signature gets its lifetime parameter from the lifetime of the &mut self borrow. So as long as the entry is still around, the original &mut self borrow is kept alive, and exclusive access to the HashMap is guaranteed.

Okay. Thanks for clarification :wink:

(So there are no scenarios possible, where we'd have a "shared" map, e.g. between multiple threads?)

The straightforward way to share a HashMap between threads is to put it into a Mutex, but the type system & borrow checker would force you to keep that Mutex locked for the entire duration that one Entry handle exists.

Read-only access to a HashMap can even happen actually in parallel across threads without mutual exclusion; of course not with Mutex: the standard library type RwLock is a type that allows dynamically-checked access patterns of either many readers, or an exclusive writer, at each point in time. Same consequence, the RwLock would need to be locked in exclusive/write-mode while you hold onto an Entry. There are also alternatives to HashMap defined in external crates, with the goal to be more efficient than RwLock<HashMap<K, V>> in high-contention parallel access. For example this one. Even those will (necessarily) need to ensure exclusive access to individual map entries when you modify them or access them by mutable reference, so their benefit is mostly a more fine-grained locking behavior, the crate I linked to for example uses a granularity that’s more fine-grained than whole-map, but more coarse than individual-entry.


TL;DR: Yes, there are no scenarios possible. They aren’t calling it “fearless concurrency” without reason.

3 Likes