Idiomatic way to remove entries from hashmap by condition and move their values out

Hello.
There is a hash map HashMap::<TaskId, sync::oneshot::Sender<TaskResult>>.
From time to time, some process:

  1. gets all keys from this map
  2. checks their status in external system
  3. if status is known, it sends it via the Sender and removes the entry not to check it again.

But one shot sender moves itself into the method: oneshot::Sender::send(self, value) (it prevents oneshot from being called again)

So, I need to remove entry from the map (the only way to get value itself, as reference isn't enough).

I've come to several approaches, but all of them aren't elegant: I clone all keys, access map two times and so on.

    // For all completed tasks report their status and remove them not to check them again
    // Approach 1:
    for task_id in map.keys().cloned().collect::<Vec<_>>() { // <-- redundant copy
        if let Some(task_result) = get_task_result(task_id) {
            map.remove(&task_id)
                .expect("Can't be") // <-- silly
                .send(task_result)
                .unwrap();
        }
    }

    // Approach 2:
    let completed_tasks: Vec<_> = map
        .keys()
        .filter_map(|&task_id| get_task_result(task_id).map(|result| (task_id, result)))
        .collect(); // <-- redundant copy
    for (task_id, task_result) in completed_tasks {
        map.remove(&task_id)
            .expect("can't be") // <-- Silly
            .send(task_result)
            .unwrap();
    }

There is HashMap::retain of course, but it can't tell in advance if I will remove an entry or not, so I get reference.

There is HashMap:extract_if, but it returns only key (taskId in my case) and value (sender) but doesn't allow me to map sender, so I can't send anything.

I wish there were something like mutable ref iterator over entities, i.e:

for entity in map.entities() {
  let k:&Ket= entity.key();
  // but at the same time
  let v:Value = entitiy.remove_value_and_return_it() 
}

I can solve it with RefCell probably, or with unsafe. But how would you solve it?

Thank you in advance.

PS: One might argue that iterating over keys is a bad idea, so they should better be stored in a vector. But I would need to synchronize both collections and still get value from the map.

I think that the simplest solution is to store Option<Sender> as a value in the map, still use retain, and if you have a value ready to send you use Option::take to take out value from the exclusive reference to the option and store None in its place. This will have minimal runtime overhead, but you would have to do some very weird things if you would ever notice it. This is the cleanest approach and a very idiomatic solution.

use tokio::sync::oneshot;

type TaskId = ();
type TaskResult = ();

type Map = std::collections::HashMap::<TaskId, Option<oneshot::Sender<TaskResult>>>;

fn get_task_result(_: &TaskId) -> Option<TaskResult> {
    None
}

pub fn process(map: &mut Map) {
    map.retain(|key, sender| {
        if let Some(result) = get_task_result(&key) {
            if let Some(sender) = sender.take() {
                let _ = sender.send(result);
            }
            
            false
        } else {
            true
        }
    });
}
3 Likes

Thank you, this one is indeed better. The only thing here is I do not need to match take: unwrap is enough as it never can be None.

1 Like

I would still recommend matching on take. It is correct to panic when you reach illegal state, however in this example if value was already None, then we can just ignore this entry and return false, as it will be removed by retain anyway. If you are worried about correctness you can also check if value is Some before running get_task_result. I would only consider unwrapping value if you could prove with some benchmark that it is more performant. But I find this very unlikely.

Here's a small change. It seems like the cost of creating a channel is just a small Arc allocation, so this might be a small amount better, but mainly it lets you leave the map unchanged.

use tokio::sync::oneshot;

type TaskId = ();
type TaskResult = ();

type Map = std::collections::HashMap<TaskId, oneshot::Sender<TaskResult>>;

fn get_task_result(_: &TaskId) -> Option<TaskResult> {
    None
}

pub fn process(map: &mut Map) {
    map.retain(|key, sender| {
        if let Some(result) = get_task_result(key) {
            let sender = std::mem::replace(sender, oneshot::channel().0);
            sender.send(result).unwrap();
            false
        } else {
            true
        }
    });
}
1 Like

But that would be a mark of a logic error, wouldn't it? Is not it better to fail explicitly if we have such a serious logic error? I do not expect None to be there.

Wow, it looks a little bit hacky, but we save some space in the map. It is 8 for Sender<u32> and 16 for Option<Sender<u32>> on my platform (option seems to be aligned).
Thank you.

1 Like

PS: Btw, that would be UB, right?)

let k = unsafe { std::mem::zeroed::<Sender<ExitCode>>() };
let sender = std::mem::replace(sender, k);

Extremely. Sender is not valid with zeroes, so making k is UB. And if you follow the execution through, this will end up dereferencing the null Arc inside Sender in order to decrement its reference counter when being dropped, which is absolutely not going to work.

The right way to do this optimization is to have a HashMap<TaskId, ManuallyDrop<Sender<TaskResult>>>. Then you can take it out when sending+removing (no need to overwrite it), and implement Drop for the entire map so that it manually drops each Sender.

4 Likes

FYI: Just never call std::mem::zeroed under any circumstance. It's almost always immediate UB.