Mutable/immutable borrow conflict for "scan for a key to remove from a map" pattern - how to resolve?

The code below won't compile. The problem is straightforward: a call to BTreeMap::remove requires an &mut reference to the map, but max_key holds an immutable reference to one of the map's keys, rendering the map immutable for the duration of that reference.

But I don't see any good way to fix it, because the key is, in my actual code, not Copy (it is Clone, but a clone is quite expensive). And extract_if can't be used because we don't know which element to remove until we've scanned the entire map. So I'm stuck. Any ideas?

use std::collections::BTreeMap;

pub fn pop_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
    map: &mut BTreeMap<K, V>,
    weight: F
) -> Option<V> {
    let mut max_weight = 0usize;
    let mut max_key = None;
    for (k, v) in map.iter() {
        let k_weight = weight(v);
        if k_weight > max_weight {
            max_weight = k_weight;
            max_key = Some(k);
        }
    }
    max_key.and_then(|k| map.remove(k))
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0500]: closure requires unique access to `*map` but it is already borrowed
  --> src/lib.rs:16:22
   |
 9 |     for (k, v) in map.iter() {
   |                   --- borrow occurs here
...
16 |     max_key.and_then(|k| map.remove(k))
   |             -------- ^^^ --- second borrow occurs due to use of `*map` in closure
   |             |        |
   |             |        closure construction occurs here
   |             first borrow later used by call

For more information about this error, try `rustc --explain E0500`.
error: could not compile `playground` (lib) due to 1 previous error

(P.S. I know about BinaryHeap, but in my actual code, the weight function depends on external data that's different every time pop_max_weight is called, so that's no good either.)

I can only think of two inelegant solutions:

  1. Store each key in an Rc or Arc, so that you can clone them cheaply. This affects all other uses of the map. (But if a key is, for example, a Vec<T>, you can change it to an Arc<[T]> and have no additional indirection.)

  2. Remove the item by index, not key. This has to iterate over the map twice instead of once.

    pub fn remove_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
        map: &mut BTreeMap<K, V>,
        weight: F,
    ) -> Option<V> {
        let mut max_weight = 0usize;
        let mut max_index = None;
        for (i, (_, v)) in map.iter().enumerate() {
            let k_weight = weight(v);
            if k_weight > max_weight {
                max_weight = k_weight;
                max_index = Some(i);
            }
        }
        if !max_index.is_some() {
            return None;
        }
        map.extract_if(.., |_k, _v| match max_index {
            Some(0) => {
                max_index = None;
                true
            }
            Some(ref mut i) => {
                *i -= 1;
                false
            }
            None => false,
        })
        .next()
        .map(|(_k, v)| v)
    }
    

the optimal algorithm is unsupported in current BTreeMap API design. it is only possible if the internal representation of the location of the KV entry in the node is exposed to the user, which the standard library doesn't.

you may switch to a third party crate which exposes some form of node entry location/address/cursor type in the API.

but personally, I would stick with the standard library, and just accept the non-optimal solution that I need to clone the key to remove it.

because b-tree is a complex data structure, and the standard library has a farily sophisticated btree implementation, I doubt I could easily find a crate of the same quality.

Could you have a cheaper form of key Q, such that you can go &K -> Q and also K: Borrow<Q>?

Inspired by @kpreid:

pub fn remove_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
    map: &mut BTreeMap<K, V>,
    weight: F,
) -> Option<V> {
    let mut max_weight = 0_usize;
    let mut max_key = None;
    for (k, v) in map.iter() {
        let k_weight = weight(v);
        if k_weight > max_weight {
            max_weight = k_weight;
            // take a `*const K` pointer
            // instead of a `&map` bound `&K` reference
            max_key = Some(k as *const _);
        }
    }

    // safe option
    let Some(key_ptr) = max_key else { return None };
    let _value = map
        .extract_if(.., |k, _| key_ptr == k as _)
        .next()
        .map(|kv| kv.1);

    // do NOT use, see below
    unsafe { map.remove(&*key_ptr) }
}

My only option for that is @kpreid's suggestion of using Rc<K>, or something morally equivalent to Rc<K>. (But that would work, and I may go for that if a potential algorithmic rework that would render this entire thing unnecessary doesn't pan out.)

I thought about this some more and I think this is genuinely a hole in the type system.

In an unchecked language, you could define a BTreeMap::remove that accepted a key reference -- or maybe I should call it a pointer -- pointing into the map from which the key was to be removed. There is no obstacle to making the internals of the map handle this case correctly. The documentation would say that if you do this, the key reference is invalidated by the operation, and it would be the programmer's responsibility to not use that reference again afterward.

But in safe Rust there is no way to define a function that takes a mutable reference and one or more immutable references, and if the immutable references refer to the same object as the mutable reference, that's fine, but those references are consumed by the operation. References are Copy and you can't declare a function that moves references out of its arguments. That's the hole.

Hypothetically we could patch this hole, something like

impl BTreeMap {
    pub fn remove<Q>(&mut self, key: move &Q) -> Option<V>
                                  // ^^^^
    where
        K: Borrow<Q> + Ord,
        Q: Ord + ?Sized,

For backward compatibility, though, either we'd have to add a new method, which would be unergonomic, or the rule would have to be that a move &T argument moves rather than copying only when copying would mean it conflicts with a &mut borrow argument to the same function, which might be a little too spukhafte Fernwirkung for an official language feature. I dunno. What do yinz think?

[EDIT: Per discussion below, I want to clarify that I am not saying all we'd have to do is add this "move &T" feature for the existing BTreeMap::remove to handle being called this way. We would also have to make sure the internals were sound when called this way; after which, adding move to the function signature would be the way the rest of the world (in particular the borrow checker) knew it was sound to call this way.]

This is unsound because, at least, there is no guarantee the BTreeMap implementation will not look at the reference after performing the removal (or perhaps even rebalance the tree before finishing using the reference, moving the key).

Your unsafe solution is almost surely UB. In particular, a reference passed to a function must be valid for the entire body of that function. However, remove surely invalidates any references to the removed key.

It doesn’t even need to look at the reference for it to be UB, AFAIK.

Good catch. Leaving the code as-is, for future reference.

I think you can do it with a cursor, compute the key the same way, then use upper_bound_mut or lower_bound_mut and then remove_next ( well something like that ).

If you want to avoid unstable rust you can use pstd::collections::btree_map - Rust

That is why I started making this, to be able to use cursors on stable.

[ Oh, I think I am wrong, upper_bound_mut takes a &mut BTreeMap... sorry, I don't think it works, although maybe you could get away with the dirty pointer trick, but it is probably unsound and I don't think miri would like it. upper_bound_mut doesn't actually mutate the map, it is just doing positioning. It gets mutated after that returns. ]

I guess you could find the position using a cursor, then count from one end or the other ( whichever is closer ) to do the remove, a bit like using extract_if. extract_if is probably faster though I think. But cloning the key is probably best, if this is a frequent operation. Or use a different data structure entirely, like some kind of heap.

Cloning a single key expensive relative to scanning the whole BTreeMap? What's an example of that?

Further general advice: Correct use of raw pointers to create references usually looks something like: the thing you want to do could have been done by magically knowing up front what borrows you need to take and creating them using some combination of the borrow-splitting operations the language and standard library already provide (iter_mut(), split_at_*(), borrowing individual fields or array elements) if you only had memory to store those references and magical time-traveling knowledge of which ones you’ll need later.

On the other hand, it’s not going to be sound, in most cases, to create references that can't be made by combining existing splitting operations.

(This is just a heuristic and has both false positives and false negatives.)