Mutable/immutable borrow conflict for "scan for a key to remove from a map" pattern - how to resolve?

The code below won't compile. The problem is straightforward: a call to BTreeMap::remove requires an &mut reference to the map, but max_key holds an immutable reference to one of the map's keys, rendering the map immutable for the duration of that reference.

But I don't see any good way to fix it, because the key is, in my actual code, not Copy (it is Clone, but a clone is quite expensive). And extract_if can't be used because we don't know which element to remove until we've scanned the entire map. So I'm stuck. Any ideas?

use std::collections::BTreeMap;

pub fn pop_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
    map: &mut BTreeMap<K, V>,
    weight: F
) -> Option<V> {
    let mut max_weight = 0usize;
    let mut max_key = None;
    for (k, v) in map.iter() {
        let k_weight = weight(v);
        if k_weight > max_weight {
            max_weight = k_weight;
            max_key = Some(k);
        }
    }
    max_key.and_then(|k| map.remove(k))
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0500]: closure requires unique access to `*map` but it is already borrowed
  --> src/lib.rs:16:22
   |
 9 |     for (k, v) in map.iter() {
   |                   --- borrow occurs here
...
16 |     max_key.and_then(|k| map.remove(k))
   |             -------- ^^^ --- second borrow occurs due to use of `*map` in closure
   |             |        |
   |             |        closure construction occurs here
   |             first borrow later used by call

For more information about this error, try `rustc --explain E0500`.
error: could not compile `playground` (lib) due to 1 previous error

(P.S. I know about BinaryHeap, but in my actual code, the weight function depends on external data that's different every time pop_max_weight is called, so that's no good either.)

I can only think of two inelegant solutions:

  1. Store each key in an Rc or Arc, so that you can clone them cheaply. This affects all other uses of the map. (But if a key is, for example, a Vec<T>, you can change it to an Arc<[T]> and have no additional indirection.)

  2. Remove the item by index, not key. This has to iterate over the map twice instead of once.

    pub fn remove_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
        map: &mut BTreeMap<K, V>,
        weight: F,
    ) -> Option<V> {
        let mut max_weight = 0usize;
        let mut max_index = None;
        for (i, (_, v)) in map.iter().enumerate() {
            let k_weight = weight(v);
            if k_weight > max_weight {
                max_weight = k_weight;
                max_index = Some(i);
            }
        }
        if !max_index.is_some() {
            return None;
        }
        map.extract_if(.., |_k, _v| match max_index {
            Some(0) => {
                max_index = None;
                true
            }
            Some(ref mut i) => {
                *i -= 1;
                false
            }
            None => false,
        })
        .next()
        .map(|(_k, v)| v)
    }
    

the optimal algorithm is unsupported in current BTreeMap API design. it is only possible if the internal representation of the location of the KV entry in the node is exposed to the user, which the standard library doesn't.

you may switch to a third party crate which exposes some form of node entry location/address/cursor type in the API.

but personally, I would stick with the standard library, and just accept the non-optimal solution that I need to clone the key to remove it.

because b-tree is a complex data structure, and the standard library has a farily sophisticated btree implementation, I doubt I could easily find a crate of the same quality.

Could you have a cheaper form of key Q, such that you can go &K -> Q and also K: Borrow<Q>?

Inspired by @kpreid:

pub fn remove_max_weight<K: Ord, V, F: Fn(&V) -> usize>(
    map: &mut BTreeMap<K, V>,
    weight: F,
) -> Option<V> {
    let mut max_weight = 0_usize;
    let mut max_key = None;
    for (k, v) in map.iter() {
        let k_weight = weight(v);
        if k_weight > max_weight {
            max_weight = k_weight;
            // take a `*const K` pointer
            // instead of a `&map` bound `&K` reference
            max_key = Some(k as *const _);
        }
    }

    // safe option
    let Some(key_ptr) = max_key else { return None };
    let _value = map
        .extract_if(.., |k, _| key_ptr == k as _)
        .next()
        .map(|kv| kv.1);

    // do NOT use, see below
    unsafe { map.remove(&*key_ptr) }
}

My only option for that is @kpreid's suggestion of using Rc<K>, or something morally equivalent to Rc<K>. (But that would work, and I may go for that if a potential algorithmic rework that would render this entire thing unnecessary doesn't pan out.)

I thought about this some more and I think this is genuinely a hole in the type system.

In an unchecked language, you could define a BTreeMap::remove that accepted a key reference -- or maybe I should call it a pointer -- pointing into the map from which the key was to be removed. There is no obstacle to making the internals of the map handle this case correctly. The documentation would say that if you do this, the key reference is invalidated by the operation, and it would be the programmer's responsibility to not use that reference again afterward.

But in safe Rust there is no way to define a function that takes a mutable reference and one or more immutable references, and if the immutable references refer to the same object as the mutable reference, that's fine, but those references are consumed by the operation. References are Copy and you can't declare a function that moves references out of its arguments. That's the hole.

Hypothetically we could patch this hole, something like

impl BTreeMap {
    pub fn remove<Q>(&mut self, key: move &Q) -> Option<V>
                                  // ^^^^
    where
        K: Borrow<Q> + Ord,
        Q: Ord + ?Sized,

For backward compatibility, though, either we'd have to add a new method, which would be unergonomic, or the rule would have to be that a move &T argument moves rather than copying only when copying would mean it conflicts with a &mut borrow argument to the same function, which might be a little too spukhafte Fernwirkung for an official language feature. I dunno. What do yinz think?

[EDIT: Per discussion below, I want to clarify that I am not saying all we'd have to do is add this "move &T" feature for the existing BTreeMap::remove to handle being called this way. We would also have to make sure the internals were sound when called this way; after which, adding move to the function signature would be the way the rest of the world (in particular the borrow checker) knew it was sound to call this way.]

This is unsound because, at least, there is no guarantee the BTreeMap implementation will not look at the reference after performing the removal (or perhaps even rebalance the tree before finishing using the reference, moving the key).

Your unsafe solution is almost surely UB. In particular, a reference passed to a function must be valid for the entire body of that function. However, remove surely invalidates any references to the removed key.

It doesn’t even need to look at the reference for it to be UB, AFAIK.

Good catch. Leaving the code as-is, for future reference.

I think you can do it with a cursor, compute the key the same way, then use upper_bound_mut or lower_bound_mut and then remove_next ( well something like that ).

If you want to avoid unstable rust you can use pstd::collections::btree_map - Rust

That is why I started making this, to be able to use cursors on stable.

[ Oh, I think I am wrong, upper_bound_mut takes a &mut BTreeMap... sorry, I don't think it works, although maybe you could get away with the dirty pointer trick, but it is probably unsound and I don't think miri would like it. upper_bound_mut doesn't actually mutate the map, it is just doing positioning. It gets mutated after that returns. ]

I guess you could find the position using a cursor, then count from one end or the other ( whichever is closer ) to do the remove, a bit like using extract_if. extract_if is probably faster though I think. But cloning the key is probably best, if this is a frequent operation. Or use a different data structure entirely, like some kind of heap.

Cloning a single key expensive relative to scanning the whole BTreeMap? What's an example of that?

Further general advice: Correct use of raw pointers to create references usually looks something like: the thing you want to do could have been done by magically knowing up front what borrows you need to take and creating them using some combination of the borrow-splitting operations the language and standard library already provide (iter_mut(), split_at_*(), borrowing individual fields or array elements) if you only had memory to store those references and magical time-traveling knowledge of which ones you’ll need later.

On the other hand, it’s not going to be sound, in most cases, to create references that can't be made by combining existing splitting operations.

(This is just a heuristic and has both false positives and false negatives.)

I agree with the core idea. I just want to put it another way:

in the current type system, a function's type must be generic over the lifetime of its reference parameters, which means any reference arguments must remain valid across the entire function call.

however, in this remove() scenario, the key does NOT need to be valid for the enitire function, and self does NOT need to be exclusive all the time either.

your idea of a new kind of "consuming" reference may solve part of the problem, but the remove api still requires an exclusive borrow (self) and a shared borrow (the key to be removed) to overlap.

but I think there's a simpler solution for this particular problem within the scope of current type system, with some clever api design.

the key idea is, what if we don't pass the to-be-removed key as a reference? let's call this new api find_and_remove(), and the solution to the original problem could be written with this hypothetical api like this:

map.find_and_remove(|map| {
    let mut max_weight = 0usize;
    let mut max_key = None;
    for (k, v) in map.iter() {
        let k_weight = weight(v);
        if k_weight > max_weight {
            max_weight = k_weight;
            max_key = Some(k);
        }
    }
    max_key
});

this api is completely safe, and the signature and implementation is very straightforward, even trivial I would argue. (below is pseudo code):

impl<K, V> BTreeMap<K, V> {
    fn find_and_remove<F>(&mut self, mut finder: F) -> Option<V>
    where F: for<'a> FnMut(&'a Self) -> Option<&'a K>
    {
        // derive the key inside the function, so the lifetime of the borrowed key
        // can be shorter than the function body
        let key = finder(self)?;
        // use the private cursor/handle type in the standard library
        // no need to exposed this type publicly
        // this could be `unsafe`, but it's implementation detail
        // the public API remains sound
        let cursor = self.root_node.locate(key)?;
        self.root_node.remove_at(cursor)
    }
}

what are the alternatives?

one "obvious" solution is to expose a (non-borrowing) cursor/handle type in the public api.

a non-borrowing cursor is just like integer indices for Vec. e.g. for btree, it can be an array of indices representing the traversal path from the root node down to the target node. (it is also possible to implement as a direct pointer to the target node, similar to C/C++, resulting an unsafe api).

// step 0: find the key as usual
let max_key = ...;

// step 1: lookup the node location by key
// the `cursur` does not borrow `map`, 
let cursor = map.locate(&max_key);

// step 2: remove the entry located at `cursor`
map.remove_at(cursor);

essentially, this is equivalent to the previous mentioned hypothentical find_and_remove() but implementable by the user, thanks to the public cursor type.


for now, we have neither the find_and_remove() api, nor a public cursor type, in the standard library's btree implementation. so either use a different btree library, or accept the suboptimal solution.

This has the same issue, calling those methods requires a &mut self which you cannot have while holding a reference to the key.

Edit: small comment on the crate:

This BTreeMap does not treat a map being out-of-order as unsafe ( since this cannot be guaranteed for all key types in any case )

This doesn't mean you cannot guarantee the property only for all types which have a well behaved Ord implementation.

Yes, I edited my reply. Something that could possibly help would be a cursor method to skip forward or back by n items, that could allow rapid positioning to perform the remove ( once the position is known ).

i believe that providing a safe version of what you suggest would be quite technically challenging and make quite a messy language.
Alias Xor mutability seems to really be the only approach that can provide safe and readable code

Then you need an augmented tree storing informations on the size of each subtree

To go as fast as possible, yes.

Another solution might be to have a "Position" type, where you can get a Position value from a key reference, then use it to get a cursor ( or just have a remove method that takes a Position ).

The Position would be a list (Vec) of integers, one for each level of the BTreeMap.

A Position would become invalid as soon as a new K,V pair is inserted or removed from the BTreeMap.

pub fn remove_max_weight<K: Ord + Clone, V, F: Fn(&V) -> usize>(
    map: &mut BTreeMap<K, V>,
    weight: F
) -> Option<V> {
    let max_key = map.iter()
        .max_by_key(|(_, val)| weight(val))
        .map(|(key, _)| key.clone()); // Clonamos para liberar el préstamo del mapa

    max_key.and_then(|k| map.remove(&k))
}

I try but I can't do it, and I think is because there is no other way, think this if the data is no store somewhere else, the data is in the map, so you are trying to remove data of the map with a part of the data is store in the same map, and worse you are trying to remove with the data that your erasing, that's UB