Internal mutability of objects in containers

p.s. I tried to read may topics on this but can't seem to figure out. (:warning: lot of code below :smiley: )

Background: I am implementing a HashGraph<K, V> data structure (so entry to graph is O(1))

  • I am using a custom FixedSizeHashMap<K, V> that is implemented internally using Vec<Slot<K, V>> that allows me to get &V/&mut V using both K and the usize index into the Vec.
  • In my HashGraph, I store the graph as FixedSizeHashMap<K, Node<V>>
  • I use indexes rather than K to point to connected nodes as I can avoid repeated hash calculations and save memory when K is huge. Anyway, for my issue it is not relevant as the problem occurs either ways.
  • I have successfully implemented methods such as insert(&'a mut self, key_value: (K, V), connections: Vec<(K, V)>) , and connect(&mut self, k: &K, to_keys: Vec<&K>) however, I'm having the dreaded [E0499] errors when I try to implement remove which should disconnect the edges and removes the node.

The issue is: I need to mutate multiple objects in the same collection 'simultaneously' and of course the borrow checker cant be sure they are not same nodes

here's the code:

struct Node<V> {
    _value: V,
    _connected_to: FixedSizeHashMap<usize, u32>, // stores index (into _hash_map) of nodes and weights that I connect to 
    _connected_from: FixedSizeHashSet<usize>, // stores index (into _hash_map) nodes that connect to me
}

struct FixedSizeHashGraph<K, V> {
    _hash_map: FixedSizeHashMap<K, Node<V>,
    ....
}

impl <K, V> FixedSizeHashGraph<K, V> {
....
    fn remove(&mut self, key: &K) {
        // mut borrow occurs on node being removed     vvvvvvvvvvvvvv
        let Some((node, index)) = self._hash_map._get_mut_value_and_index_of(key) else {
            return
        };

        // remove all the connections/edges
        for (to_index, _) in node._connected_to.iter_head() {
            // mut borrow occurs on connected_to nodes   vvvvvvvvvvvvvv [E0499]
            let Some(to_node) = self._hash_map._get_mut_value_at(*to_index) else {
                continue;
            };
            to_node._connected_from.remove(&index);
        }

        for (from_index, _) in node._connected_from.iter_head() {
            // mut borrow occurs on connected_from nodes   vvvvvvvvvvvvvv [E0499]
            let Some(from_node) = self._hash_map._get_mut_value_at(*from_index) else {
                continue;
            };
            from_node._connected_to.remove(&index);
        }

        // finally remove the node
        self._hash_map.remove(&key);
    }
....

Workaround: I have found the following workaround where I copy all the indexes of the nodes I need manipulate and then do the actual work. Here is what works, but this seems like a huge overhead to make "unnecessary" copies of data and unnecessary iterations:

    fn remove(&mut self, key: &K) {
        // collect all the indexes into _hash_map to nodes we will update
        let (index, to_indexes, from_indexes): (usize, Vec<usize>, Vec<usize>) =
            match self._hash_map._get_mut_value_and_index_of(key) {
                Some((node, index)) => {
                    (
                        index,
                        node._connected_to.iter_head().map(|kv| *kv.0).collect(),
                        node._connected_from.iter_head().map(|kv| *kv.0).collect()
                    )
                },
                None => return,
            };
        // update nodes to remove all connections/edges
        for to_index in to_indexes {
            let Some(to_node) = self._hash_map._get_mut_value_at(to_index) else {
                continue;
            };
            to_node._connected_from.remove(&index);
        }

        for from_index in from_indexes {
            let Some(from_node) = self._hash_map._get_mut_value_at(from_index) else {
                continue;
            };
            from_node._connected_to.remove(&index);
        }

        //finally remove the node
        self._hash_map.remove(&key);
    }

Question: is there a better faster/more memory efficient way to do this? I'm not sure how and if RefCell can help and what's the overhead.

Yes, I think RefCell may work here. It allows you to get shared (&) references from the map, which don't have to be exclusive, and mutate them. The cost is the size of the ref count and a (very low cost) increment/decrement when accessed.

If you have more questions, it would help me to have stubs for FixedHashMap/FixedHashSet and the methods you're using here, to get the types right in trying out ideas.

hi @jumpnbrownweasel thanks for the message and offering to help. :folded_hands:

Yes I have just uploaded the project on github here: nilaymk/rust-hash_collections

p.s. it's a complete toy project at the moment that I am using to learn Rust.

The files of interest are:

In some cases it's possible to ensure that access via multiple references is safe, e.g. iter_mut() knows by construction that it gives completely separate &mut references, so it just uses unsafe internally to split the access.

However, in your API deleting a node by &K would be unsafe (if conflict with &mut self didn't block that), because it means the caller may still have access to the K after the deletion, and it could be the same K you've just deleted.

You could take *const K in a &mut self method, so it would be unsafe to dereference the pointer, but it wouldn't keep the conflicting loan alive.

There's also a more complicated approach used by GhostCell to avoid overhead of RefCell, but it works for mutation of individual nodes and for append-only data structures. It wouldn't make deletion by key reference safe.

1 Like

I tried the change to use RefCell and it was very straightforward, although who knows if it will hold up as you do more operations.

I used the first version of remove in your OP and made the changes needed to support RefCell. Notice that _hash_map is accessed by shared reference (&) to access multiple Nodes at once. Then borrow_mut() is called each time we access the Node mutably, to minimize the duration of the mutable borrow and avoid borrowing conflicts.

The main change is just:

-    _hash_map: FixedSizeHashMapImpl<K, Node<V>, C, H>,
+    _hash_map: FixedSizeHashMapImpl<K, RefCell<Node<V>>, C, H>,
full diff (has some unnecessary changes from the IDE)
diff --git a/libs/hash_collections/src/hash_graph.rs b/libs/hash_collections/src/hash_graph.rs
index a8e1137..7009f91 100644
--- a/libs/hash_collections/src/hash_graph.rs
+++ b/libs/hash_collections/src/hash_graph.rs
@@ -1,6 +1,13 @@
-use std::{hash::{Hash, Hasher}, ops::Index};
-use crate::{hash_map::{FixedSizeHashMapImpl, OutOfCapacityError}, FixedSizeHashMap, FixedSizeHashSet};
+use crate::{
+    FixedSizeHashMap, FixedSizeHashSet,
+    hash_map::{FixedSizeHashMapImpl, OutOfCapacityError},
+};
 use std::mem;
+use std::{
+    cell::RefCell,
+    hash::{Hash, Hasher},
+    ops::Index,
+};
 
 use crate::check::{Check, IsTrue, is_prime_and_within_limit};
 
@@ -21,7 +28,7 @@ where
     K: Hash + std::cmp::Eq,
     H: Default + Hasher,
 {
-    _hash_map: FixedSizeHashMapImpl<K, Node<V>, C, H>,
+    _hash_map: FixedSizeHashMapImpl<K, RefCell<Node<V>>, C, H>,
 }
 
 impl<'a, K: 'a, V: 'a, const C: usize, H> FixedSizeHashGraphImpl<K, V, C, H>
@@ -30,107 +37,128 @@ where
     K: Hash + std::cmp::Eq,
     H: Default + Hasher,
 {
-    fn _insert_or_update_contained_value(&'a mut self, key: K, value: V) -> Result<usize, OutOfCapacityError>  {
+    fn _insert_or_update_contained_value(
+        &'a mut self,
+        key: K,
+        value: V,
+    ) -> Result<usize, OutOfCapacityError> {
         match self._hash_map._get_mut_value_and_index_of(&key) {
             Some((node, index)) => {
-                let _ = mem::replace(&mut node._value, value);
-                return Result::Ok(index)
+                let _ = mem::replace(&mut node.borrow_mut()._value, value);
+                return Result::Ok(index);
             }
             None => {
-                match self._hash_map._insert_get_index(key, Node {
-                    _value: value,
-                    _connected_to: Default::default(),
-                    _connected_from: Default::default()
-                }) {
+                match self._hash_map._insert_get_index(
+                    key,
+                    RefCell::new(Node {
+                        _value: value,
+                        _connected_to: Default::default(),
+                        _connected_from: Default::default(),
+                    }),
+                ) {
                     Ok((index, _)) => return Ok(index),
-                    Err(e) => return Err(e)
+                    Err(e) => return Err(e),
                 }
             }
         }
     }
 
-    pub fn insert(&'a mut self, key_value: (K, V), connections: Vec<(K, V)>) -> Result<(), OutOfCapacityError> {
+    pub fn insert(
+        &'a mut self,
+        key_value: (K, V),
+        connections: Vec<(K, V)>,
+    ) -> Result<(), OutOfCapacityError> {
         let Ok(index) = self._insert_or_update_contained_value(key_value.0, key_value.1) else {
-            return Result::Err(OutOfCapacityError{capacity: self._hash_map.capacity()});
+            return Result::Err(OutOfCapacityError {
+                capacity: self._hash_map.capacity(),
+            });
         };
 
         for (to_key, to_value) in connections {
             let Ok(to_index) = self._insert_or_update_contained_value(to_key, to_value) else {
-                return Result::Err(OutOfCapacityError{capacity: self._hash_map.capacity()});
+                return Result::Err(OutOfCapacityError {
+                    capacity: self._hash_map.capacity(),
+                });
             };
-            
-            if index == to_index {continue;}
+
+            if index == to_index {
+                continue;
+            }
 
             if let Some(node) = self._hash_map._get_mut_value_at(index) {
-                match node._connected_to.get_mut(&to_index) {
-                    Some(edge_weight) => {*edge_weight+=1;},
-                    None => {let _ = node._connected_to.insert(to_index, 1);}
+                match node.borrow_mut()._connected_to.get_mut(&to_index) {
+                    Some(edge_weight) => {
+                        *edge_weight += 1;
+                    }
+                    None => {
+                        let _ = node.borrow_mut()._connected_to.insert(to_index, 1);
+                    }
                 }
             }
-            
+
             if let Some(to_node) = self._hash_map._get_mut_value_at(to_index) {
-                let _ = to_node._connected_from.insert(index, ());
+                let _ = to_node.borrow_mut()._connected_from.insert(index, ());
             }
         }
 
-        return Ok(())
+        return Ok(());
     }
 
     fn connect_to(&mut self, k: &K, to_keys: Vec<&K>) {
         let Some(index) = self._hash_map._get_index_of(k) else {
-            return
+            return;
         };
-        
+
         for to_key in to_keys {
             let Some(to_index) = self._hash_map._get_index_of(to_key) else {
                 continue;
             };
             if index != to_index {
                 if let Some(node) = self._hash_map._get_mut_value_at(index) {
-                    let _ = node._connected_to.insert_or(to_index, 1, |weight| *weight+=1);
+                    let _ = node
+                        .borrow_mut()
+                        ._connected_to
+                        .insert_or(to_index, 1, |weight| *weight += 1);
                 }
                 if let Some(to_node) = self._hash_map._get_mut_value_at(to_index) {
-                    let _ = to_node._connected_from.insert(index, ());
+                    let _ = to_node.borrow_mut()._connected_from.insert(index, ());
                 }
             }
         }
     }
 
     fn remove(&mut self, key: &K) {
-        let (index, to_indexes, from_indexes): (usize, Vec<usize>, Vec<usize>) =
-            match self._hash_map._get_mut_value_and_index_of(key) {
-                Some((node, index)) => {
-                    (
-                        index,
-                        node._connected_to.iter_head().map(|kv| *kv.0).collect(),
-                        node._connected_from.iter_head().map(|kv| *kv.0).collect()
-                    )
-                },
-                None => return,
-            };
-        
-        for to_index in to_indexes {
-            let Some(to_node) = self._hash_map._get_mut_value_at(to_index) else {
+        // mut borrow occurs on node being removed     vvvvvvvvvvvvvv
+        let Some((node, index)) = self._hash_map._get_val_and_index_of(key) else {
+            return;
+        };
+
+        // remove all the connections/edges
+        for (to_index, _) in node.borrow()._connected_to.iter_head() {
+            // mut borrow occurs on connected_to nodes   vvvvvvvvvvvvvv [E0499]
+            let Some(to_node) = self._hash_map._get_value_at(*to_index) else {
                 continue;
             };
-            to_node._connected_from.remove(&index);
+            to_node.borrow_mut()._connected_from.remove(&index);
         }
 
-        for from_index in from_indexes {
-            let Some(from_node) = self._hash_map._get_mut_value_at(from_index) else {
+        for (from_index, _) in node.borrow()._connected_from.iter_head() {
+            // mut borrow occurs on connected_from nodes   vvvvvvvvvvvvvv [E0499]
+            let Some(from_node) = self._hash_map._get_value_at(*from_index) else {
                 continue;
             };
-            from_node._connected_to.remove(&index);
+            from_node.borrow_mut()._connected_to.remove(&index);
         }
 
-        self._hash_map.remove(&key);
+        // finally remove the node
+        self._hash_map.remove(key);
     }
 
     // fn disconnect_from(&mut self, key: &K, to_keys: Vec<&K>) {
     //     let Some((node, index)) = self._hash_map._get_mut_and_index(key) else {
     //         return
     //     };
-        
+
     //     for to_key in to_keys {
     //         let Some((to_node, to_index)) = self._hash_map._get_mut_and_index(to_key) else {
     //             continue;
@@ -150,4 +178,4 @@ where
     // fn disconnect_all() {
 
     // }
-}
\ No newline at end of file
+}
diff --git a/libs/hash_collections/src/hash_map.rs b/libs/hash_collections/src/hash_map.rs
index 9152b7b..14a5f3e 100644
--- a/libs/hash_collections/src/hash_map.rs
+++ b/libs/hash_collections/src/hash_map.rs
@@ -1,18 +1,24 @@
+use std::error;
+use std::fmt;
 use std::hash::{DefaultHasher, Hash, Hasher};
 use std::marker::PhantomData;
 use std::mem;
 use std::ops::{Index, IndexMut};
-use std::error;
-use std::fmt;
 
 use crate::check::{Check, IsTrue, is_prime_and_within_limit};
 
 #[derive(Debug, Clone, PartialEq)]
-pub struct OutOfCapacityError {pub capacity: usize}
+pub struct OutOfCapacityError {
+    pub capacity: usize,
+}
 
 impl fmt::Display for OutOfCapacityError {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-        write!(f, "HashMap has reached its capacity of {} entries", self.capacity)
+        write!(
+            f,
+            "HashMap has reached its capacity of {} entries",
+            self.capacity
+        )
     }
 }
 
@@ -69,8 +75,6 @@ where
     _phantom: PhantomData<H>,
 }
 
-
-
 impl<K, V, const C: usize, H> Default for FixedSizeHashMapImpl<K, V, C, H>
 where
     Check<{ is_prime_and_within_limit(C, 25013) }>: IsTrue,
@@ -103,10 +107,16 @@ where
         map
     }
 
-    pub(crate) fn _insert_get_index(&mut self, key: K, value: V) -> Result<(usize, Option<V>), OutOfCapacityError> {
+    pub(crate) fn _insert_get_index(
+        &mut self,
+        key: K,
+        value: V,
+    ) -> Result<(usize, Option<V>), OutOfCapacityError> {
         let i = self._find_index(&key, false);
         if i == Self::CAPACITY {
-            return Result::Err(OutOfCapacityError{capacity: Self::CAPACITY});
+            return Result::Err(OutOfCapacityError {
+                capacity: Self::CAPACITY,
+            });
         }
 
         let mut old_val: Option<V> = None;
@@ -136,7 +146,12 @@ where
         result.map(|rv| rv.1)
     }
 
-    pub fn insert_or<F: FnOnce(&mut V)>(&mut self, key: K, value: V, op: F) -> Result<(), OutOfCapacityError> {
+    pub fn insert_or<F: FnOnce(&mut V)>(
+        &mut self,
+        key: K,
+        value: V,
+        op: F,
+    ) -> Result<(), OutOfCapacityError> {
         if let Some(value) = self.get_mut(&key) {
             op(value);
             Ok(())
@@ -147,13 +162,14 @@ where
 
     pub fn exists(&self, key: &K) -> bool {
         let i = self._find_index(key, true);
-        i != Self::CAPACITY && self._data[i].is_occupied() 
+        i != Self::CAPACITY && self._data[i].is_occupied()
     }
 
     pub fn get(&self, key: &K) -> Option<&V> {
         let i = self._find_index(key, true);
 
-        if i != Self::CAPACITY && let Slot::IsOccupiedBy(ref entry) = self._data[i]
+        if i != Self::CAPACITY
+            && let Slot::IsOccupiedBy(ref entry) = self._data[i]
         {
             Some(&entry.value)
         } else {
@@ -164,7 +180,8 @@ where
     pub fn get_mut(&mut self, key: &K) -> Option<&mut V> {
         let i = self._find_index(key, true);
 
-        if i < Self::CAPACITY && let Slot::IsOccupiedBy(ref mut entry) = self._data[i]
+        if i < Self::CAPACITY
+            && let Slot::IsOccupiedBy(ref mut entry) = self._data[i]
         {
             Some(&mut entry.value)
         } else {
@@ -174,13 +191,19 @@ where
 
     pub(crate) fn _get_index_of(&mut self, key: &K) -> Option<usize> {
         let i = self._find_index(key, true);
-        if i != Self::CAPACITY && self._data[i].is_occupied() {Some(i)} else {None}
+        if i != Self::CAPACITY && self._data[i].is_occupied() {
+            Some(i)
+        } else {
+            None
+        }
     }
 
     pub(crate) fn _get_mut_value_and_index_of(&mut self, key: &K) -> Option<(&mut V, usize)> {
         let i = self._find_index(key, true);
-        if i != Self::CAPACITY && let Slot::IsOccupiedBy(ref mut entry) = self._data[i] {
-            Some ((&mut entry.value, i))
+        if i != Self::CAPACITY
+            && let Slot::IsOccupiedBy(ref mut entry) = self._data[i]
+        {
+            Some((&mut entry.value, i))
         } else {
             None
         }
@@ -188,8 +211,10 @@ where
 
     pub(crate) fn _get_val_and_index_of(&self, key: &K) -> Option<(&V, usize)> {
         let i = self._find_index(key, true);
-        if i != Self::CAPACITY && let Slot::IsOccupiedBy(ref entry) = self._data[i] {
-            Some ((&entry.value, i))
+        if i != Self::CAPACITY
+            && let Slot::IsOccupiedBy(ref entry) = self._data[i]
+        {
+            Some((&entry.value, i))
         } else {
             None
         }
@@ -231,7 +256,9 @@ where
         self._get_key_val_at(self._tail)
     }
 
-    pub fn iter_head(&self) -> MapIterator<'_, K, V, C, impl Fn(&Entry<K, V, C>) -> usize + use<K, V, C, H>> {
+    pub fn iter_head(
+        &self,
+    ) -> MapIterator<'_, K, V, C, impl Fn(&Entry<K, V, C>) -> usize + use<K, V, C, H>> {
         MapIterator {
             _remaining: self._size,
             _current: self._head,
@@ -240,7 +267,9 @@ where
         }
     }
 
-    pub fn iter_tail(&self) -> MapIterator<'_, K, V, C, impl Fn(&Entry<K, V, C>) -> usize + use<K, V, C, H>> {
+    pub fn iter_tail(
+        &self,
+    ) -> MapIterator<'_, K, V, C, impl Fn(&Entry<K, V, C>) -> usize + use<K, V, C, H>> {
         MapIterator {
             _remaining: self._size,
             _current: self._tail,
@@ -338,8 +367,18 @@ where
         }
     }
 
+    pub(crate) fn _get_value_at(&self, i: usize) -> Option<&V> {
+        if i < Self::CAPACITY
+            && let Slot::IsOccupiedBy(ref entry) = self._data[i]
+        {
+            Some(&entry.value)
+        } else {
+            None
+        }
+    }
+
     pub(crate) fn _get_mut_value_at(&mut self, i: usize) -> Option<&mut V> {
-            if i < Self::CAPACITY
+        if i < Self::CAPACITY
             && let Slot::IsOccupiedBy(ref mut entry) = self._data[i]
         {
             Some(&mut entry.value)

EDIT: You can also change all the _hash_map._get_mut_value_at calls to _get_value_at. And change all _hash_map._get_mut_value_and_index_of calls to _get_val_and_index_of. There is no need to get a &mut reference to the RefCell<Node>.

Alternatively, when you only need to access one Node at a time, you can use the original methods to get a &mut reference to the RefCell<Node>, and then call RefCell::get_mut (instead of borrow_mut) to access the node. This skips the incrementing and decrementing of the ref count.

1 Like

Thanks @kornel for the feedback. My remove interface is similar to the std::collections::hash_map::remove, which could also end up with a &K, but the borrow checker does the right thing:

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.insert("foo".to_string(), 10);
    map.insert("bar".to_string(), 11);    

/*  E0502 !!           --- immutable borrow occurs here */
    let Some((k, _)) = map.get_key_value(&"foo".to_string()) else {
        return;
    };

    map.remove(k); // <== try to remove using &K owned internally by map
/*   |   |
     |   |   immutable borrow later used by call
     |  mutable borrow occurs here
*/
}

Of course, I'll test my FixedSizeHashMap for this... especially if I use RefCell.

Thanks @jumpnbrownweasel. :folded_hands: I'll try out your suggestions as I grow and test the graph code over the weekend and let you know how it went.

1 Like