Iterating over my BTreeMap

So I decided to see if I could make my own implementation of std::collections:BTreeMap, so I could efficiently modify keys while iterating. It is here:

For iteration I have implemented walk and walk_mut which take a closure ( and a start key ). This does what I need for now. However I don't see how to efficiently implement Rust iteration, at least not with map mutation, in safe Rust. For the iterator State I would need a list (stack) of mutable references, and I don't think the borrow checker will allow it ( I am not 100% sure... maybe I could try it and see! ). I suppose I could keep position in the form of the index at each level, which isn't too bad I suppose. Or I could store the whole map in two Vecs, one for leaf nodes, one for non-leaf nodes, in other words an index approach.

Are the above thoughts correct? It seems like there is no "clean" way to implement BTreeMap efficiently (and naturally) in safe Rust. Am I missing anything?

[ I guess I could use Rc/Refcell, but then my map will no longer be Sync+Send ]

Oh, I did refer to this, although I don't understand it all, but it describes an "unsafe" approach: Rust Collections Case Study: BTreeMap

Not in stable Rust, but you could get some inspiration from get_many_mut from HashMap: HashMap in std::collections - Rust

2 Likes

What I discovered is that I have to use unsafe to implement iter_mut at all.

Here is what I have:

impl<'a, K, V> Iterator for IterMut<'a, K, V> {
    type Item = (&'a mut K, &'a mut V);

    fn next(&mut self) -> Option<Self::Item> {
        return if self.ix >= self.map.len() {
            None
        } else {
            let ix = self.ix;
            self.ix = ix + 1;

            // We have to use unsafe to avoid lifetime error.
            // See for example https://smallcultfollowing.com/babysteps/blog/2013/10/24/iterators-yielding-mutable-references/
            // This is ok because we have exclusive access to self.map and the mut references given out do not overlap.

            Some( unsafe { std::mem::transmute(self.map.get_ix_mut(ix)) } )           
        };
    }
}

( From lib.rs - source )

I believe this is sound, if I am mistaken, someone please say!

1 Like

If I understand the code, it's UB. You're handing out a &'a mut InnardsOfBTreeMap while holding on to the entire &'a mut BTreeMap. So you have two active, aliasing &muts.

(Incidentally, have you ran tests using Miri? Miri is the first entity you should ask questions about soundness.)

[1]

It might be simpler to wrap a BTreeMap<UnsafeCell<Key>, Value> and do the immutable to mutable conversion at the edge of the API when sensible.


  1. Also incidentally, I didn't review Niko's article you linked in depth, but it is over a decade old and predates Rust 1.0. ↩︎

4 Likes

Miri is apparently happy with it ( first time I ever used Miri so I may not know what I am doing there ). If you look at the blog, I am doing (pretty much) precisely what is suggested there, as "Option 1" "The solution there would be to use unsafe code:"

https://smallcultfollowing.com/babysteps/blog/2013/10/24/iterators-yielding-mutable-references/

I see your point about the mut reference to the whole map being retained, but I guess the question would be "where is the data race" ? The iterator itself does not ever reference the (key,value) pairs, as it is working with positions (indexes into vectors). I am certainly not claiming I am right here as I am a total novice in this area trying to figure out how things are done. But the std::collections::BTreeMap::iter_mut function has to do precisely the same thing, right? I am just mimicking it, or trying to.

UB as defined by Rust doesn't require a dereference of aliasing mut references, simply existing invalidates the program.

Consider fn foo(a: &mut T, b: &mut T), Rust will assume a and b can't overlap and the compiler may decide to hoist access to either, meaning a later write to b may affect a, which may then violate other optimizations. Given how arbitrarily far such optimizations can reach, Rust simply bans creating aliased mutable references.

UnsafeCell is the only primitive that permits arbitrary unsafe mutation, essentially by suppressing various optimizations. (Many safe APIs like Cell and RefCell are built on it)


I can't speak to the API as a whole, but note that you probably don't want have the Iterator hand out a mutable key, as the user mutating it to change it's relative ordering will violate your invariants. This may or may not be a safety issue, based on what assumptions your unsafe code makes.

1 Like

That is the entire reason I started making my own BTreeMap, I needed to mutate the key ( while not changing the order ), which isn't supported (yet) by the stable BTreeMap.

I agree with what you say about "simply existing", but there are ( I claim ) no aliased mutable references (that is the entire point). Although I suppose that depends on what you mean by "aliased mutable references". The results of the iterator are all mutable references to different memory locations (for want of a better term). There are no other references to these locations, but there is the mutable reference to the entire map ( although that is borrowed as far as the client is concerned ). Do you think the blog ( Iterators yielding mutable references · baby steps Option 1 ) example is UB?

I think it's not really meaningful, given it's from when Rust had a GC, and I think before there was any formal model of reborrows like stacked borrowed.

Note that you can safely write an Iterator in current Rust over mutable items, just not over trees without using indexing or some other way to avoid holding aliased references.

One way to think about it is if it would be legal in safe Rust, ignoring visibility, to get from one mut ref to another, then your refs are aliasing. If Rust rejects the access as already borrowed, then it's a stacked borrow: the inner borrow is essentially treated as just a temporary narrowing of the outer borrow, rather than a duplication of it. But such a relationship is not expressible in current Rust outside the context of a function body, so you can't express that one value in your Iterator is reborrowing another value, so you need to carefully use unsafe features.

You might be ok using raw pointers if you're very careful: *mut T, rather than mut refs, as Rust assumes nothing of them, but they come with their own bag of troubles. In particular it's very easy to accidentally have overlapping derefs, causing the same aliasing refs issue, so you ideally would wrap your usage of them such that they can't hand out references, or maybe to use something like PhantomData<&'a mut T> to tell Rust what you're trying to do. This is getting pretty far into rustinomicon territory, though, and I'm on very shaky ground, most of my unsafe experience is wrapping FFI, not performance hackery.

Well I am using indexing, the get_ix_mut call is implemented in safe Rust, and gets a mutable reference to the ith key-value pair in the map - it starts from the root, nothing every clever about it, except it is somewhat efficient because for each non-leaf sub-tree I have a count field which is the number of keys in the sub-tree. Worst case performance is (tree height) * B, where B is the B in BTree (max number of keys in a node). The difficulty (if it is a difficulty) is adapting this to the Iterator trait to mimic the BTree interface. I think it is actually fine, but on the other hand I could certainly be wrong!

Ah, sorry I didn't look too deep into the code!

Now I've looked closer, I misunderstood what you're trying to do here, and I believe you're hitting exactly a long standing issue that the nomincon specifically covers called "Splitting Borrows". I think you're right that its safe if you can guarantee that you never revisit any item?

1 Like

Yes, the point about the iterator interface is you never revisit an item, if that were not true, Iterator mutation would fall apart. The iterator effectively does a splitting borrow, but not in a single call, it is lazy with each call to next.

The non-overlapping logic seems sound to me, but I wonder about the intermediate operations. In particular when you index into the leaf vector you end up creating a mutable slice reference to all its contents, so you could be indirectly asserting unique access to the previously yielded items, invalidating those references.

In case it helps, I made a cut-down version. The question here is whether MyVec::iter_mut is sound.

#[derive(Debug)]
struct MyVec<T>(Vec<T>);

struct MyMutIter<'a, T> {
    mv: &'a mut MyVec<T>,
    ix: usize,
    end: usize,
}

impl<T> MyVec<T> {
    fn iter_mut(&mut self) -> MyMutIter<T> {
        let end = self.0.len();
        MyMutIter {
            mv: self,
            ix: 0,
            end,
        }
    }
}

impl<'a, T> Iterator for MyMutIter<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.ix == self.end {
            None
        } else {
            self.ix += 1;
            Some(unsafe { std::mem::transmute(&mut self.mv.0[self.ix - 1]) })
        }
    }
}

pub fn main() {
    let mut mv = MyVec(vec![1, 2, 3]);
    for x in mv.iter_mut() {
        *x += 1;
    }
    println!("mv={:?}", mv);
}

Playground link:

If you collect the mutable references instead of only using them one at a time then Miri complains: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8e1c115a450c29d38cf5b6267173caee

The &mut Vec you are using to index into the vector has access to the entire vector, so if there are any &mut referring to individual elements remaining then this violates uniqueness.

4 Likes

If I understand you correctly, well, yes, but there are none so uniqueness is not violated? Is it? But I am far from sure I understand you properly ( also I have almost no idea why Miri is unhappy or why collecting makes any difference.... ! ).

You can lean on the unsafe inside std instead of writing it yourself:

(not fully tested, may contain bugs)

impl<K, T> BTreeMap<K, T>
{
    pub fn iter_mut(&mut self)->Box<dyn '_+Iterator<Item=(&'_ mut K, &'_ mut T)>> {
        self.tree.iter_mut()
    }
}

impl<K,T> Tree<K,T> {
    fn iter_mut(&mut self)->Box<dyn '_+ Iterator<Item=(&'_ mut K, &'_ mut T)>> {
        match self {
            Tree::L(leaf) => Box::new(leaf.iter_mut()),
            Tree::NL(nl) => Box::new(nl.iter_mut()),
        }
    }
}

struct IterLeafMut<'a, K, T> (std::slice::IterMut<'a, (K, T)>);

impl<'a, K, T> Iterator for IterLeafMut<'a, K, T> {
    type Item = (&'a mut K, &'a mut T);
    fn next(&mut self)->Option<Self::Item> {
        let &mut (ref mut k, ref mut v) = self.0.next()?;
        Some((k,v))
    }
}

impl<K, T> Leaf<K,T> {
    fn iter_mut(&mut self)->IterLeafMut<'_, K, T> {
        IterLeafMut(self.0.iter_mut())
    }
}


struct IterNonLeafMut<'a, K, T> {
    v: std::slice::IterMut<'a, (K, T)>,
    c: std::slice::IterMut<'a, Tree<K, T>>,
    current: Box<dyn 'a + Iterator<Item = (&'a mut K, &'a mut T)>>
}

impl<'a, K, T> Iterator for IterNonLeafMut<'a, K, T> {
    type Item=(&'a mut K, &'a mut T);
    fn next(&mut self)->Option<Self::Item> {
        if let Some(x) = self.current.next() { return Some(x); }
        let &mut (ref mut k, ref mut v) = self.v.next()?;
        self.current = self.c.next()?.iter_mut();
        Some((k,v))
    }
}

impl<K, T> NonLeaf<K, T> {
    fn iter_mut(&mut self)->IterNonLeafMut<'_, K, T> {
        let v = self.v.iter_mut();
        let mut c = self.c.iter_mut();
        let current = if let Some(tree) = c.next() {
            tree.iter_mut()
        } else {
            Box::new(std::iter::empty())
        };

        IterNonLeafMut { v, c, current }
    }
}

1 Like

The problematic steps are these:

  • you create a &mut T to the first element;
  • when next() is called again you index into the &mut Vec<T>, creating a &mut [T], which reasserts unique access to all the items in the slice (note that technically just creating the &mut Vec<T> could also trigger this, but AFAIK this is not currently the case in MIRI as it is under debate);
  • when the references previous collected are used again they are no longer valid, since they have been invalidated by the slice assertion, and this triggers MIRI.

The collect part (or just using any reference after next() has been called again) is what triggers the issue because otherwise you could have made a lending iterator (one where the result of next borrows the iterator itself), which is possible in safe code so that must be fine.

2 Likes

This is also possible (even more likely to be buggy than my last post):

impl<K, T> BTreeMap<K,T> {
    pub fn iter_mut(&mut self)->BTreeIterMut<'_, K, T> {
        let mut iter = BTreeIterMut { leaf: &mut [], stack: vec![] };
        iter.push_tree(&mut self.tree);
        iter
    }
}

/// Iterator
pub struct BTreeIterMut<'a, K, T> {
    leaf: &'a mut [(K,T)],
    stack: Vec<IterNonLeafMut<'a, K, T>>
}

impl<'a, K, T> BTreeIterMut<'a, K, T> {
    fn push_tree(&mut self, tree: &'a mut Tree<K,T>) {
        assert!(self.leaf.len() == 0);
        match tree {
            Tree::L(leaf) => { self.leaf = &mut leaf.0; }
            Tree::NL(nl) => {
                let (tree, iter) = nl.iter_mut();
                self.stack.push(iter);
                self.push_tree(tree);
            }
        }
    }
}

impl<'a, K, T> Iterator for BTreeIterMut<'a, K, T> {
    type Item=(&'a mut K, &'a mut T);
    fn next(&mut self)->Option<Self::Item> {
        if let &mut [(ref mut k, ref mut v), ref mut tail @ ..] = std::mem::take(&mut self.leaf) {
            self.leaf = tail;
            return Some((k,v));
        }
        loop {
            if let Some((&mut (ref mut k, ref mut v), tree)) = self.stack.last_mut()?.next() {
                self.push_tree(tree);
                return Some((k,v));
            }
            self.stack.pop();
        }
    }
}

struct IterNonLeafMut<'a, K, T> {
    v: std::slice::IterMut<'a, (K, T)>,
    c: std::slice::IterMut<'a, Tree<K, T>>,
}

impl<'a, K, T> Iterator for IterNonLeafMut<'a, K, T> {
    type Item=(&'a mut (K, T), &'a mut Tree<K,T>);
    fn next(&mut self)->Option<Self::Item> {
        Some((self.v.next()?, self.c.next()?))
    }
}

impl<K, T> NonLeaf<K, T> {
    fn iter_mut(&mut self)->(&'_ mut Tree<K,T>, IterNonLeafMut<'_, K, T>) {
        let v = self.v.iter_mut();
        let mut c = self.c.iter_mut();
        let tree = c.next().expect("Empty non-leaf");
        (tree, IterNonLeafMut { v,c })
    }
}
1 Like

That is nice! Can it be made DoubleEnded?

Yes, but it might get complicated with the various edge cases around where the two ends of the iterator meet in the middle— Certainly more work than I have time for right now.

The basic idea should transfer, though: You store something like &’a mut [T] or std::slice::IterMut<‘a,T> to represent the parts of the tree that haven’t yet been yielded by the iterator. When you need to yield a new item, take it from those via split_first(), next(), or similar to maintain the invariant that each item can only be yielded once. When the inner iterators are exhausted, you’ve gone through the entire contents of the tree.

1 Like