Binary Search Tree Node Removal without unsafe code

So I have this implementation of BST. I have an unsafe version of node removal but I want the equivalent safe code. However, I am not being able to do so and the compiler is complaining because I have a mutable reference to self and trying to assign another value to it. Here is the code. Looking for help to do this without unsafe block and raw pointers.

use std::cmp::Ordering;

#[derive(Debug)]
struct Node<T: Ord> {
    key: T,
    left: Tree<T>,
    right: Tree<T>,
}

#[derive(Debug)]
struct Tree<T: Ord>(Option<Box<Node<T>>>);

#[derive(Debug)]
pub struct BinarySearchTree<T: Ord> {
    root: Tree<T>,
}

impl<T: Ord> Node<T> {
    fn new(value: T) -> Self {
        Node {
            key: value,
            left: Tree(None),
            right: Tree(None),
        }
    }
}

impl<T: Ord> Tree<T> {
    fn add(&mut self, value: T) {
        let mut current = self;

        while let Some(ref mut node) = current.0 {
            match node.key.cmp(&value) {
                Ordering::Less => current = &mut node.right,
                Ordering::Greater => current = &mut node.left,
                Ordering::Equal => current = &mut node.right,
            }
        }

        current.0 = Some(Box::new(Node::new(value)));
    }

    fn successor(&self, value: &T) -> Option<&T> {
        let mut current = self.0.as_ref();
        let mut successor = None;
        while current.is_some() {
            let node = current.unwrap();
            if *value < node.key {
                successor = current;
                current = node.left.0.as_ref();
            } else {
                current = node.right.0.as_ref();
            }
        }

        successor.map(|node| &node.key)
    }

    fn extract_min(&mut self) -> Option<T> {
        let mut node = None;

        if self.0.is_some() {
            let mut current = self;

            while current.0.as_ref().unwrap().left.0.is_some() {
                current = &mut current.0.as_mut().unwrap().left;
            }

            let temp = current.0.take().unwrap();
            node = Some(temp.key);
            current.0 = temp.right.0;
        }

        node
    }

    fn extract_max(&mut self) -> Option<T> {
        let mut node = None;

        if self.0.is_some() {
            let mut current = self;

            while current.0.as_ref().unwrap().right.0.is_some() {
                current = &mut current.0.as_mut().unwrap().right;
            }

            let temp = current.0.take().unwrap();
            node = Some(temp.key);
            current.0 = temp.left.0;
        }

        node
    }

    fn remove(&mut self, value: &T) {
        let mut current: *mut Tree<T> = self;

        unsafe {
            while let Some(ref mut node) = (*current).0 {
                match node.key.cmp(value) {
                    Ordering::Less => current = &mut node.right,
                    Ordering::Greater => current = &mut node.left,
                    Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                        (None, None) => (*current).0 = None,
                        (Some(_), None) => (*current).0 = node.left.0.take(),
                        (None, Some(_)) => (*current).0 = node.right.0.take(),
                        (Some(_), Some(_)) => {
                            (*current).0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                        }
                    },
                }
            }
        }
    }
}

impl<T: Ord> BinarySearchTree<T> {
    pub fn new() -> Self {
        BinarySearchTree { root: Tree(None) }
    }
    pub fn add(&mut self, value: T) {
        self.root.add(value);
    }
    pub fn remove(&mut self, value: &T) {
        self.root.remove(value);
    }
    pub fn successor(&self, value: &T) -> Option<&T> {
        self.root.successor(value)
    }
}
1 Like

I think this is one of the patterns (a loop trying to follow sequential mutable references) that just can't be expressed with Rust borrows. A recursive version does work:

    fn remove(&mut self, value: &T) {
        if let Some(ref mut node) = self.0 {
            match node.key.cmp(value) {
                Ordering::Less => node.right.remove(value),
                Ordering::Greater => node.left.remove(value),
                Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                    (None, None) => self.0 = None,
                    (Some(_), None) => self.0 = node.left.0.take(),
                    (None, Some(_)) => self.0 = node.right.0.take(),
                    (Some(_), Some(_)) => {
                        self.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                    }
                },
            }
        }
    }
3 Likes

a loop trying to follow sequential mutable references

This line explains a lot. I wasn't understanding why the compiler was complaining about multiple mutable references and this clearly explains why. The while loop was creating multiple mutable references.

So it's either recursion or raw pointers for cases like this where the underlying data structure is recursive itself?

1 Like

I'm afraid so. I checked the recursive version I just wrote for i32 and String in release mode and it's using tail call optimization in both cases so there are cases where you wouldn't need to worry about stack overflows, although I suppose it would be nice if you could have this guaranteed.

I'm not convinced this can't be done in safe code. Notably, there are no RefCells/Mutexes in the way, so you don't have to keep a stack of guards around, and the links only point one way, so there should not be problems with aliasing. When this question was posted to Stack Overflow I took a brief stab at it but I wasn't able to come up with anything constructive; however, that doesn't mean it can't be done.

3 Likes

So first of all, your code isn’t really the problem, this is more of a shortcoming of the borrow checker.

Taking this code

fn remove(&mut self, value: &T) {
    let mut current = self;

    while let Some(ref mut node) = current.0 {
        match node.key.cmp(value) {
            Ordering::Less => current = &mut node.right,
            Ordering::Greater => current = &mut node.left,
            Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                (None, None) => current.0 = None,
                (Some(_), None) => current.0 = node.left.0.take(),
                (None, Some(_)) => current.0 = node.right.0.take(),
                (Some(_), Some(_)) => {
                    current.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                }
            },
        }
    }
}

one can see that it compiles on nightly with -Zpolonius which is a very good indicator, that the only reason it doesn’t compile is that the current borrow checker is a bit too simplistic.

One nontrivial thing happening here are the current = &mut node.left/right assignments where node in turn was derived from current. E.g. try abstracting this kind of access into a function, you’d want to mutate current itself, so pass a &mut &mut Tree<T>. But then when you assign a derived &mut Tree<T> back to it, what do the lifetimes look like? From &'a mut &'b mut Tree<T>, you can only get a &'a mut Tree<T>. You need to assign a &'b mut Tree<T> though. So the constraint would be 'a == 'b, which is no good. (Borrowing current for its entire lifetime.)

The borrow checker can handle more sophisticated things than just access patterns that can be abstracted into separate functions, but there’s a limit. And the prototype Polonius borrowchecker can do a bit more sophisticated things even.

Anyways, talking about workarounds that work with stable rust. The way around the problem that &'a mut &'b mut Tree<T> only gives &'a mut Tree<T> is to use &'a mut Option<&'b mut Tree<T>>. Then you can .take() a &'b mut Tree<T>. This hints at a solution: Make current an Option<&mut Tree<T>>. So here’s the first step:

fn remove(&mut self, value: &T) {
    let mut current = Some(self);

    while let Some(current_) = current.take() {
        if let Some(ref mut node) = current_.0 {
            match node.key.cmp(value) {
                Ordering::Less => current = Some(&mut node.right),
                Ordering::Greater => current = Some(&mut node.left),
                Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                    (None, None) => current_.0 = None,
                    (Some(_), None) => current_.0 = node.left.0.take(),
                    (None, Some(_)) => current_.0 = node.right.0.take(),
                    (Some(_), Some(_)) => {
                        current_.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                    }
                }
            }
        }
    }
}

this still doesn’t compile, but some of the hardest to fix errors are gone.

By releasing node before assigning to current_.0, we can make more error messages go away:

fn remove(&mut self, value: &T) {
    let mut current = Some(self);

    enum Action<A,B> {
        SetCurrent(A),
        SetCurrentKey(B),
    }
    use Action::*;
    while let Some(current_) = current.take() {
        let mut action = None; 
        if let Some(ref mut node) = current_.0 {
            match node.key.cmp(value) {
                Ordering::Less => current = Some(&mut node.right),
                Ordering::Greater => current = Some(&mut node.left),
                Ordering::Equal => action = Some(match (node.left.0.as_mut(), node.right.0.as_mut()) {
                    (None, None) => SetCurrent(None),
                    (Some(_), None) => SetCurrent(node.left.0.take()),
                    (None, Some(_)) => SetCurrent(node.right.0.take()),
                    (Some(_), Some(_)) => {
                        SetCurrentKey(node.right.extract_min().unwrap())
                    }
                }),
            }
        }
        if let Some(a) = action {
            match a {
                SetCurrent(x) => {
                    current_.0 = x;
                }
                SetCurrentKey(x) => {
                    current_.0.as_mut().unwrap().key = x;
                }
            }
            current = Some(current_);
        }
    }
}

So now we have (well I wished this would already compile, but anyways...) the classical problem with the current borrow-checker, where one path (with Less or Greater) has a mutable reference (current_ through node.left/right) returned/escaping a block while the other path still accesses current_. Okay maybe I wouldn’t expect this version as written above to compile, but with

Ordering::Less => {current = Some(&mut node.right); continue},
Ordering::Greater => {current = Some(&mut node.left); continue},

it’s sound (and Polonius is still happy).

Okay.. I thought this might be fixable somehow... seems like I’m a bit stuck here. Let’s back-track a bit and re-consider.. trying do the cmp call before _.take()ing from current seems like a decent idea. Ah, look, this does the trick:

fn remove(&mut self, value: &T) {
    let mut current = Some(self);

    while current.is_some() && current.as_ref().unwrap().0.is_some() {
        if let Some(&mut ref mut current_) = current {
            if let Some(ref mut node) = current_.0 {
                match node.key.cmp(value) {
                    Ordering::Less => current = Some(&mut current.take().unwrap().0.as_mut().unwrap().right),
                    Ordering::Greater => current = Some(&mut current.take().unwrap().0.as_mut().unwrap().left),
                    Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                        (None, None) => current_.0 = None,
                        (Some(_), None) => current_.0 = node.left.0.take(),
                        (None, Some(_)) => current_.0 = node.right.0.take(),
                        (Some(_), Some(_)) => {
                            current_.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                        }
                    }
                }
            }
        }
    }
}

Disclaimer: All code is untested. You should’ve included a testcase perhaps, anyways, I was to lazy to write one myself.


Edit: I’m noticing that some of the intermediate versions of the code have slightly changed or even completely broken termination conditions. Not going to fix those, but I guess the final version (that one I did fix above, it used to have an error, too) perhaps looks nicer like this:

fn remove(&mut self, value: &T) {
    let mut current = Some(self);

    loop {
        let current_ = current.as_deref_mut().unwrap();
        if let Some(ref mut node) = current_.0 {
            match node.key.cmp(value) {
                Ordering::Less => current = Some(&mut current.take().unwrap().0.as_mut().unwrap().right),
                Ordering::Greater => current = Some(&mut current.take().unwrap().0.as_mut().unwrap().left),
                Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                    (None, None) => current_.0 = None,
                    (Some(_), None) => current_.0 = node.left.0.take(),
                    (None, Some(_)) => current_.0 = node.right.0.take(),
                    (Some(_), Some(_)) => {
                        current_.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                    }
                }
            }
        } else { break }
    }
}
6 Likes

You don't need Option, the interesting thing that @steffahn introduced is reborrowing. So here's a version of that:


    fn remove(&mut self, value: &T) {
        let mut current = self;
    
        loop {
            let current_ = &mut*current;
            if let Some(ref mut node) = current_.0 {
                match node.key.cmp(value) {
                    Ordering::Less => current = &mut current.0.as_mut().unwrap().right,
                    Ordering::Greater => current = &mut current.0.as_mut().unwrap().left,
                    Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                        (None, None) => current_.0 = None,
                        (Some(_), None) => current_.0 = node.left.0.take(),
                        (None, Some(_)) => current_.0 = node.right.0.take(),
                        (Some(_), Some(_)) => {
                            current_.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                        }
                    }
                }
            } else { break }
        }
    }

I think this certainly didn't use to work in Rust 1.0.0 but maybe this is one of the many improvements of non-lexical lifetimes. :slight_smile:

2 Likes

Ah, nice find. I didn’t feel like continuing with re-simplification after already spending quite some time on this.

In fact this works, too

fn remove(&mut self, value: &T) {
    let mut current = self;

    while let Some(ref mut node) = current.0 {
        match node.key.cmp(value) {
            Ordering::Less => current = &mut current.0.as_mut().unwrap().right,
            Ordering::Greater => current = &mut current.0.as_mut().unwrap().left,
            Ordering::Equal => match (node.left.0.as_mut(), node.right.0.as_mut()) {
                (None, None) => current.0 = None,
                (Some(_), None) => current.0 = node.left.0.take(),
                (None, Some(_)) => current.0 = node.right.0.take(),
                (Some(_), Some(_)) => {
                    current.0.as_mut().unwrap().key = node.right.extract_min().unwrap();
                }
            }
        }
    }
}

so the most significant change was re-tracing the path to node on the Less and Greater branches (using unwrap, etc). This is probably zero overhead with compiler optimizations.

Quick check on Godbolt’s compiler explorer reveals that this, indeed, only works since NLL.

3 Likes

Thank you so much for this long explanation. I really appreciate this one.

So to solve problems like this where self is borrowed mutably for a recursive data structure and I want to manipulate/change the underlying data associated with the struct, reborrowing it is the way to go when using a iterative process.

I don’t really know if I can see a general pattern in this one example already. Also, as I said, coming up with a working version took me quite some time, probably more than an hour. The main takeaway is probably that many things are possible with some workarounds in safe Rust. Helps with motivation for solving hard cases. And trying out polonius on a subtle borrowchk error messages in cases that seem potentially sound is always a good idea.

Even though it wasn’t applicable in this case, using Option can help tremendously with the borrow-checker. And re-borrowing is definitely a concept one should be familiar with in order to understand Rust well.

In this case a key point was the realization that the assignment to current kept node and hence the previous value of current borrowed for longer than the scope of the entire match block, and that that stood in conflict with the use of current and/or node (I don’t even know if both were problematic; perhaps only node was a problem, who knows) in the Equals branch. So it’s good to know that there are cases when control-flow splits and different paths need to hold a borrow for different lenghts that the borrowchecker doesn’t like. In this case this leads to the idea to shorten the lifetime of node by re-creating it inside of the Less and Greater paths. Once I finally saw it, at least it did remind me somewhat of examples I had seen before. Maybe one of the examples here (An alias-based formulation of the borrow checker <- blog post about polonius) [not 100% sure it wasn’t somewhere else, I’m not re-reading this blog post right now :wink: ].

I mean, of course there might also be a general pattern of how to successfully do specifically (mutable) traversal of a datastructure in this code example. I’m not going to thing about it really right now, feel free to explore this if you like ^^

1 Like

Maybe your implementation constraint does not allow this, but if possible, I think its better to use something like Option<Rc<RefCell<...>>>. This makes it a whole lot easier to deal with recursive structures.

I think the use of Rc and Refcell would be overkill and unnecessary as I don't need multiple owners to the same memory location. Each node needs to be referenced by only one node so Box is more than sufficient enough for this structure.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.