Simple binary tree implementation


#1

Hi,

I wan to implement a simple binary search tree. I have the structs, and I have the insert method:

use std::cmp::Ord;
use std::rc::{Rc, Weak};
use std::cell::{Ref, RefCell};

// Sources:
// https://timidger.github.io/designing-a-bi-mutable-directional-tree-safely-in-rust.html
// https://github.com/SimonSapin/rust-forest/blob/master/rctree/lib.rs
// https://xojoc.pw/justcode/rust-binary-search-tree.html
// https://www.reddit.com/r/rust/comments/55ns2m/safe_and_efficient_bidirectional_trees/
// https://ricardomartins.cc/2016/06/08/interior-mutability
// https://gist.github.com/aidanhs/5ac9088ca0f6bdd4a370

struct _Node<T>
    where T: Ord
{
    parent: Option<Weak<RefCell<_Node<T>>>>,
    left: Option<NodeRef<T>>,
    right: Option<NodeRef<T>>,
    value: T,
}
struct NodeRef<T: Ord>(Rc<RefCell<_Node<T>>>);

impl<T: Ord> Clone for NodeRef<T> {
    fn clone(&self) -> NodeRef<T> {
        NodeRef(self.0.clone())
    }
}

impl<T> NodeRef<T>
    where T: Ord
{
    pub fn new(value: T) -> Self {
        NodeRef(Rc::new(RefCell::new(_Node {
            parent: None,
            left: None,
            right: None,
            value: value,
        })))
    }

    fn new_child(&self, value: T) -> Self {
        NodeRef(Rc::new(RefCell::new(_Node {
            parent: Some(Rc::downgrade(&self.0)),
            left: None,
            right: None,
            value: value,
        })))
    }

    fn insert(&self, value: T) -> Self {
        let node = self.0.borrow();

        if node.value == value {
            return self.clone();
        }

        if node.value > value {
            match node.left {
                Some(ref noderef) => {
                    return noderef.insert(value);
                }
                None => {
                    let new_node = self.new_child(value);
                    self.0.borrow_mut().left = Some(new_node.clone());
                    new_node
                }
            }
        } else {
            match node.right {
                Some(ref noderef) => {
                    return noderef.insert(value);
                }
                None => {
                    let new_node = self.new_child(value);
                    self.0.borrow_mut().right = Some(new_node.clone());
                    new_node
                }
            }
        }
    }
}

I want NodeRef<T> to implement aget` method. I first tried with this signature:

fn get(&self, value: T) -> Option<&T> { ... }

Since the node owns its value, I thought I could just return a reference to it. Here is the full implementation (I had to add methods to get left/right node easily):

// return a reference so the left child
fn get_left(&self) -> Option<Self> {
    self.0.borrow().left.as_ref().and_then(|child| Some(child.clone()))
}
// return a reference to the right child
fn get_right(&self) -> Option<Self> {
    self.0.borrow().right.as_ref().and_then(|child| Some(child.clone()))
}

fn get(&self, value: T) -> Option<&T> {
    let mut current_noderef = self.clone();
    loop {
        if current_noderef.0.borrow().value == value {
            // This is wrong!
            return Some(&current_noderef.0.borrow().value)
        } else if current_noderef.0.borrow().value > value {
            match current_noderef.get_right() {
                Some(noderef) => {
                    current_noderef = noderef;
                }
                None => {
                    return None;
                }
            }
        } else {
            match current_noderef.get_left() {
                Some(noderef) => {
                    current_noderef = noderef;
                }
                None => {
                    return None;
                }
            }
        }
    }
}

Unfortunately:

188 |                 return Some(&current_noderef.0.borrow().value)
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^ does not live long enough
189 |             } else if current_noderef.0.borrow().value > value {
    |             - temporary value only lives until here

I don’t fully understand why current_noderef.0.borrow() does not live long enough. It should live as long as &current_noderef.0.borrow().value lives, no ? People helped me on IRC, and told me that returning a Ref instead might help here.

Here is the updated version. The only things that change are the signature and the return statement:

fn get(&self, value: T) -> Option<Ref<T>> {
    let mut current_noderef = self.clone();
    loop {
        if current_noderef.0.borrow().value == value {
            return Some(Ref::map(current_noderef.0.borrow(), |node| &node.value));
        } else if current_noderef.cl.0.borrow().value > value {
            match current_noderef.get_right() {
                Some(noderef) => {
                    current_noderef = noderef;
                }
                None => {
                    return None;
                }
            }
        } else {
            match current_noderef.get_left() {
                Some(noderef) => {
                    current_noderef = noderef;
                }
                None => {
                    return None;
                }
            }
        }
    }
}

Unfortunately this also fails because noderef.0 lives only during one iteration of the loop:

177 |                 return Some(Ref::map(current_noderef.0.borrow(), |node| &node.value));
    |                                      ^^^^^^^^^^^^^^^^^ does not live long enough
...
198 |     }
    |     - borrowed value only lives until here

I’ve tried many variations of this example with no success. Could someone come up with a working version of me get method? Or am I trying to do something fundamentally wrong here?

I thought I mostly understood ownership and mutability, but my first experience with Rc and RefCell has been pretty frustrating :worried:


#2

Borrows of a value in a RefCell<T> must be reference-counted at runtime, to protect against mutation during a shared borrow. This is done by requiring a guard object called a Ref<T> to be alive during the borrow. When the Ref<T> is dropped, its destructor will decrement the borrow count. This is why your NodeRef can’t safely return an &T from a method, because nothing would prevent someone else from mutating the contents while that &T was still live, leading to undefined behavior.

This also means that it will be tricky to implement any non-recursive traversal function that works by borrowing, because you will need a whole stack of Refs stay alive while you traverse the tree. (If you implement your traversals recursively, the Refs can just live on the call stack; this is how your insert function works.) And it’s hard to return anything useful from such a function because each Ref in the stack needs to outlive any returned reference.

One alternative is to clone/copy the data and return it by value (T rather than Ref<T> or &T), or to return a reference-counted pointer rather than a borrowed pointer. (In your case, you could return Rc<RefCell<_Node<T>>> or some wrapper type around this.) Another alternative is to write a recursive traversal function that takes a closure which will be run on the matching element.


#3

I think using Rc and RefCell here will make things more complicated than they need to be. If this is indeed a binary tree, then each node has exactly one owner (its parent), so Rc does not fit here semantically.

If you can avoid storing parent pointers, then your node can use only options and boxes:

struct Node<T> {
  value: T
  children: [Option<Box<Node<T>>>; 2]
}

If you do need parent pointers, than I’d suggest going unsafe, storing * mut Node<T> and implementing a custom Drop for a node. You should be able to provide a safe Tree abstraction around unsafe Nodes.

The third approach is to store all the nodes in a single Vec, and use indices instead of pointers (though this won’t work if you would be doing tree rotations, because you can’t safely mutably borrow several elements from the vector).