Trees and Nodes in Rust

I'm trying to design a tree container that will respect the following API:

pub type NodePtr<T> = Rc<RefCell<Node<T>>>;
pub type TreePtr<T> = Rc<RefCell<Tree<T>>>;

let tree: TreePtr<String> = Tree::new();
    
let node_A: NodePtr<String> = Node::new("A".to_owned())
                                   .add_to(tree.clone())
                                   .unwrap();
    
let node_B: NodePtr<String> = Node::new("B".to_owned())
                                   .with_parent(node_A.clone())
                                   .with_distance(1.2)
                                   .add_to(tree.clone())
                                   .unwrap();

More precisely:

  1. There should be a Tree struct to provide access to a tree as a whole.
  2. There should be a Node struct to provide access to each individual node.
  3. Once a node is create via Node::new(), it should be immediately registered with some tree, thus, there is an add_to method at the end of builder chain.

I expect both top-down and bottom-up types of construction, thus, the API should be flexible, allwoing to both set a parent and add a child:

let node_C: NodePtr<String> = Node::new("C".to_owned())
                                   .add_to(tree.clone())
                                   .unwrap();

let node_D: NodePtr<String> = Node::new("D".to_owned())
                                   .add_to(tree.clone())
                                   .unwrap();
 
node_A.add_child(node_C.clone());
node_D.set_parent(node_A.clone());

The strategy that I chose consists in the following.

  1. When registering a node in a tree, I generate a node ID. Using this node ID, I then keep track of children and parents. This reduces the number of borrowing issues a lot.
pub struct Tree<T> where T: Hash + Clone + PartialEq + Eq + ToString
{
    root       : Option<usize>,
    nodes      : HashMap<usize, NodePtr<T>>,
    parents    : HashMap<usize, usize>,
    children   : HashMap<usize, Vec<usize>>,
    name_to_id : HashMap<T, usize>,
    curr_id    : usize,
}
  1. The structure of the tree is being stored inside Tree. Thus, what remains for the Node are node related data and a link (reference) to a tree which it belongs to.
pub struct Node<T> where T: Hash + Clone + PartialEq + Eq + ToString
{
    id       : usize,
    name     : T,
    tree     : TreePtr<T>,
    distance : Option<f64>,
}
  1. Thus, each time we query Node to interact with other nodes, we're redirecting our query to the corresponding tree through self.tree. For example, when setting a parent or adding a child:
pub fn set_parent(&mut self, parent: NodePtr<T>) -> Result<(),String>
{
    if parent.borrow().tree.as_ptr() != self.tree.as_ptr() 
    {
        return Err(String::from("tree is different from parent tree"));
    }

    self.tree.borrow_mut().add_parent_child_edge(&parent.borrow(), &self)
}

pub fn add_child(&mut self, child: NodePtr<T>) -> Result<(),String>
{
    if child.borrow().tree.as_ptr() != self.tree.as_ptr() 
    {
        return Err(String::from("tree is different from parent tree"));
    }
        
    self.tree.borrow_mut().add_parent_child_edge(&self, &child.borrow())    
}

However, when trying to return an iterator over child node via node.children()

pub fn children(&self) -> ChildrenIterator<T>
{
    self.tree.borrow().children(self.id)
}

I get an error returns a reference to data owned by the current function.
The iterator struct has some reference to the fields of the Tree struct, namely, to the HashMap of nodes and to Vec of node ids that are children of the current node:

pub struct ChildrenIterator<'a, T> where T: Hash + Clone + PartialEq + Eq + ToString 
{
    index    : usize,
    node_ids : &'a Vec<usize>,
    nodes    : &'a HashMap<usize, NodePtr<T>>,
}

impl<'a, T> Iterator for ChildrenIterator<'a, T> where T: Hash + Clone + PartialEq + Eq + ToString 
{
    type Item = NodePtr<T>;

    fn next(&mut self) -> Option<Self::Item>
    {
        if self.index < self.node_ids.len()
        {
            let node_id = self.node_ids[self.index];
            let node = self.nodes.get(&node_id).unwrap().clone();
            self.index += 1;
            Some(node)
        }
        else { None }
    }
}

The full code is here: Rust Playground

My questions are:

  1. How to fix the issue with the iterator?
  2. Using NodePtr (i.e. Rc<RefCell<Node>>) almost everywhere simplifies borrowing, however, adds overhead. Should I change the strategy to storing nodes in a HashMap directly, i.e., HashMap<usize, Node<T>> and using &Node everywhere instead? I can't foresee all the Rust-specific implications of using either of these strategies, thus, I would like to enquire on the pros and cons associated with each of them.

The Ref<'_, _> you get from RefCell::borrow is analogous to a RwLock read guard. Once you drop it, you lose access to the guarded resource. Moreover, you can't safely carry both around in the same struct, as that would be a form of self-referencial struct that Rust does not support.

So you can either

  • Not return an iterator, but return some type C where &C can be turned into an iterator
    • But this is rather unergonomic
  • Store the Ref<'_, Tree<T>> and look up the fields you need every time
  • Use a visitor/callaback pattern or some other non-iterator approach instead
    • fn visit<F: FnMut(&Node<T>)>(&self, visitor: F)

Here's a quickly thrown together version of storing the Ref<'_, Tree<T>>.

pub struct ChildrenIteratorTwo<'a, T>
where
    T: Hash + Clone + PartialEq + Eq + ToString,
{
    index: usize,
    node_id: usize,
    tree: Ref<'a, Tree<T>>,
}

impl<'a, T> Iterator for ChildrenIteratorTwo<'a, T>
where
    T: Hash + Clone + PartialEq + Eq + ToString,
{
    type Item = NodePtr<T>;
    fn next(&mut self) -> Option<Self::Item> {
        let this = &*self.tree;
        let node_ids = this.children.get(&self.node_id).unwrap();
        if self.index < node_ids.len() {
            let node_id = node_ids[self.index];
            let node = this.nodes.get(&node_id).unwrap().clone();
            self.index += 1;
            Some(node)
        } else {
            None
        }
    }
}

I haven't absorbed your code or desired use cases enough to give feedback on overall design, but did notice you need to rethink how nodes are associated with trees, and how that interacts with shared ownership (Rc). You're allowing adding a node that already belongs to some tree, and overwriting the node's id when you do so.

    pub fn add_node(&mut self, node: NodePtr<T>) -> Result<NodePtr<T>, String> {
        // ...
        let node_id = self.curr_id;
        self.curr_id += 1; // checked add?
        node.borrow_mut().id = node_id;

This is going to mess up the original tree.

3 Likes
  1. Tree or node should NEVER be pub Rcs.

  2. Tree = Node + Ownership.

  3. If you want to distinguish between Tree and Node, let your users create Trees not Nodes. The Nodes references can be obtained via Tree’s API.

  4. Or you can provided a unified TreeNode to deal with node manipulations and (shared) ownership.

  5. trees for your information.

Thanks. The trick with iterator works. Well, it's not the fastest way due to lots of indirections. Any suggestion on what would be the fastest one? (It may require to change some logic, I understand... just want to see what options I have)

Thanks also for noting the add_node issue. As for now, I'm playing with the API, this kind of issues are expected. I just want to test the limits of all possible approaches, not particularly concerned with all the checks.

Don't even bother. If you are building a tree using pointers, you already messed up cache locality. If you want fast access, you should use a flat array and identify nodes by their index. If you want the convenience of direct references, the solutions presented above are just fine.

1 & 2. Could you please clarify on these?
3. That's what I'm doing, no?
4. I need a separate Tree object to hash/cache certain data, e.g., node names, for quick look up. I will also need a distance calculator, accessible both through Tree and Node, e.g.,

let node_A = tree.get_node_by_id("node-A").unwrap();
let node_B = tree.get_node_by_id("node-B").unwrap();

let d1 = tree.distance_between(node_A.clone(), node_B.clone()); // 1st approach
let d2 = node_A.distance_to(node_B.clone());                    // 2nd approach

I see no way except for using Rc's everywhere.

Hm... should I then stop using node_id at all? Just use NodePtr everywhere?

If you semantically need it (eg. because you often look up nodes by their ID), then of course it's fine.

Actually, I don't really need IDs. Node names serve as unique identifiers. I redesigned Tree and Node such that RefCell wraps individual fields, not the whole node or tree.

pub struct Tree
{
    root  : RefCell<Option<Rc<Node>>>,
    nodes : RefCell<HashMap<String,Rc<Node>>>,
}

pub struct Node
{
    name     : RefCell<String>,
    parent   : RefCell<Option<Rc<Node>>>,
    children : RefCell<Vec<Rc<Node>>>,
    tree     : Rc<Tree>
}

This way all the nasty .borrow() and .borrow_mut() stuff happens in the implementation, not in the main code:

fn main()
{
    let tree = Tree::new();
    let node_A = Node::new("A", tree.clone());
    let node_B = Node::new("B", tree.clone());
    let node_C = Node::new("C", tree.clone());
    
    node_A.add_child(node_B.clone());
    node_A.add_child(node_C.clone() );

    node_B.set_name("BB".to_owned());
    node_C.set_name("CCC".to_owned());

    for node in &node_A.children() 
    {
        node.set_name(format!("{}!", node.get_name()));
        println!("{}", node.get_name());
    }

    let node = tree.find_node("A").unwrap();
    println!("{}", node.get_name());

}

The full code is here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e0e9eb1f87d265391353715bea4332c4

The question is whether or not it's a good practice to use something like just and Rc pointer to wrap Tree and Node with individual fields packaged into RefCells? Are there any possible drawbacks?

I decided to try this approach also (see full code below)

pub struct ChildrenIterator<'a>
{
    r : Ref<'a, Vec<Rc<Node>>>,
}

impl<'a,'b:'a> IntoIterator for &'b ChildrenIterator<'a>
{
    type IntoIter = Iter<'a, Rc<Node>>;
    type Item = &'a Rc<Node>;
    
    fn into_iter(self) -> Self::IntoIter
    {
        self.r.iter()
    }
}

It's just a wrapper over an iterator. However, now I have a question. If I want to iterate over all descendants (i.e., children of children of children etc) using pre- / in- / post- order type of iteration, would I be able to use lazy iteration in this case without coying parts of the tree?

Now that you've arranged the nodes hierarchically with a RefCell at every level, you're going to hit the same barriers to iteration as covered in this other recent thread.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.