Rust a datastructure of a Tree and TreeNode

Hi,

I translated a simple tree structure from Python and C++ to Rust. The code is straightforward, but due to Rust’s safety requirements, it became quite convoluted :sob:. The API had to be hidden in TreeNodeInner. and json serde serialization required two new structs.

I’m sure many people have implemented tree data structures in Rust. Could someone give advice on how to simplify this implementation while keeping the same API?

Usage:

        let root = TreeNode::new("project_root");
        let folder1 = TreeNode::new("src");
        let folder2 = TreeNode::new("docs");
        let file1 = TreeNode::new("main.py");
        let file2 = TreeNode::new("README.md");

        root.add(&folder1);
        root.add(&folder2);
        folder1.add(&file1);
        folder2.add(&file2);

Rust implementation:

CPP implementation:

Python implementation:

Don't have parent pointers. For children don't use reference counting because there is only one reference, just own the children outright.

1 Like

But in real scenarios, you most often need to know about your root and parents.

Not necessarily. As you pass through from the parent to the child, you can just remember where you came from. Often there is no need to store a pointer back in the child node.

But you can get a better answer if you describe what you want to do exactly.

1 Like

Welcome to the wonderful world of self-reference structures:)

If you were ok with a read-only tree you could use Rc::new_cyclic, but if you want to be able to add children dynamically, you need an interior mutability.

Simple and dirty example.

struct TreeNode {
    id: u8,
    children: Vec<Rc<RefCell<TreeNode>>>,
    parent: Weak<RefCell<TreeNode>>,
}

fn add(parent: &Rc<RefCell<TreeNode>>, id: u8) {
    let parent_weak = Rc::downgrade(&parent).clone();
    parent
        .borrow_mut()
        .children
        .push(Rc::new(RefCell::new(TreeNode {
            id,
            children: vec![],
            parent: parent_weak,
        })));
}

fn main() {
    let root = Rc::new(RefCell::new(TreeNode {
        id: 1,
        children: vec![],
        parent: Weak::new(),
    }));

    add(&root, 1);
    add(&root, 2);
    add(&root.borrow().children[0], 3);

    for x in &root.borrow().children {
        let child = x.borrow();
        let parent_id = if let Some(ref p) = child.parent.upgrade() {
            p.borrow().id
        } else {
            0 //
        };
        println!("Id {} has parent {}", child.id, parent_id);
    }
}

PS: The scenario when your parent owns children (i.e has strong references in this case) but children only have weak references to parent (not to create a reference cycle which leads to a memory leak) was once known as "strong weak dance" in ObjC world.

1 Like

Then I guess my implementation, it is what it is. You need to do some extra work internally so that API is clean but still being able to add parent of a new tree node.

Can you recommend any course to look deeper into this?

There is a chapter about it in the "Effective Rust" book (the whole book is worth reading!)

1 Like

You don't normally do all that extra work, though. Normally, in Rust, you'd reduce the amount of shared ownership and cyclic graph structures to the minimum necessary to get the job done.

Minimally, none: playground

#[derive(Clone)]
pub struct TreeNode {
    pub name: String,
    children: Vec<TreeNode>,
}

impl TreeNode {
    pub fn new(name: &str) -> Self {
        TreeNode {
            name: name.to_string(),
            children: vec![],
        }
    }

    pub fn add(&mut self, other: TreeNode) {
        self.children.push(other);
    }
}

Avoiding aliasing where possible has significant performance and simplicity benefits. You mentioned the Rust code is more complex than the C++, but the C++ code for add() is not just one line of code.

It also has safety benefits; you could implement children and descendants safely for this type, whereas the C++ versions are subject to invalidation if you hold the vectors across graph changes.

It also reduces the need for documentation: the C++ header does not say, for example, what happens if you call parent.add(child) where child is a null pointer, or child is already a child of a different node, or child == parent or you're otherwise making a cycle. It should if that is a way callers could trigger UB (or infinite loops and memory leaks). We don't have to in the simplified Rust version because the compiler rules out all of those special cases!

If you really need all the complications, you can still do it in safe Rust, but it's not particularly pretty or idiomatic.

2 Likes

if your algoritm requires back pointers, then it's not strictly a tree structure, you are actually using some form of graph data structure, which are notoriously hard problem to solve in rust.

NOTE: in the examples, when I say trees, I always refer to binary trees, but the principle applies to arbitrary n-ary trees in general.

if you come from functional language background (haskell, for example), it should be clear that a tree is a very simple yet fundamental recursive data structure. it is usually defined in one of two styles (mathematically equivalent, but are represented slightly differently in computer):

  • first style encodes the different "shapes" in the Node type, while the Tree type is a wrapper for the root Node.

    in other words, a Tree always has a root node, but Nodes can be empty, something like this:

    data Node a = Nil | Val a (Tree a) (Tree a)
    data Tree a = Root (Node a)
    
  • second style encodes the "empty" state into the Tree type, i.e. a Node is never emtpy, it always contains data, but the root node for a Tree is optional, something like this:

    data Node a = Node a (Tree a) (Tree a)
    data Tree a = Empty | Root (Node a)
    

both styles can be translated into rust in a straightforward way, thanks to the simple recursive structure and single ownership model [1]. here's an example of a typical tree in rust:

/// a node is never empty, it consists of a value and subtrees
struct Node<T> {
    value: T,
    left: Tree<T>,
    right: Tree<T>,
}

/// a tree can either be empty or contain a root node,
/// equivalent to:
///     type Tree<T> = Option<Box<Node<T>>>;
enum Tree<T> {
    Empty,
    Root(Box<Node<T>>),
}

  1. just remember to add an explicit Box, like every recursive data types in rust ↩︎

1 Like

The most straightforward approach to graphs (which is what you have if you have back pointers) is to use indices:

// just for clarity
#[derive(Copy, Clone)] // probably lots more...
struct NodeId(usize);

struct Graph<T> {
  nodes: Vec<T>,
}

impl<T> Graph<T> {
  pub fn new() -> Self {
    Self { nodes: vec![] }
  }

  // no remove or it gets way more complicated!
  pub fn add(&mut self, value: T) -> NodeId {
    let result = NodeId(self.nodes.len());
    self.nodes.push(value);
    result
  }

  pub fn get(&self, id: NodeId) -> Option<&T> {
    self.nodes.get(id)
  }

  // ...
}

struct TreeNode<T> {
  parent: Option<NodeId>,
  value: T,
  children: Vec<NodeId>,
}

struct Tree<T> {
  graph: Graph<TreeNode<T>>,
  root: NodeId,
}

impl<T> Tree<T> {
  pub fn new(root_value: T) -> Self {
    let graph = Graph::new();
    graph.add(TreeNode {
      parent: None,
      value: root_value,
      children: vec![],
    });
    Self { graph, root }
  }

  pub fn root(&self) -> NodeId {
    self.root
  }

  pub fn get(id: NodeId) -> Option<&TreeNode<T>> {
    self.graph.get(id)
  }

  // ...
}

You should certainly be looking to use something purpose built here like petgraph if you go this route, it handles a lot of book-keeping details for you as well as providing a rich set of algorithms.

The easier option is as mentioned to just not have the back pointers in the first place:

struct Tree<T> {
    value: T,
    children: Vec<Tree<T>>,
}

impl<T> Tree<T> {
    // ...

    pub fn visit(&self, mut visitor: impl FnMut(&T, &[(&T, usize)])) {
        let mut stack = vec![(self, 0)];
        let mut path = vec![];
        visitor(&self.value, &path);
        while let Some((node, index)) = stack.pop() {
            if let Some(child) = node.children.get(index) {
                stack.push((node, index + 1));
                stack.push((child, 0));
                path.push((&node.value, index));
                visitor(&child.value, &path)
            } else {
                path.pop();
            }
        }
        assert!(path.is_empty());
    }
}

fn main() {
    let tree = Tree {
        value: "A",
        children: vec![
            Tree {
                value: "B",
                children: vec![],
            },
            Tree {
                value: "C",
                children: vec![Tree {
                    value: "D",
                    children: vec![],
                }],
            },
            Tree {
                value: "E",
                children: vec![],
            },
        ],
    };

    tree.visit(|value, path| {
        println!("{value}: {path:?}");
    });
}
A: []
B: [("A", 0)]
C: [("A", 1)]
D: [("A", 1), ("C", 0)]
E: [("A", 2)]

Hope you remembered your algorithms class!

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.