Newbie: Building Recursive Tree from Materialized Path

Hello,

I'm a Rust beginner working on building a recursive tree structure from a flat list of nodes, and I could really use some help.

The problem I'm facing is that some nodes are missing from the final tree, and I can't figure out why. I'm using ltree_path (a dot-separated path of integers like "623.517.1337") to represent the hierarchy, and my approach is to:

  1. Populate a HashMap where each node is keyed by its int_id.
  2. Iterate through the nodes, find their parent nodes based on ltree_path, and add children to the corresponding parents.
  3. Finally, collect the root nodes to build the tree.

However, after running the code, some nodes are either missing or not properly added to their parents. I suspect I'm doing something wrong when attaching children to parents or perhaps in handling the ltree_path correctly.

simplified code

use std::collections::HashMap;
use std::error::Error;
use serde::Serialize;

// Base trait for recursive items
pub trait RecursiveTreeBase: Serialize + Clone {
    fn new(int_id: i32, id: String, ltree_path: Vec<i32>, parent_id: Option<String>) -> Self;
    fn add_child(&mut self, child: Self);
    fn get_int_id(&self) -> i32;
    fn get_ltree_path(&self) -> &Vec<i32>;
    fn get_children(&self) -> &Vec<Self>;
}

pub fn build_recursive_tree_from_ltree<T: RecursiveTreeBase + std::fmt::Debug>(flat_list: Vec<T>) -> Result<Vec<T>, Box<dyn Error>> {
    let mut id_to_node: HashMap<i32, T> = HashMap::new();

    // Step 1: Populate the HashMap
    for node in flat_list.into_iter() {
        let id = node.get_int_id();
        id_to_node.insert(id, node);
    }

    // Step 2: Attach children to parents
    for node_id in id_to_node.keys().cloned().collect::<Vec<_>>() {
        if let Some(node) = id_to_node.get(&node_id).cloned() {
            let ltree_path = node.get_ltree_path();

            // Only process non-root nodes
            if ltree_path.len() > 1 {
                let parent_int_id = ltree_path[ltree_path.len() - 2];

                // Ensure the parent exists before adding the child
                if let Some(parent) = id_to_node.get_mut(&parent_int_id) {
                    println!("Before adding, parent: {:?} has children: {:?}", parent.get_int_id(), parent.get_children());
                    parent.add_child(node);  // Add the node as a child of its parent
                    println!("After adding, parent: {:?} has children: {:?}", parent.get_int_id(), parent.get_children());
                } else {
                    return Err(format!("Parent with int_id {} not found for node {:?}", parent_int_id, node).into());
                }
            }
        }
    }

    // Step 3: Collect final root nodes (nodes with ltree_path length 1)
    let roots: Vec<T> = id_to_node
        .values()
        .filter(|node| node.get_ltree_path().len() == 1)
        .cloned()
        .collect();

    Ok(roots)
}

#[derive(Debug, Clone, Serialize)]
struct RecursiveLocation {
    int_id: i32,
    ltree_path: Vec<i32>,
    children: Vec<RecursiveLocation>,
}

impl RecursiveTreeBase for RecursiveLocation {
    fn new(int_id: i32, _id: String, ltree_path: Vec<i32>, _parent_id: Option<String>) -> Self {
        Self {
            int_id,
            ltree_path,
            children: Vec::new(),
        }
    }

    fn add_child(&mut self, child: Self) {
        self.children.push(child);
    }

    fn get_int_id(&self) -> i32 {
        self.int_id
    }

    fn get_ltree_path(&self) -> &Vec<i32> {
        &self.ltree_path
    }

    fn get_children(&self) -> &Vec<Self> {
        &self.children
    }
}

fn main() -> Result<(), Box<dyn Error>> {
    // Sample input data
    let flat_list = vec![
        RecursiveLocation::new(623, "root".to_string(), vec![623], None),
        RecursiveLocation::new(517, "child 1".to_string(), vec![623, 517], None),
        RecursiveLocation::new(1337, "child 2".to_string(), vec![623, 517, 1337], None),
        RecursiveLocation::new(74, "child 3".to_string(), vec![623, 517, 74], None),
    ];

    let tree = build_recursive_tree_from_ltree(flat_list)?;
    println!("Tree: {:?}", tree);

    Ok(())
}

Output:

Before adding, parent: 623 has children: []
After adding, parent: 623 has children: [RecursiveLocation { int_id: 517, ltree_path: [623, 517], children: [] }]
Before adding, parent: 517 has children: []
After adding, parent: 517 has children: [RecursiveLocation { int_id: 1337, ltree_path: [623, 517, 1337], children: [] }]
Before adding, parent: 517 has children: [RecursiveLocation { int_id: 1337, ltree_path: [623, 517, 1337], children: [] }]
After adding, parent: 517 has children: [RecursiveLocation { int_id: 1337, ltree_path: [623, 517, 1337], children: [] }, RecursiveLocation { int_id: 74, ltree_path: [623, 517, 74], children: [] }]
Tree: [RecursiveLocation { int_id: 623, ltree_path: [623], children: [RecursiveLocation { int_id: 517, ltree_path: [623, 517], children: [] }] }]

We can see the final tree misses the nodes 74 and 1337.

I am completely new in Rust, so I must be missing some really important concept.

I have tried other versions of the code, but while those would work with a small number of nodes they fail miserably with larger datasets (100 - 2000 nodes), with a mixed up hierarchy (child nodes display as roots…), so I would rather stick to a predictable code :slightly_smiling_face:

Any advice on how I could build my hierarchy from the LTREE / materialized path?

Thanks a lot for your help !

I haven’t looked at your code in detail to see what’s going wrong, but you can do something like this:


use std::collections::{HashMap, HashSet};

#[derive(Default,Debug)]
struct NodeLinks {
    up: Option<i32>,
    down: HashSet<i32>
}

#[derive(Debug)]
pub struct TreeBuilder(HashMap<i32, NodeLinks>);

impl TreeBuilder {
    pub fn new()->Self {
        TreeBuilder(HashMap::default())
    }

    pub fn add_path(&mut self, path: impl IntoIterator<Item=i32>) {
        let mut path=path.into_iter();

        let Some(id) = path.next() else { return };
        let mut node = self.0.entry(id).or_default();
        let mut parent = id;

        for id in path {
            node.down.insert(id);

            node = self.0.entry(id).or_default();
            assert!(node.up.is_none() || node.up == Some(parent), "Multiple parents for {id}");
            node.up = Some(parent);

            parent=id;
        }
    }

    pub fn build<T>(mut self, mut new:impl FnMut(i32, Vec<T>)->T)->Vec<T> {
        let roots:Vec<i32> = self.0.iter().filter(|(_,v)| v.up.is_none()).map(|(&k,_)| k).collect();
        roots.into_iter().map(|id| self.build_one(&mut new, id)).collect()
    }

    fn build_one<T>(&mut self, new: &mut impl FnMut(i32, Vec<T>)->T, id:i32)->T {
        let node = self.0.remove(&id).unwrap();
        let children:Vec<T> = node.down.into_iter().map(|x| self.build_one(&mut *new, x)).collect();
        new(id, children)
    }
}

fn main() {
    let mut builder = TreeBuilder::new();
    builder.add_path([623,517,1337]);
    builder.add_path([623,517,74]);
    dbg!(builder.build(|id, children| format!("{id} [{}]", children.join(", "))));
}
2 Likes

Or, if IDs are allowed to be duplicated at different levels and you are always given the full path:


use std::collections::HashMap;

#[derive(Debug)]
pub struct Node<T> {
    pub payload: Option<T>,
    pub children: HashMap<i32, Node<T>>,
}

impl<T> Default for Node<T> {
    fn default() -> Self {
        Node {
            payload: None,
            children: HashMap::new(),
        }
    }
}

impl<T> Node<T> {
    fn insert(&mut self, path: impl IntoIterator<Item = i32>, data: T) {
        let mut cursor = self;
        for id in path {
            cursor = cursor.children.entry(id).or_default();
        }
        cursor.payload = Some(data);
    }
}

fn main() {
    let mut root = Node::<&'static str>::default();
    root.insert([623], "root");
    root.insert([623, 517], "child 1");
    root.insert([623, 517, 1337], "child 2");
    root.insert([623, 517, 74], "child 3");
    dbg!(root);
}
2 Likes

It you are interested in the exact point of the bug, this is what I found:

here you create a deep clone of the node, after this moment no children will be added to this new clone.

here you add a child to a node that still exists inside id_to_node hashmap.

So at the end of step 2 your id_to_node looks like:

{
  623 => { children: [] }, // clone of #517 made when it had no children,
  517 => { children: [ #1337, #74 ] },
  //                   ^      ^ these are also clones
  ...
}

The outcome of your code as written depends on the iteration order. For it to produce the result you expect it has to be "children first, parent later". However the hashmap does not have any guaranted iteration order (I am kind of suprised it went in the same order as flat_list, it is probably just a coincidence)

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.