Best way to implement tree that can be traversed up and down from any node?

What is the best way to implement a tree that can be traversed up and down from any node?

Specifically, I'm writing a name resolution pass for a compiler, and I'm tracking scopes. I currently have a Scope struct like this:

pub struct Scope {
    parent: Option<Box<Scope>>,
    decls: HashMap<String, NodeId>,
}

but there are two issues with it as it is:

  1. The Box<Scope> for parent means that each child node will have a copy of the parent from when the child was created, not a reference to one canonical parent.
  2. I need to be able to access all the child scopes for a node from that node.

Unfortunately, this is one of the few places where in my experience Rust doesn't shine. I think the solution might involve an "arena" of scopes, which I've tried to implement as so:

#[derive(Debug, Default)]
pub struct ScopeArena {
    scopes: Vec<Scope>,
}

impl ScopeArena {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn new_scope(&mut self, parent: Option<ScopeId>) -> ScopeId {
        let id = ScopeId(self.scopes.len());
        self.scopes.push(Scope::new(id, parent));
        id
    }

    pub fn get(&self, id: ScopeId) -> Option<&Scope> {
        if id.0 < self.scopes.len() {
            Some(&self.scopes[id.0])
        } else {
            None
        }
    }

    pub fn get_mut(&mut self, id: ScopeId) -> Option<&mut Scope> {
        if id.0 < self.scopes.len() {
            Some(&mut self.scopes[id.0])
        } else {
            None
        }
    }
}

#[derive(Debug, Copy, Clone)]
pub struct ScopeId(usize);

#[derive(Debug)]
pub struct Scope {
    id: ScopeId,
    parent: Option<ScopeId>,
    children: Vec<ScopeId>,
    decls: HashMap<String, NodeId>,
}

impl Scope {
    fn new(id: ScopeId, parent: Option<ScopeId>) -> Self {
        Self {
            id,
            parent,
            children: vec![],
            decls: HashMap::new(),
        }
    }

    /// **Panics if name already exists.**
    pub fn put(&mut self, name: String, id: NodeId) {
        assert!(self.decls.insert(name, id).is_none());
    }

    pub fn get(&self, arena: &ScopeArena, name: &str) -> Option<NodeId> {
        match self.decls.get(name) {
            Some(decl) => Some(*decl),
            None => {
                if let Some(parent) = self.parent {
                    arena.get(parent).unwrap().get(arena, name)
                } else {
                    None
                }
            }
        }
    }

    pub fn exists(&self, arena: &ScopeArena, name: &str) -> bool {
        self.get(arena, name).is_some()
    }
}

But I'm not sure if this is the best way to implement this. Is there a simpler way to implement this that will still satisfy the borrow checker? Thank you!

This is, it turns out, a place that's just hard to shine. Even in something like C# with a GC, XObject.Parent causes all sorts of problems because you can put the same xobject into multiple other elements, and then it's super-unclear what happens. (Is it the first parent? the last parent? implicitly copied if it already has a parent? etc.)

So my usual suggestion is of course to just not do this :upside_down_face: Maybe you can keep track of where you are textually, and only ever navigate from the root. Maybe you want to just treat it as a full graph and do something like http://smallcultfollowing.com/babysteps/blog/2015/04/06/modeling-graphs-in-rust-using-vector-indices/ or the petgraph crate.

If you do really want to do it, note that you have, in a sense, and ownership cycle here. So what's the way to break cycles? Weak pointers. So maybe you want something like

pub struct Scope {
    parent: Option<sync::Weak<Scope>>,
    children: HashMap<String, sync::Arc<Scope>>,
}
3 Likes

As an aside, this:

should simply be:

pub fn get(&self, id: ScopeId) -> Option<&Scope> {
    self.scopes.get(id.0)
}
5 Likes

Thank you for your reply!

I don't have experience with C#, but my understanding was that this sort of problem was easier in GC languages because you can pass references around willy-nilly :slight_smile:

Could you elaborate a bit?

I think that's essentially what I'm doing in the example I gave in my post. I have a ScopeArena which owns all the scopes, ScopeIds which act as pointers into the arena, and Scopes which hold the actual data and references to their parent and children. In my scenario, I don't need a full graph (I only need a tree) but I think I'm using the same sort of model as the blog post you mentioned.

One thing that I don't love about my current approach is that it has the potential to create dangling references (they wouldn't cause memory unsafety, but they could cause weird bugs if the ID is valid but points to a different scope in a different arena) since the Scope does not store a reference to its arena. The reason it doesn't store a reference to its arena is because the borrow checker wasn't happy with it, which I can understand because I need a mutable reference to the arena but each Scope needs an immutable reference, which breaks the aliasing rules.

What do you mean by "ownership cycle"? Do you mean if I used Rc I would have cycles?

Thanks! I think I forgot or didn't realize that Vec had a get() method :slight_smile:

Yes. If two Rcs point to each other, or more generally, a cycle of Rcs points to the next one and the last one back to the first one, then they will keep each other mutually alive, leading to a memory leak.

It depends where you bound the problem. Is it easier to get it to compile and not leak? Yes. Is it easier to get all the edge cases right? No, and it might even be harder because you don't have move semantics to keep people from making the "parent" relationship confusing by using the same thing in multiple places.

I'm figuring you want the parent pointer for something like super in Rust. But if you know where you are in the scope tree, you could also just re-navigate from the root. Like if you know you're in ::foo::bar, instead of following the parent link you could go back to the root and go to ::foo that way.

Or maybe you only need the parent links for the place you currently are in the tree, and thus could keep them as a separate stack for your current position, instead of having them in the tree. Then the parent pointer wouldn't need to be in the data structure (where it causes awkwardness) but you could still efficiently walk your parents during name resolution. (The thought here is that may be you don't need the parent links for scopes that aren't in your current path from the root, which seems plausible.)

I meant the more general idea that you want holding a node to give you access to its children, but those children to give you access to their parent, which is a short cycle. So you have to do something to think about that cycle. Maybe that's weak pointers with reference counted ones. Maybe it's some unsafe magic to have one direction be a reference while the other is Box. (Since if you're holding the child its parent must be alive if the parent single-owns the child.) But the important point is that you have to do something about it -- as you said in the OP, it can't be a Box both ways -- so the most mechanical solution is to just use weak pointers as a fairly standard solution to this.

1 Like

I think you should try out the weak pointer solution at first, that's usually the common way to deal with this. You can explore the other approaches later.

1 Like

Hmm, I'm not sure if that would work for my use case; I need mutable access to some scopes even while immutable references are held to them in other scopes. The arena works because I get mutable access to the scopes as a whole and then I can temporarily pass in immutable references to the arena when the scopes need access to other scopes. When scopes need to reference other scopes, they reference them by ID which can only be dereferenced through the arena. How would I do that with Rc and Weak?

This seems interesting, I'll think about trying something like that. Thank you!

1 Like