[re-opened...] Walking a tree while accumulating a list of mutable references

I am writing a compiler for a made-up language for fun and education. I have an AST, where each node is a struct like Stmnt or Block. Some of these (like Block) own a symbol table environment where variable declarations for that scope are stored.

Example:

pub struct Block {
  pub env: Environment,
  pub stmnts: Vec<Stmnt>,
}

I'm writing some generic code for walking this structure; my hope is to implement a visitor pattern where these elements can be visited mutably along with a way of accessing all environments seen on the way down to that element.

It's easy if I only need access to the most-recently-seen environment; I discovered Rust is OK with me cleaving a struct like the above into two mutable references: one for the env and a separate, disjoint one for the stmnts. I can then enumerate the stmnts and visit each one, passing a mutable reference to both it and the environment separately. If the function for walking a Stmnt in turn walks an expression or something, it can continue to forward on down the most-recently-seen environment.

But I need access to all of the Environments seen on the way down, not just the most recent. What is the idiomatic way of handing down a growing list of disjoint mutable references like this?

My failures:

  • I tried to collect mutable references to the Environments in a Vec and pass that mutably on down, but that of course doesn't work because I have to pop() before returning from an inner call or else the Vec will have a mutable reference to a child. The compiler caught this and disallowed it.
  • I tried to create a simple struct SymTable that keeps an Optional<&mut SymTable> along with a &mut Environment, but I couldn't work out how to express the lifetimes for this (I lost several hours going down this path to no avail...). I figured I could make a new one for each Block encountered on the stack and then populate it with both the passed-in SymTable and the newly-encountered Environment.

My half-success: I went back to the Vec approach, but with a Vec<&mut Environment> that I create a new copy of each time one needs to be added. This compiles and works because the new Vec can have the new element added and no pop() is needed, and thus no risk of a reference from a child outliving the function call. But it seems very wasteful, algorithmically, because it requires I enumerate the list of references when copying them just to add one more. In my particular case, I'm not terribly worried about performance, but there must be a better way?

Could you post a minimal example to the playground?

I've tried to recreate the situation from the post, and it seems to just work: Rust Playground.

... yeah, that works. I don't know why I thought it didn't. I definitely tried something like it, but I must have typo'd it or something. Thanks!

OK, so this works for pre-order traversal, but not for post-order traversal. This makes sense because, with pre-order traversal, the only way to delete nodes is before visiting them and thus before having an opportunity to push a &mut reference from them into the vector (which might not be .pop()d before returning...).

By 'pre-order', I really just mean taking your example and adding a visit function that takes both the mutable node and the vector of mutable references. If you call such a function before iterating over children, it compiles. If you do it after, it does not compile.

I think this is fine for my purposes, but out of curiosity, would there be a way to solve that?

Yep, this result is expected.

In Rust, it is impossible to get two mut aliases for the location. Directly, it means that you can't have two &mut pointing to the same thing. However, this applies transitively as well: it is impossible to have &mut foo pointing to foo and &mut bar, which points to bar such that bar.foo is the same foo.

So, would be impossible to call a visit function such that it receives a unique reference to Node in one argument and a unique reference to this node's env in another argument, because the latter reference won't actually be unique. The traversal itself is irrelevant here actually :slight_smile:

I think a more interesting scenario, and likely what @Sydius actually meant, is the following:

 fn go<'a>(node: &'a mut Node, parent_envs: &mut Vec<&'a mut Env>) {
    parent_envs.push(&mut node.env);
    for child in node.children.iter_mut() {
        go(child, parent_envs)
    }
    parent_envs.pop();
    visit(node, parent_envs);
}

This is a case where by the time visit() runs there's no actual aliasing due to parent_envs.pop() pairing up with the earlier parent_envs.push(). The problem, of course, is the compiler does not understand this (naturally) - it just sees "arbitrary" method calls and their signatures, and that's it. But, this is code that is "clearly" correct from ownership/aliasing perspective.

One option would be to say "the heck with it, I know what I'm doing" and roll with raw pointers:

fn go<'a>(node: &'a mut Node, parent_envs: &mut Vec<&'a mut Env>) {
    let node_raw = node as *mut _;
    parent_envs.push(&mut node.env);
    for child in node.children.iter_mut() {
        go(child, parent_envs)
    }
    parent_envs.pop();
    // Safe because the push/pop pairing
    let node = unsafe { &mut *node_raw };
    visit(node, parent_envs);
}

This is cheap and quick, but if the aliasing is ever violated in the future, it'll be a problem.

Another way is to more explicitly express transfer of ownership, in a way the compiler can reason about:

struct Node {
    children: Vec<Node>,
    env: Option<Env>,
}

fn go<'a>(node: &'a mut Node, parent_envs: &mut Vec<Env>) {
    let my_env = node.env.take().unwrap();
    parent_envs.push(my_env);
    
    for child in node.children.iter_mut() {
        go(child, parent_envs)
    }
    let my_env = parent_envs.pop().unwrap();
    node.env = Some(my_env);
    visit(node, parent_envs);
}

Now we're wrapping the env field in an Option and explicitly moving it around - the parent envs are also stored as owned values, and we explicitly pop() it back to ourselves. We can still mess things up, but not in an ownership/aliasing UB way. Full play

1 Like

This also technically requires the visit function to be unsafe, because it must not swap the two topmost entries in parent_envs.

You mean so that pop() in the caller returns the correct env back? Yes, that's needed. I'm not sure if visit needs to be marked unsafe though - an unsafe fn usually means the caller is responsible for maintaining some invariant, but that's not the case here.

Perhaps pop() can assert that it gets back what it push()'d (i.e. ptr equality), however.

But I think in general, it's true that visit shouldn't muck with parent environments. The best would be to take &Vec<Env> in visit() so it's statically prevented, but I don't really know what @Sydius was planning on doing there.