Trying to walk a parent pointer tree

I'm still learning lifetimes and I hope somebody could help me understand how to annotate this.

First, note: This is a simplified example of part of a port of a simple interpreter from Java, and the use of Rc<RefCell<Environment>> is based on the fact that the Environment class needs to be passed around (with mutability) a bit.

I'm very new to Rust, and still getting my feet wet.

    use std::collections::HashMap;
    use std::{cell::RefCell, rc::Rc};

    struct Environment {
        parent: Option<Rc<RefCell<Environment>>>,
        values: HashMap<String,i32>,
    }

    impl Environment {
        fn put(&mut self, name: &str, value: i32) {
            self.values.insert(name.to_string(), value);
        }
        
        fn get(&self, name: &str) -> Option<i32> {
            if let Some(v) = self.values.get(name) {
                Some(*v)
            } else {
                None
            }
        }
        
        pub fn ancestor(&self, distance: usize) -> &Self {
            if distance == 0 {
                self
            } else {
                if let Some(parent) = &self.parent {
                    // this here's the trouble
                    return parent.borrow().ancestor(distance - 1);
                }
                todo!(); // handle errors later
            }
        }
    }

    fn main() {
        let global = Rc::new(RefCell::new(Environment {
            parent: None,
            values: HashMap::new(),
        }));
        let local = Rc::new(RefCell::new(Environment {
            parent: Some(global.clone()),
            values: HashMap::new(),
        }));
        
        global.borrow_mut().put("foo", 1);
        local.borrow_mut().put("bar", 2);
        assert_eq!(global.borrow().get("foo").unwrap(), 1);
        assert_eq!(local.borrow().get("bar").unwrap(), 2);
        
        assert_eq!(local.borrow().ancestor(1).get("foo").unwrap(), 1);
    }

The goal is to be able to walk up the parent links to a specific environment. It's obviously a lifetime annotation matter, but I don't know how I'd annotate that the environment referenced is owned by an Rc and is safe?

Playground link

Aside

You can have syntax highlighting in your code by using triple backticks ```, ideally followed by the name of the language: ```rust (although for the very case of Rust, in this forum, you don't need it):

```rust
use std::collections...

...

```

Back to the topic at hand, your issue is that RefCells, when borrowed, yield a guard / wrapper around the actual reference, which "releases" the .borrow() read lock when dropped / its scope ends.

And the type &Self = &Environment does not convey that meaning; it's the type of a bare reference.

If we look at the type returned by .borrow(), we see that it returns a Ref<'_, Borrowee>, which must be understood as &'_ Borrowee, but wrapped in an object that adds drop glue.

So, for your function:

  1. The return type cannot be &'_ Self = &'_ Environment;

    • (But the return type in the distance = 0 case is indeed &'_ Environment; let's ignore that case for the moment)
  2. In the case of returning the .borrow() object itself, the return type could be Ref<'_, Environment>

  3. But to simplify things, we will tell Rust that we do not care about the exact name of the returned type, just that it be allowed to be an arbitrary object, so as long as it "acts as a &Environment".

    1. The behavior / interface / trait of "Acting as &T" is express through the Deref<Target = T> trait.

    2. The fact that the specifics of a type returned by a function be ignored, but for some specific behavior / trait is written as -> impl Trait.

      • There is also the question of the 'lifetime where using that object is legal, which leads to an appended + 'lifetime annotation.
    3. So we end up with:

      fn ancestor<'root_env> (self: &'root_env Environment, ...)
        -> impl Deref<Target = Environment> + 'root_env
      
  4. Finally, once we .borrow() the item, we are not done with it, we would like to use that object to retrieve an inner reference, which is the distance - 1-th ancestor itself.

    This "post-processing" step is expressed through the Ref::map(|it| &it....) adaptor:

    use ::core::cell::Ref;
    
    Ref::map(parent.borrow(), |it| it.ancestors(distance - 1))
    

    Sadly Ref::map itself requires a & _ return type, so the recursive impl Deref doesn't suit it. We kind of hit a dead end here for this very case, but know that other similar problems could be solved with Ref::map. Just not the nested / recursive situation that we have here.

At this point, there are two workarounds; one which is a bit unergonomic but is close to what you wanted, and one which uses the ref-counting already present to yield Rc-owned references instead of &'_-borrowed ones.

Rc-owned references

The idea would be to replace -> &'root_env Self with some reference type that is not infected with the 'root_env lifetime of the borrow of the input self, so that when, recursively, the temporarily borrowed parent is dropped, we can still return our return value.

If we dismiss the distance = 0 case yet again, this yields:

pub
fn ancestor<'root_env> (
    self: &'root_env Self,
    distance: usize,
) -> Rc<Environment> // No 'root_env lifetime => we don't care about *self being dropped!
{
    if distance == 0 {
        todo!("self")
    } else {
        if let Some(parent) = &self.parent {
            parent
                .borrow()               // <- Creates a temporary... 
                .ancestor(distance - 1)
            // <- ... which is dropped here. But we don't care!
        } else {
            todo!() // handle errors later
        }
    }
}

What about distance = 0?

Well, handling this one turned more complicated than expected, given how different things chained up.

Basically, the initial idea would be to replace:

- self: &'_ Environment,
+ self: &'_ Rc<Environment>,

so as to be able to just:

if distance == 0 {
    return Rc::clone(self);
}

The problem then being, that the recursive parent.borrow().ancestor() call does not work anymore, obviously, since it isn't an Rc.

But do you know what is an Rc? Just parent. But it is not an Rc<Environment>, it is an Rc<RefCell<Environment>>.

So let's try that:

impl ??? {
    fn ancestor (
        self: &'_ Rc<RefCell<Environment>>,
        distance: usize,
    ) -> Rc<RefCell<Environment>>
    {
        if distance == 0 {
            Rc::clone(self)
        } else {
            if let Some(parent) = &self.borrow().parent {
                parent.ancestor(distance - 1)
            } else {
                todo!()
            }
        }
    }
}

and that does work, except that the part to write in ??? is non-trivial:

  • if self: &'_ Rc<RefCell<Environment>>, then, since self: &'_ Self or self: Self, it means that
    Self = Rc<RefCell<Environment>>
    (or that Self = &'_ Rc<RefCell<Environement>>; I'll go with the former).

So the type to write after the impl and before the { is Rc<RefCell<Environment>>:

impl /* Self = */ Rc<RefCell<Environement>> {
    fn ancestor ...
}

But then we hit coherence problems: we are not allowed to have a direct impl on such type, since it is "not ours".

One (see the following post for a better alternative) solution to this last (yes, I promise!) problem is to use a helper trait:

impl RcRefCellEnvironment for /* Self = */ Rc<RefCell<Environement>> {
    fn ancestor (
        self: &'_ Rc<RefCell<Environment>>,
        distance: usize,
    ) -> Rc<RefCell<Environment>>
    { ... }
}

and now, the only thing missing, is some generic definition of that trait, even if we are only implementing it once: let's simply replace Rc<RefCell<Environment>> with Self:

trait RcRefCellEnvironment {
    fn ancestor (
        self: &'_ Self,
        distance: usize,
    ) -> Self
    ;
}
  • Oh, and the call site has changed a bit:

    - local.borrow().ancestor(1).get("foo").unwrap()
    + local.ancestor(1).borrow().get("foo").unwrap()
    
  • Playground

CPS: The magic trick to "return" stuff that refers to locals

CPS, or Continuation-Passing Style, or in layman's terms, "using a callback" is a very neat and powerful trick that can get you out of 99% of "value returned is referencing / borrowing a local", at the cost of some ergonomics.

The basic idea is that, instead of writing:

let foo = obj.get_thingy();
// use foo...

callers will need to write:

obj.with_thingy(|foo| {
    // use foo...
})
  • It does lead to rightwards-drift; and the obtained thing is very infectious: you may need to keep using CPS at the intermediate levels of abstraction.

That being said, the change of control flow, when viewed by Rust, is gigantic: we are no longer requiring a function to give us something when it returns (and drops its temporaries), instead, we are telling the function what we would wish to be doing with the value it should produce. So the callee will apply that logic before returning, and thus, before dropping its temporaries.
The result? No more "borrowed-value does not live long enough" errors :slightly_smiling_face:

Enough talked, time for a demo:

pub
fn with_ancestor<R> (
    self: &'_ Environment,
    distance: usize,
    //           "returned" value here
    //               vvvvvvvvvvvv
    ret: impl FnOnce(&Environment) -> R,
) -> R
{
    if distance == 0 {
        ret(self)
    } else {
        if let Some(parent) = &self.parent {
            parent.borrow().with_ancestor(distance - 1, ret) // recurse with the same cb
        } else {
            todo!(); // handle errors later
        }
    }
}

and then, the caller writes:

- assert_eq!(local.borrow().ancestor(1).get("foo").unwrap(), 1);
+ assert_eq!(local.borrow().with_ancestor(1, |it| it.get("foo").unwrap()), 1);

or:

local.borrow().with_ancestor(1, |ancestor| {
    assert_eq!(ancestor.get("foo"), 1);
});
5 Likes

Note: this was written concurrently with @Yandros’s extremely detailed answer above. It provides a slightly different solution to the last step by using a newtype instead of a trait, but with a much less rigorous explanation.


The issue here is that RefCell::borrow() returns an RAII guard object that has to outlive the reference you’re trying to return. Often in situations like this, I’ll use two structs, one for the internal data and another to serve as a handle:

#[derive(Clone)]
pub struct EnvHandle(Rc<RefCell<EnvData>>);

struct EnvData {
    parent: Option< EnvHandle >,
    values: HashMap<String,i32>,
}

Then, all of the user-facing API can be implemented on EnvHandle (untested):

impl EnvHandle {
    pub fn ancestor(&self, distance: usize) -> Self {
        if distance == 0 {
            self.clone()
        } else {
            let data=self.0.borrow();
            if let Some(parent) = &data.parent {
                // this here's the trouble
                return parent.ancestor(distance - 1);
            }
            todo!(); // handle errors later
        }
    }
}
2 Likes

Just for context: this is the solution regarding the "Rc<RefCell<Environemnt>>" kind of problem, that I solved with a helper trait.

My approach has the minor short-term advantage of not requiring to refactor the rest of the code (it has also the characteristic of making the Rc and RefCell visible from outside, which may avoid unnecessarily nesting Rcs and the like, but at the cost of having exported implementation details that we can no longer change without it being a breaking change).

But having part of the API be served with one-shot traits is quite uncommon in Rust, and many users will simply not know of these APIs. It is also cumbersome, in that using the trait methods requires that the trait be in scope.

Because of all this, the other approach to solve the "coherence" problem with the impl Rc<RefCell<Environment>> from my previous post, is to wrap that type in a custom new type (e.g., a tuple struct).

struct MyRcRefCellEnvironment(Rc<RefCell<Environment>>);

impl MyRcRefCellEnvironment /* Ok */ { ... }

And at tha point:

  1. for the sake of consistency / ergonomics, you move all the other methods from Environment to this MyRcRefCellEnvironment newtype;

  2. At that point there is no reason to keep the Environment publicly around, which frees that nicer public name. So you rename:

  • MyRcRefCellEnvironment to Environment,

  • and what used to be Environment gets renamed to something else, usually EnvironmentData, EnvironmentInner, EnvironmentFields, etc.

This is a very common pattern that @2e71828 has brought up (on point!), so it is good that you get acquainted with it.

3 Likes

I want to thank all of you for your detailed and informative responses! The Rust community is, frankly, amazing.

Suffice to say I'm going to have to digest this information, and I think I'm going to have to start over from zero and re-learn how lifetimes work, because I'm not quite grokking it yet.

I'm going to give a stab at some of these approaches, but I think the real problem is that in my port of the interpreter ( https://craftinginterpreters.com/ ) I've allowed a polymorphic/OO design for the scoped environments, when I should have re-thought how that component worked in a more Rust-y manner.

My thought is that the Environment parent pointer relationship is always a linked list, not a tree, and could be better represented by a stack. I could, instead of having set/get/etc methods on the Environment instances, rather have those APIs on an EnvironmentStack, which does the traversal and lookup itself on a simpler data structure.

Anyway, I really want to say how amazing this community is and thank you for your guidance. Learning Rust feels like I need to re-learn CS from zero - and I say this as someone working in C/C++ for 25 years.

2 Likes

I've been pretty busy with work - and re-reading materials about Rust lifetimes, etc - but I wanted to let you know I used this approach and it worked beautifully. Also, hiding the Rc<RefCell<EnvironmentData>> in a tuple struct really cleans up the call sites.

Thanks for the guidance!