Recursive iterator with RefCell -- "data owned by the current function"

Hello,

Working on AoC22#07, I was hoping to create two versions of the solution to practice using recursive data structures with the Rust stdlib -- one using a Rc<RefCell<T>>-based tree, and eventually another using an arena(? I think this is the term, something like indextree) with a Vec<usize> to keep track of nodes.

For this AoC problem, my Node type is an enum with two variants, one of which holds a vec of references to other nodes. I'm currently working on iterating over the tree structure. I was able to adapt code from this post to make a PoC recursive iterator over an enum, so I thought I was doing pretty well: Rust Playground

However, now I'm trying to put the Rc / RefCell parts back into the enum and ending up with issues, because borrowing to get at the underlying type results in a reference to something owned by the function: Rust Playground

(I've changed the types slightly for that example since my real use case the equivalent of Val is Clone but not Copy like an i32.)

Am I missing an opportunity to get this to compile with something fairly clean and straightforward like the example from fasterthanli.me above? Or, now that Rc<RefCell<T>> has been introduced, do I necessarily need to look to splitting out a separate NodeIterator type with a stack?

Thanks in advance for any suggestions!

It doesn't make sense to want to return a reference to inside a RefCell. (FYI, the same applies to mutexes for the same reason.) The cell needs to know about all outstanding borrows, so that it can track whether or not to give out other borrows, thereby enforcing exclusive mutability.

You should probably solve this by wrapping the guard objects returned by borrow() and borrow_mut(), eg. search the docs for Ref::map().

1 Like

Not the solution to your specific problem, but often Rc+RefCell graphs are not the correct answer in Rust. You might think you need them because they kinda give you what you used to have in other languages, but actually their limitations will bite you like in this case. A better exercise IMO would be to implement this using indextree or like a normal tree where the only pointers are those torwards the direct children, i.e. something like this:

enum Entry {
    Dir(HashMap<String, Entry>),
    File(usize),
}

FYI this is how I solved this AoC day (except I used a &str instead of the String, bonus points if you can do that too). You can assume there are no cd / commands except the very first one if you find that hard to handle when parsing. It's possible, though a bit ugly, and no input given by the website should contain that anyway.

1 Like

Thanks for your input / response. I know these problems almost always can be solved in a much simpler / clever way, but I'm trying to learn more about general approaches (since my goal is mostly learning more about rust, less about finishing AoC).

I know they contribute a lot of complexity and can (will in my case) lead to runtime panics, which defeats many of the benefits of rust. I had such a hard time with them doing aoc 2018 that I wanted to revisit them this year.

I'm sure that would be simpler for this specific case, but it would be nice to be able to find a parent!

You could use internal iteration:

impl Node {
   fn visit_with(self: &'_ Node, visit: &mut impl FnMut(&Val))
   {
        match self {
            | Self::Value(v) => visit(&v.borrow()),
            | Self::Children(c) => c.iter().for_each(|n| n.visit_with(visit)),
        }
    }
}
  • (this visitor API is actually .for_each(), so feel free to also write .try_for_each since its break semantics should allow you to be almost as flexible as external iteration).

  • (you can also play with async, but at that point it's not gonna be as smooth).

1 Like

Ref::map is new to me, thanks for the suggestion!

It seemed like I was on the verge of getting it to compile with this (or a few variations with .cloned() and some dereferencing), but not quite: Rust Playground

impl Node {
    pub fn iter(&self) -> Box<dyn Iterator<Item = &Val> + '_> {
        match self {
            Self::Value(v) => Box::new(std::iter::once(&*v.borrow())),
            Self::Children(children) => Box::new(children.iter().flat_map(|child| {
                *Ref::map(child.borrow(), |c| &c.iter())
            })),
        }
    }
}

It's still the case you will never be able to return an iterator over &Val. Iterators in Rust can be consumed while the iterated values are alive. So imagine what this would mean:

let v: Vec<&Val> = root_node.iter().collect();
// The iterator no longer exists; there are no hidden objects floating
// around; aka Rust has no garbage collection / automatic liveness scope
// extension.
//
// So now I hold references to every `Val` in the tree, but there's nary a
// `Ref<'_, Val>` in sight, which means all those `RefCell` counts are
// back at zero.  If you accomplish this, you could go get a `borrow_mut`
// to some `Val` that you also have a reference to in `v`, and that's
// instant UB.

So this signature can never work:

    pub fn iter(&self) -> Box<dyn Iterator<Item = &Val> + '_> {

Furthermore, if you just try to swap in Ref<'_, Val>, you'll still be discarding all your Ref<'_, Node> and so will still be in trouble. It's a lot like this situation. (You might want to read the whole series.)


Incidentally,

Ref::map(child.borrow(), |c| &c.iter())

That closure is basically the same as

fn foo() -> &str {
    let local = String::new();
    &local
}

Moreover, Ref::map is for field projection, like Deref is; that's pretty much what the FnMut(&T) -> &U signature means. You don't have an iterator as a field of Node, so you can't map one out.

2 Likes

You still shouldn't be returning references – my comment about Ref::map() was meant to tell you exactly that the only way to return a reference behind a RefCell is to return a Ref or a RefMut, and so that's what you should be returning. (Ref::map() is then useful for projecting a type over the returned RAII guard object, in order to hide some implementation details.)

Thinking about your code more from a bird's eye view, however, you should just return Rc<RefCell<_>>s instead. It's way easier to implement.

2 Likes

@quinedot What a fantastic explanation, thanks for your time and effort!

I assume you're meaning "never [when using Rc]"? Because returning &Val from an iterator seems to be the right thing that other people do that I'm trying to emulate (which is what got me going down this route -- I had it previously working by just returning an owned Rc). Rust Playground

Yes, I thought that's what the (elided) lifetimes helped flesh out, that the lifetime of Item was tied to the struct. (These might be wrong, I still have a lot to learn about lifetimes, but it seems to compile and run.)

pub fn iter(&self) -> Box<dyn Iterator<Item = &Val> + '_>

->

pub fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = &'a Val> + 'a>

OK, now this is shining some light on where my mental map has gone wrong. I thought having Rc<RefCell<T>> with a &T somehere wouldn't be a problem, but perhaps the issue is that the lifetime of &'huh T is unclear, because Rc could drop the T at any time (if the count gets to zero). So any lifetime for Rc<RefCell<T>> only refers to the Rc and not the T. Is that getting warmer?

I have! But it's been a few years, I should read it again to see how much more clicks now as opposed to back then.

Thanks again for such a helpful response!

Ah okay, thank you.

I agree. I had it working like this before, but it seemed more idiomatic for an iterator to return references so I wanted to see if I could get it working with -> &Val instead of -> Rc<RefCell<Val>>. On further consideration, it seems like this is missing the whole point of using an Rc in the first place.

I mean, never when the Vals are in a RefCell. The Ref<'_, Val> is analogous to a RwLockReadGuard<'_, Val>: You can only get access to the underlying Val while the read guard Ref<'_, Val> is still alive. Once you drop that, you give up access. Moreover the way RefCell works relies on this for memory safety, so circumventing it is generally unsound.

It's possible to do with an intermediate structure, where you basically create a mirror of your entire tree that holds Refs of Nodes and Vals, and then implement a borrowing iterator of that. Like this, but it's more of a pain to mirror a tree.

But I don't really consider it practical (unergonomic and a lot of cycles) and would probably go with the visitor pattern @Yandros mentioned.


Rc isn't the problem. Rc hands out references like candy (i.e. with no read guard equivalent), because it never hands out exclusive references (&mut). RefCell needs the read guard because it can hand out &mut, but has to make sure it never does so when there's an outstanding Ref<'_, _> -- because violating the exclusiveness of &mut is undefined behavior.


It's more of a liveness scope problem in my mind, really, although references do have some magical properties that may be muddying the waters. Anyway, here's another shot at describing the problem. Whenever you have some pattern like this:

// &'a X -> Thing<'a> -> &'a Y
let this = &x;              // &RefCell<Y> == &'a X
let middle = this.borrow(); // Ref<'a, Y>  == Thing<'a>
let desired = &*middle;     // &Y          == &'a Y

Then the desired &Y cannot outlast the liveness scope of the middle Thing. The desired reference is considered a borrow of the Thing as much as a &str is a borrow of a String. You have to keep the middle thing alive.

...unless Thing<'a> is a &'a or &'a mut, because those are magic. Illustration.

Now, that's true even when there's no actual destructor [1], but it's actually critically important for middle types that have a destructor -- the destructor gets exclusive access to itself which invalidates any references tied to it (which is a good thing because it prevents use after free and so on).

And Ref<'_, _> has a destructor -- it has to release its "read guard" on the underlying value. If you forget a Ref so the destructor doesn't run, any [2] future attempt to get a &mut out of the RefCell will panic, because it will think there's still an outstanding Ref. The Ref destructor "let's the RefCell know" that it is going away, and thus no reference derived from the Ref can still be alive.

In the linked list case, and the tree case, you actually have a big chain of middle Things that you have to keep alive if you want to get to a &Val of a leaf without cloning the Rcs. Every Ref generated by a RefCell between the root and the Val.


  1. so far anyway ↩ī¸Ž

  2. safe ↩ī¸Ž

1 Like

In general, the lifetime 'a in &'a T must depend on some owner of the T that is going to keep it alive for you, as long as the lifetime prevents its mutation. If you have a collection of Rc<T>s, then you absolutely can get &Ts out of it, as long as you got those &Ts by dereferencing the Rc<T>s. Rc<T> straightforwardly owns a T; the fact that it is owning the T jointly with all of its clones is not a problem.

The problem here is RefCell, not Rc. By using RefCell, you allow mutation through &RefCell. Therefore, possessing &RefCell<T> doesn't guarantee that the T won't be mutated. Therefore, you cannot ever get a borrow &T out of a &RefCell<T> or a Rc<RefCell<T>> or anything else. You have to get a std::cell::Ref<T> via RefCell::borrow(), which ensures at run time that the T isn't going to be mutated, and then you can borrow the &T from that Ref because the Ref is itself the promise that the T will hold still and not be mutated or dropped.

RefCell and Ref are sort of the two sides of a barrier across which borrowing does not cross unchanged: without interior mutability, you can go from &'a SomeContainer<T> to &'a T, but in the presence of RefCell, that is not sufficient; you will end up with an outer lifetime for &'a Rc<RefCell<T>> -> &'a RefCell<T>, and an inner lifetime for &'b Ref<'a T> -> &'b T. These lifetimes are related because 'a: 'b — the RefCell has to continue existing while you are borrowing it — but 'b is shorter than 'a because it only lasts as long as you keep the Ref around.

2 Likes

@n8henrie, I'd like to insist on this suggestion: give it a try:

let mut node_count = 0;
node.visit_with(&mut |_| node_count += 1);
dbg!(node_count);

There is even a helper crate:

with which you can derive a bunch of other niceties from such a single function implementation.

Examples of hand-rolling certain such functions

Here is a try_for_each-ish (find_map) idea I was talking about:

  • fn find_map<T>(self: &'_ Node, predicate: &mut impl FnMut(&Val) -> Option<T>)
      -> Option<T>
    {
         match self {
             | Self::Value(v) => predicate(&v.borrow()),
             | Self::Children(c) => c.iter().find_map(|n| n.find_map(predicate)),
         }
    }
    

which you can use to rewrite visit_with, and many other:

  • fn visit_with(self: &'_ Node, mut visit: impl FnMut(&Val))
    {
        enum Never {}
        self.find_map::<Never>(&mut |v| { visit(v); None::<Never> });
    }
    
    fn any(self: &'_ Node, mut predicate: impl FnMut(&Val) -> bool)
      -> bool
    {
        self.find_map(&mut |v| predicate(v).then(|| ())).is_some()
    }
    
    fn all(self: &'_ Node, mut predicate: impl FnMut(&Val) -> bool)
      -> bool
    {
        !self.any(|v| !predicate(v))
    }
    
    fn eq<T>(self: &'_ Node, mut it: impl Iterator<Item = T>)
      -> bool
    where
        Val : PartialEq<T>,
    {
        self.all(|l| matches!(it.next(), Some(r) if *l == r))
            && it.next().is_none()
    }
    

    etc.

so as to, for instance, still be able to write:

tree.eq(0..8)
1 Like

That makes sense. So far I've only really used Rc with a RefCell, so I was confusing which was doing what here.

Interesting, thanks for this example!

I think I was imagining that everything would be fine given fn foo(&'a self) -> &'a T, but in this case that's untrue because the &'a T is a reference to an object owned by foo (the Ref created by .borrow()), and this object will be dropped from the stack when foo returns. Does that sound right?

Looking at your example, is it because references are Copy whereas Thing isn't? ... Nope, deriving Copy doesn't change the compilation error. Huh.

This makes sense. Thank you again, very much, for helping clear up some misconceptions! As a hobbyist with very few techy friends or techy interactions IRL, it's always surprising to me the holes and misrepresentations in my mental models, and how efficiently they can get straightened out with a little dialog.

This was a delightfully well written explanation that directly addresses several misconceptions I was hung up on, in particular the "inner" lifetime after the borrow. Thank you!

Thank you, I was planning on it! But needed a minute to digest your response and the sigantures and review what a visitor pattern is. Interesting that wikipedia specifically lists filesystem iteration as an example of its use!.

Here's my first draft, thank you for the additional examples! Rust Playground

It seems like part of the idea here is to move state outside the function, so if I wanted to collect into a Vec, I could clone into one like so: Rust Playground

Cool to see a different approach, thank you! I see how this could be a more straightforward approach in many cases!

Yes, that's correct and a very succinct summary of why it can't work.

Well, it's certainly not pretty, but I finally finished AoC22 D7: aoc22-rust/main.rs at master ¡ n8henrie/aoc22-rust ¡ GitHub

I included 3 implementations -- a standard recursive iterator solution, an attempt at using an internal iterator (both of these using Rc<RefCell<_>>), and an "Arena" style solution with plan references. FWIW, I also included some benchmarks hidden beneath the --bench feature flag (because nightly).

I'm sure they could all use a lot of polish and fine-tuning, but hopefully I'll be able to refer back to them in the future. I more-or-less copied the internal_iterator crate for the respective implementation. For a while I was stuck on an issue with compile-time stack overflow when trying to accept an FnMut as an argument that was eventually resolved by taking an &mut FnMut instead. My gratitude to @Yandros for teaching me about this interesting new approach!

And thanks to many in this thread for all of your input!

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.