Trees in the Rust book: why do they require multiple ownership?

In the Rust book there is the following implementation of a tree Node:

struct Node {
    value: RefCell<i32>,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

(I changed i32 to RefCell<i32> so I am able to mutate this as well).

One can then use a variable current_node: Weak<Node> to move up and down the tree.

This works fine, but conceptually I don't understand why one should need to use Rc. If I understand correctly, Rc allows us to have data with multiple owners. In a tree, each node has exactly one owner, namely its parent node (or some variable we can name "root" in the case of the root node).

I tried to replace this with

struct Node<'a> {
    value: RefCell<i32>,
    parent: RefCell<Option<&'a Node<'a>>>,
    children: RefCell<Vec<Box<Node<'a>>>>,
}

Then we would use current_node: &Node.

Now this doesn't work, because it's not possible, e.g., to change current_node from a node to one of its children.

Simplifying things, the problem comes down to the fact that

fn test<T>(v: &RefCell<Box<T>>) -> &T{
    v.borrow().as_ref()
} 

does not compile:

cannot return reference to temporary value; returns a reference to data owned by the current function.

The analogous

fn test2<T>(v: Weak<RefCell<Rc<T>>>) -> Weak<T> {

    Rc::downgrade(&v.upgrade().unwrap().borrow())
}

does compile. So why can't I return &T from &RefCell<Box<T>> ?

(I also tried using Ref<Node> instead of &Node as the type of current_node, but then there are other issues).

Or more to the point, am I wrong in thinking that conceptually trees should not require multiple ownership?

1 Like

Correct. Just because you can have multiple Rcs doesn't mean you must. This is a tree with single ownership, albeit checked at runtime, not at compile time.

Right, because you're trying to push the runtime check of the RefCell to compile time. If you could do that, you wouldn't need the RefCell at all - the whole point of it is runtime borrow checking.

Right, because you're not borrowing the inner T, you're just creating a new (weak) reference to it.

Because you can use &RefCell<Box<T>> to modify the T. If you could do this, you could make the &T live longer than the T is actually valid.

Consider the difference between RefCell::borrow_mut and RefCell::get_mut. The point you're getting at is that there is no equivalent to get_mut for borrow, because it would be unsound to allow mutation through a shared reference if that shared reference could also be used to access the contents concurrently.

Here's another way of looking at it: actually, a doubly-linked tree does require multiple ownership, because although the strong pointers only point one way, when you actually intend to traverse the weak child-parent link, you have to upgrade the pointer.

Using & can work, but because it's checked at compile time, that means the lifetimes that make it work have to be visible to the compiler. You can only nest as many lifetimes as you have lifetime scopes - no nonsense like fn test that flattens two into one. If you want the flexibility of runtime borrow checking, you can't make the borrow checker do the work for you.

2 Likes

Just think about it logically. It would be fundamentally wrong if you could do that.

It is the whole point of RefCell (and its multi-threaded cousins like Mutex and RwLock, which you might be more familiar with) that it tracks what kind of borrows and how many of them it handed out (i.e., it converts static borrow checking into dynamic borrow checking). This would be impossible with a plain old reference. Plain old references don't have destructors, so they don't know when they go out of scope, and they can't, consequently, notify their originating RefCell.

Thus RefCell::borrow() and RefCell::borrow_mut() return Ref and RefMut, respectively, which are smart pointer types with destructors that signal back to their owner RefCell when they get destroyed. This is how the RefCell knows whether it should allow further immutable and mutable borrows to its content to be created.

If you could obtain a plain reference to the interior of the RefCell, then this runtime tracking mechanism would break down, and you could achieve undefined behavior, see e.g. Playground.

4 Likes

Your tree has parent pointers. If you tried to model that without Rc, then a node would be owned by both it's parent and children.

3 Likes

Thanks everyone! This is really helpful.

I now understand clearly why I can't get &T from &RefCell<Box<T>>. I also get why the same problem does not occur when trying to get Weak<T> from Weak<RefCell<Rc<T>>> (having a Weak<T> does not guarantee that the value inside will not be mutated, whereas having &T does).

However, I am still confused about the need for Rc/multiple ownership in this case.

I see in the code that given a Weak<Node> one first upgrades this to then be able to return another Weak<Node> pointing to a child. So between the upgrade and the downgrade one is in fact using multiple ownership. But I don't understand why this should be necessary.

This seems to point to the same problem in my understanding. I don't think children nodes should need to own their parent node. They just need "some kind of reference" to their parent node. Given such a reference to a Node one should then be able to get a reference to its parent Node or to any of its children Nodes and one should also be able to modify all the contents of the Node (using RefCell). Conceptually I don't understand why this should require multiple ownership, given one can think of a simple ownership structure flowing from the root to the leaves.

I have tried a lot of different approaches to avoid using Rc/multiple ownership (including being more careful with lifetimes) and they all failed :frowning: But I am not very experienced with Rust, so I was wondering whether I was missing something or whether Rc is really fundamentally necessary here.

What happens when a child node is inspecting it's parent and the parent is dropped else where?

1 Like

Unfortunately, the language does not provide a reference of that kind.

The language comes with the following types of pointers:

  1. Box. A pointer that owns its target.
  2. &T. A shared temporary borrow. For the entire duration of the borrow, the value must remain immutable, even through other pointers.
  3. &mut T. A unique temporary borrow. For the entire duration of the borrow, all access to the value must go through the mutable reference.
  4. Rc. Shared ownership.
  5. Raw pointers. Ownership is managed unsafely with no help from the compiler.

The only kind of pointer that lets you have non-temporary parent references is Rc and raw pointers.

10 Likes

No, this is false. They have exactly the same semantics wrt. borrowing and mutability. If you have a Weak<RefCell<_>> or a &RefCell<_>, both will allow you to mutate the inner value. After all, all Weak does is hand out a regular reference if it can be upgraded to an Rc.

1 Like

:thinking:Actually, the method to find the answer of 'why' questions like this that works for me is, implementing the functionality with unsafe but only expose safe interfaces. Then try to find out whether the implemetation is sound. Most of time, it isn't.

1 Like

Thanks everyone! This is a lot for me to think about.

I have also found this crate, which seems to do what I want: https://crates.io/crates/refbox

It really isn't complicated. It's just memory. There is no magic: Rc and others are for the most part regular types that you could have implemented yourself (modulo stuff like unsized coercions, but those have nothing to do with the principal modus operandi of these types). So is RefCell. All you have to do to infer all of what we said in this thread for yourself is to consider what happens when you take references to this or that part of memory, keeping in mind the "sharing-xor-mutability" principle. Everything else follows from that.

2 Likes

Just for reference, this implementation using Ref<T> and RefBox<T> from the refbox crate works. RefBox<T> has exclusive ownership and Ref<T> is a weak reference which allows for mutable borrowing.

struct Node {
    value: i32,
    parent: Option<Ref<Node>>,
    children: Vec<RefBox<Node>>,
}

Then one can have current_node: Ref<Node>. This allows for moving up and down the tree and mutating everything.

RefBox still uses reference counting.

From its docs:
However, a RefBox may have many Ref pointers to the same data. These pointers don’t own the data and are reference counted, comparable to the standard library’s Weak.

The nice feature of RefBox is obviously that it allows mutations more easily, but the point I want to make here is that making a tree still needs reference counting like Rc.

1 Like

It also uses unsafe, self-describes as "experimental", and has 334 crates.io downloads total. I'd try your code against Miri at least.

(That said, it's quite impressively documented.)

1 Like

I agree, it looks well documented. I took a look at the API and some of the implementation and couldn't find any obvious problems; it seemed generally high quality, and even though I didn't run miri tests myself, I wouldn't be surprised if the author tested their crate against miri already anyways. It's also code that's not very hard to get right, since it's all non threadsafe (no Send or Sync) and there's not all that much state to track either, so the amount of possible subtle mistakes is limited in the first place.

I think the only thing I noticed so far was that in one destructor, the deallocation could have still been made to happen if the contained type's destructor panicked; panicking destructors aren't a very nice thing anyways, and all that happens is that slightly more leakage happens than what would be necessarily.

2 Likes

I think the main reason for implementing a tree using reference counting is for learning the usage of Rust's Rc, RefCell. While library users do need shared ownership of tree nodes, the API should be well designed and should never expose the node's linkage fields as pub Rcs, which can be seen in leetcode and considered harmful for Rust beginners sticking to correctness over convenience.

1 Like

Yes, (I think) I understand the need for reference counting here, just not the need for multiple ownership, as it doesn't seem logically necessary for a tree.

What seems necessary is exactly one owner and multiple reference counted weak pointers. It seems that functionality was not implemented in the language and this crate does it.

Checking if the crate is correct is unfortunately way beyond my current Rust skills, but I'm happy that people here seem to agree that it looks good :slight_smile:

I didn't know about Miri, thanks! I checked my code using RefBox against it and got no complaints. :slight_smile:

1 Like

There is no need for multiple ownership. You could use a SingleOwnerRc if there were such a thing in the standard library. But there is no such thing in the standard library. Hence, the means by which you achieve reference counting is Rc.

1 Like

That phrase is very hard to understand for me. You can have multiple ownership without reference counting (e.g. via use of unsafe and pointers) but you can not have reference counting without multiple ownership! It's just pointless and useless without it!

Rust doesn't work like that. If you don't have owning reference to something then you can not touch that something. You still have to upgrade your weak pointer into reference at some point if you plan to use it.

Yes, but I wouldn't call this crate as adding any new functionality. It just offers one more internally unsafe data structure with safe wrapper to one's repertoire.

Technically what you have here is something with up-to-two-owners-and-many-watchers which may be nice thing to have in some cases, but it doesn't change ownership story significantly IMO.

I think everyone understands what is happening here, it's just question of phrasing it right. The aforementioned RefBox implements small variation of Rc internally: there may be up to two owners (one normal and one “upgraded weak reference”) and that's how it may offer sound interface.

Is it actually sound? Most likely.

Does it eliminate the need to have multiple ownership? Who knows: if you call one owner “real owner” and the other owner “not-really-an-owner-just-someone-who-may-keep-data-while-original-owner-is-long-gone”… have you now got two owners or not?

Note that RefBox description, of course, lies. It says that when owner is dropped value that RefBox held is destroyed, too, but… of course, it can not work like that. It wouldn't be sound.

Interface of RefBox makes it a bit hard to expose that lie, but here is the full example:

use refbox::RefBox;

fn gimme_not_owner<'a>(weak: &'a mut refbox::Ref<i32>) -> refbox::Borrow<'a, i32> {
    // Create a RefBox.
    let owner100 = RefBox::new(100);

    // Create a weak reference.
    std::mem::swap(weak, &mut owner100.create_ref());

    // Return “non-owner”.
    weak.try_borrow_mut().unwrap()
}

fn main() {
    // Create a RefBox.
    let owner42 = RefBox::new(42);

    // Weak reference, not owner.
    let mut weak = owner42.create_ref();

    // Access the data which “have no owner” anymore.
    let borrow = gimme_not_owner(&mut weak);

    // We have two boxes with different values: 42 and 100…
    // how many owners?

    // Works perfectly because “non-owner” keeps it alive.
    println!("*owner42 value: {}, *borrow value: {}",
             *owner42.create_ref().try_borrow_mut().unwrap(),
             *borrow);
}

Before println here we have two values: one is held by owner42 and the other is held by borrow while the original owner100 is long gone.

So… if “borrower” can hold data indefinitely, pass it around and so on… why that “borrower” is not an owner?

1 Like

I don’t think there’s a need to try to assign too much meaning to the wording in refbox docs. The crate is really nothing but an optimized API-limited alternative to Rc<RefCell<T>>. If you use Rc<RefCell<T>> in the same manner as the functionality the RefBox<T> API offers, it operates the same and it can do everything that RefBox does and more. However there’s some memory overhead and performance overhead (for some operations) that RefBox achieves to avoid.

In particular Rc<RefCell<T>> involves 3 counters, one for strong references, one for weak references, and one for shared borrows of the RefCell. RefBox eliminates all but the weak counter, and a 4-states state enum for indicating, essentially, the mutably-borrowed state of the RefCell and up to one logically upgraded weak reference, and to indicate when the owner has been dropped, which together replaces some of the most important functionality that the strong counter of Rc plays and of the RefCell’s state/counter. Also, it chooses to allow only a u32 for the weak counter; thus saving memory, and also some run-time that’s spent handling and modifying all those counters.

Of course you are correct that (try_)borrow_mut creates something that is also keeping a (usually short-term) ownership of the value, since the idea is that it’s functionally comparable to 1. upgrading the Weak counter, 2. borrowing the RefCell mutably, 3. returning a handle that (essentially) conveniently combines the (equivalent of the) upgraded Rc and the RefMut handle for the RefCell.

Especially since this is a non-thread-safe primitive where interactions with the RefBox or any Ref are never preempted by any scheduler, presumably the crate author simply assumes that borrow_mut callers typically only keep the returned Borrow value around for a very short time (which is important also to avoid running into panics, if the non-try_ variants are used, anyways), and if they are only kept borrowed for short transactions, then the intuition that dropping the RefBox will drop the contained value should be generally true.

2 Likes