Understanding the ownership with a simple example of a tree

Hi,

I've read all sorts of tutorials and documentation many times and I thought I understood how ownership and borrows work, but now I've run into this problem (this is a simplified form of the code to represent the problem):

struct Node {
    value: i32
}
struct Edge<'a> {
    from: &'a Node,
    to: &'a Node
}

struct Tree<'a> {
    nodes: Vec<Node>,
    edges: Vec<Edge<'a>>
}

fn find_node<'a>(tree: &'a Tree, value: i32) -> Result<&'a Node, String> {
    for i in 0..tree.nodes.len() {
        if tree.nodes[i].value == value {
            return Ok(&tree.nodes[i])
        }
    }
    return Err("ups".to_string())
}

fn add_if_missing<'a>(tree: &'a mut Tree, value: i32) -> Result<&'a Node, String> {
    match find_node(&tree, value) {
        Ok(node) => Ok(node),
        Err(_) => {
            let n_ele = Node { value: value };
            tree.nodes.push(n_ele);
            Ok(&tree.nodes[tree.nodes.len()])
        }
    }
}

fn main() {
    let mut tree = Tree { nodes: vec![], edges: vec![] };

    let node = add_if_missing(&mut tree, 4);
}

I don't understand why the compiler states that "returns a value referencing data owned by the current function". After all, line 25 should return a reference to an array element. It is not a value whose lifetime will end with the end of the function.

Just as I don't understand why I can't do the mutable borrow in line 28, after all the immutable borrow from line 24 has already completed.

How should I write this type of operation in Rust?

Where can I read to better understand how it works? As you can see, simple examples of type

let y = &a;
fun(a)

are insufficient for me.

That’s (trying to implement) a self-referential struct. The Edges are supposed to contain references to the Nodes in the same Tree struct. However a lifetime on a struct like Tree<'a> in Rust always means that references with the lifetime 'a inside of Tree can only refer to values not contained within the struct Tree<'a> itself, but somewhere else.

Rust doesn’t support self-referential structs like this directly. There are lots of options to work around the problem; to name one way: for example you could place an index (usize) for the nodes vector inside of your Edges, instead of having it be a reference:

struct Node {
    value: i32
}
struct Edge {
    from: usize,
    to: usize, // <- indices into the containing Tree’s ā€˜nodes’ vector
}

struct Tree {
    nodes: Vec<Node>,
    edges: Vec<Edge>
}

Thank you, can't I tell the compiler that node lives longer than edge? References are more convenient than using indexes.

There are crates that help with self-referential structs. For example with ouroboros, you can do something like

#[self_referencing]
struct Tree {
    nodes: Vec<Node>,
    #[borrows(nodes)]
    edges: Vec<Edge<'this>>
}

however, since nodes is borrowed in this struct, you cannot create any new nodes anymore. (You cannot also borrow nodes immutable.)

By the way, in your original code, there’s this bug that pushing to a Vec will move all the elements to a new allocation if the Vec needs to resize. This necessarily invalidates all of the references to elements of the Vec. This kind of problem is exactly the kind of things that Rust’s borrow checker prevents.

With references in the edges, deleting nodes is always hard, since you cannot really ever prove to the compiler that all Edges referencing the Node are gone. If you never want to delete nodes, you could use something like typed_arena::Arena which supports insertions with a &self immutable reference. (Then this can be combined with ouroboros, I guess.) This will still make modifying nodes hard; you might need to use some interior mutability primitive in order to be able to modify nodes (if that’s something you ever want to do). Something like RefCell or Cell around the i32 in the value field could help.

My main point is, while it’s possible to achieve some of these things while using ā€œactualā€ references, using indices might be simpler.


Btw, if you really want some form of references, yet another approach would be to use Rc or Arc. You could have a nodes: Vec<Rc<Node>>, and place copies of the reference counted pointer into the egdes

struct Edge {
    from: Rc<Node>,
    to: Rc<Node>,
}

This also has the property that modifying value of any Node is impossible after it’s created, unless you use something like

struct Node {
    value: Cell<i32>,
}
2 Likes

Thank you very much for the explanation, especially the reallocation information helps to better understand the compiler behavior. Sometimes I catch myself trying to treat Rust like a Python-like language forgetting that it's not Python :slight_smile:

Actually, I hadn’t really looked at your code beyond the definition of Tree yet. Since your code so far doesn’t even really touch the edges vector yet, the error you get has nothing to do with trying to create a self-referential struct. Let me try to explain what’s going on there:

First of all, one remark: you write things like tree: &'a Tree, which leaves out the lifetime parameter of Tree. This is somewhat deprecated syntax, it’s preferred to write this (equivalently) as tree: &'a Tree<'_>, to explicitly indicate the presence of an elided lifetime.

Now to the error at hand: Your code has two problems, the first one is the usage of match find_node(&tree, value) instead of match find_node(tree, value). A mutable reference can be coerced to an immutable one implicitly, so &mut Tree<'_> (the type of tree) can turn into &Tree<'_> (the type that find_note expects). Furthermore &&mut Tree<'_> (the type of &tree) can also be turned into &Tree<'_> implicitly, but the lifetimes differ:

  • A short-lived borrow &'short &'a mut Tree<'_> can only become a short-lived reference &'short Tree<'_>, while the &'a mut Tree<'_> itself can turn into &'a Tree<'_>

The second problem is that you use the &'a mut Tree<'_> in its full lifetime in one case (where the &'a Node from find_node gets returned) and in the other case, the find_node call only borrows the tree shortly, and it’s used for modifying the tree afterwards. This is not actually a stupid thing to try to do, however the current borrow checker of Rust doesn’t like this pattern at all. (There’s a new WIP borrow checker called polonius that will solve this problem some time in the future: See how with only the &tree=>tree change, your code compiles with the unstable -Zpolonius flag. If you want to avoid the panic, you’ll also need to change &tree.nodes[tree.nodes.len()] into &tree.nodes[tree.nodes.len()-1] :wink:.)

Your code has two problems, the first one is the usage of match find_node(&tree, value) instead of match find_node(tree, value) .

I don't know why I thought it didn't matter if the find_node() function signature takes a specific parameter. No less, the "returns a value referencing data owned by the current function" message completely fails to explain the problem to me as you did.

I guess the problem for me is that to me this is simply returning a reference to a Node, but the compiler cannot allow such a reference to exist and still be able to modify the Tree, because you could end up with a situation where the Node is removed from the tree and there is a dangling reference. To me these are two separate entities (Node and Tree), and to the compiler Node is part of Tree.
What I don't realize is that when I perform a borrow of a reference to a Node it is simultaneously performing a Tree borrow. Is that so?

Yup, sounds about right. To make them separate things, you can use the same kind of approaches like the ones I’ve listed to avoid the self-referential struct in the first place: Use indices for nodes and just return an index instead of the &'a Node reference; or use Rc<Node> and e.g. have find_node/add_if_missing return a clone of the Rc<Node>. Or the same thing with Arc instead of Rc, if you want support for multi-threaded access.

1 Like

Every time I think I've understood at least a little something, it turns out I haven't. The two codes differ only in the term lifetime, which IMHO in this case has no meaning at all and should not affect anything. However, this difference results in an error.

struct Node<'a> {
    to: &'a u32
}

struct Tree<'a> {
    nodes: Vec<Node<'a>>
}

impl<'a> Tree<'a> {
    fn a(&self) { }
    fn b(&'a mut self) { }
}

fn main() {
    let mut tree = Tree { nodes: vec![] };

    tree.b();
    tree.a();
}

produces:

error[E0502]: cannot borrow `tree` as immutable because it is also borrowed as mutable
  --> src/main.rs:18:5
   |
17 |     tree.b();
   |     -------- mutable borrow occurs here
18 |     tree.a();
   |     ^^^^^^^^
   |     |
   |     immutable borrow occurs here
   |     mutable borrow later used here

For more information about this error, try `rustc --explain E0502`.

and:

struct Node<'a> {
    to: &'a u32
}

struct Tree<'a> {
    nodes: Vec<Node<'a>>
}

impl<'a> Tree<'a> {
    fn a(&self) { }
    fn b(&mut self) { }
}

fn main() {
    let mut tree = Tree { nodes: vec![] };

    tree.b();
    tree.a();
}

Compiles fine.
So I don't know how much of the previous mistakes were due to a misunderstanding of the ownership and how much was due to lifetime. In both cases, the last use of mutable borrow comes before immutable borrow, so it should compile, but it doesn't.

fn b(&'a mut self) is short for fn b(self: &'a mut Self), and on an impl Tree<'a> the type Self stands for Tree<'a>.

Hence fn b(&'a mut self) stands for self: &'a mut Tree<'a>. This means that the Tree<'a> gets borrowed mutably for the entire lifetime 'a. The value of type Tree<'a> however can only exist for at most as long as the lifetime 'a (since it can contain references &'a u32 with that lifetime). In effect a method like b thus requires you to borrow the Tree value for its entire existence, something that you can only do once.

All in all, any usage of a type like &'a mut Tree<'a>, where the target type of a mutable reference contain a mention of the same lifetime as the lifetime of that mutable reference itself, is a huge anti-pattern. Probably never what you want.


OTOH fn b(&mut self) contains an elided lifetime, pretty much it’s the same as fn b<'elided>(&'elided mut self). With the whole self thing made explicit, the signature thus is fn b<'elided>(self: &'elided mut Tree<'a>). Note how the 'elided lifetime is not the same lifetime as the 'a inside of that Tree<'a>. Thus there’s no stupid constraint that the Tree<'a> would need to be borrowed for its entire remaining existence in order to call the method like this.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.