Problem with understanding lifetimes and references

Hello,

I am learning Rust and wanted to implement a very simple graph like structure. My idea is to have a Graph-struct which contains a list of all existing nodes and a Node-struct with metadata and an adjacency list of references to the connected nodes. The Graph-struct should own the Nodes in this scenario.

Here is a small example of what I am trying to do:

use std::fmt::Debug;

pub struct Node<'a> {
  name: String,
  weight: i64,
  siblings: Vec<&'a Node<'a>>
}

impl<'a> Node<'a> {
  pub fn new (name: String, weight: i64) -> Node<'a> {
    Node {
      name: name,
      weight: weight,
      siblings: Vec::new()
    }
  }

  pub fn add_sibling (&mut self, Node: &'a Node) {
    match self.siblings.iter().find(|x| x.name == Node.name) {
      Some(_) => (),
      None => self.siblings.push(Node)
    };
  }
}

impl<'a> Debug for Node<'a> {
  fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
    write!(f, "({}): {}", self.name, self.siblings
      .iter()
      .map(|w| format!("\t({})", w.name))
      .collect::<Vec<String>>()
      .join(" -> "))
  }
}

pub struct Graph<'a> {
  nodes: Vec<Box<Node<'a>>>
}

impl<'a> Graph<'a> {
  pub fn new () -> Graph<'a> {
    Graph {
      nodes: Vec::<Box<Node>>::new()
    }
  }

  pub fn get_mut (&mut self, name: String) -> Option<&mut Node<'a>> {
    match self.nodes.iter().position(|o| o.name == name) {
      Some(i) => Some(&mut self.nodes[i]),
      None => None
    }
  }

  pub fn get_or_create (&mut self, name: String) -> &mut Node<'a> {
    let res = self.get_mut(name.clone());
    match res {
      Some(mut o) => return &mut o,
      None => {
      }
    };

    let mut o = Node::new(name.clone(), 0);
    self.nodes.push(Box::new(o));
    &mut o
  }

  pub fn parse (&mut self, string: String) {
    let lines: Vec<Vec<&str>> = string
      .trim()
      .split("\n")
      .map(|line| line.trim().split("->").collect::<Vec<&str>>())
      .collect();

    for os in lines {
      let mut main = self.get_or_create(os[0].to_string());
      let sub = self.get_or_create(os[1].to_string());
      main.add_sibling(sub);
    }
  }

  pub fn nodes_ref (& self) -> &Vec<Box<Node>> {
    & self.nodes
  }
}

const NODES: &str = "A->B
B->C
C->D
B->F
F->G
D->H";

fn main () {
  let mut graph = Graph::new();
  graph.parse(NODES.to_string());
  println!("{:?}", graph.nodes_ref());
}

As you can see I already tried fiddling around with lifetimes and put the Nodes in the nodes list into a Box. The problem where I got stuck is now the get_or_create function, which I want to use to ensure that a node is created. I guess there are more problems...

I am aware that there are maybe better solutions to the graph implementation, like using a HashMap, but I found this very interessting because I have no idea how to implement this in rust, what would be very easy in for example c++. I hope to understand the concepts of Rust better through getting this example to work.

Thanks for your help and explanations!

A reference in Rust is not the same as a C++ pointer. While a reference to an object exists, that object is held under a compile-time lock. A &mut reference is a unique lock, while a & reference is a shared lock. Fundamentally this means you cannot create self-referential types without reference counting.

Note that this type signature

pub struct Graph<'a> {
    nodes: Vec<Box<Node<'a>>>
}

does not do what you want. The 'a is a generic parameter. The user of the graph chooses the value of parameters, so I can just choose to do this:

let graph = Graph::<'static>::new();

This means that the nodes now have type Node<'static>, so siblings now contains values of type Vec<&'static Node<'static>>. Note that 'static means that a reference must be valid for the entire duration of the program. I.e. it's undefined behaviour for a 'static reference to ever become invalid.

This is why you can't create new siblings: I might have chosen the lifetime 'static, and your nodes don't live forever (they are destroyed when the graph is dropped). This is why the compile error contains this snippet:

   = note: expected  `&mut Graph<'_>`
              found  `&mut Graph<'a>`

The node you created is not (necessarily) alive for the entire lifetime 'a, but for some shorter lifetime it elided into '_. In this case the '_ comes from the &mut Node<'a> returned by get_or_create, whereas Node requires siblings to have the type &'a Node<'a>, which also has 'a on the reference itself.

Be aware that if you could return a &'a mut Node<'a>, then if 'a = 'static, you have a &'static mut Node<'static>, and since mutable references are unique locks, you can now never access this node ever again through anything but this reference.

You may find this article interesting.

Thank you for your reply.

How can I get around these problems? I have to declare a generic lifetime to Node and this bubbles to Graph. I want the nodes only life as long as Graph lives, the ownage should be transfered to Graph as soon as a Node is added.

Further I think I have to give mutable access to the nodes themselves so that internal information can be changed from outside of Graph.

When I understand correctly, I could use Rc instead of Box in the Graph to wrap Node. But then still a 'static lifetime could be created what would lead to increasing reference counters?

Do I really have to supply all these generic lifetimes or is there a way around?

Don't use ordinary references at all. I recommend using Rc instead. Note that since you now have shared access to the nodes, you need to prove to the compiler that you are not causing data races, which you do using either one of the cells (e.g. Cell or RefCell) which makes the compiler ensure your graph is only used from a single thread at once, or by using a Mutex if you need several threads to modify at the same time.

Additionally keep in mind that reference cycles cause leaks, so consider breaking them with Weak.

Alternatively a common approach is to use indexes into the vector in the graph instead of pointers.

1 Like

Thanks for pointing out RefCell. I think I got it now. Would you be so kind and take a look over the working code?

Furthermore I am wondering when I should use this kind of nested pointers? It looks a bit like bad design to me?

use std::fmt::Debug;
use std::rc::Rc;
use std::cell::RefCell;

pub struct Node {
  name: String,
  siblings: Vec<Rc<RefCell<Node>>>
}

impl Node {
  pub fn new (name: String) -> Node {
    Node {
      name: name,
      siblings: Vec::new()
    }
  }

  pub fn add_sibling (&mut self, node: Rc<RefCell<Node>>) {
    match self.siblings.iter().find(|x| x.borrow().name == node.borrow().name) {
      Some(_) => (),
      None => self.siblings.push(node)
    };
  }
}

impl Debug for Node {
  fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
    write!(f, "({}): {}", self.name, self.siblings
      .iter()
      .map(|w| format!("({})", w.borrow().name))
      .collect::<Vec<String>>()
      .join(" -> "))
  }
}

pub struct Graph {
  nodes: Vec<Rc<RefCell<Node>>>
}

impl Debug for Graph {
  fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
    write!(f, "{}", self.nodes
      .iter()
      .map(|n| format!("{:?}", n.borrow()))
      .collect::<Vec<String>>()
      .join("\n"))
  }
}

impl Graph {
  pub fn new () -> Graph {
    Graph {
      nodes: Vec::<Rc<RefCell<Node>>>::new()
    }
  }

  pub fn get_mut (&mut self, name: String) -> Option<Rc<RefCell<Node>>> {
    match self.nodes.iter().position(|o| o.borrow().name == name) {
      Some(i) => Some(self.nodes[i].clone()),
      None => None
    }
  }

  pub fn get_or_create (&mut self, name: String) -> Rc<RefCell<Node>> {
    let res = self.get_mut(name.clone());
    match res {
      Some(o) => return o,
      None => {
      }
    };

    let o = Rc::new(RefCell::new(Node::new(name.clone())));
    self.nodes.push(o.clone());
    o
  }

  pub fn parse (&mut self, string: String) {
    let lines: Vec<Vec<&str>> = string
      .trim()
      .split("\n")
      .map(|line| line.trim().split("->").collect::<Vec<&str>>())
      .collect();

    for os in lines {
      let main = self.get_or_create(os[0].to_string());
      let sub = self.get_or_create(os[1].to_string());
      main.borrow_mut().add_sibling(sub.clone());
    }
  }

  pub fn nodes_ref (& self) -> &Vec<Rc<RefCell<Node>>> {
    & self.nodes
  }
}

const NODES: &str = "A->B
B->C
C->D
B->F
F->G
D->H";

fn main () {
  let mut graph = Graph::new();
  graph.parse(NODES.to_string());
  println!("{:?}", graph);
  match graph.get_mut("C".to_string()) {
    Some(n) => n.borrow_mut().name = "changed".to_string(),
    None => ()
  };
  println!("---\n{:?}", graph);
}

It looks OK, but note that you now have a memory leak, which you can fix by either:

  1. Use Weak to break cycles of reference counted pointers.
  2. Clear the siblings vector of every node in the destructor of Graph.

When writing struct Graph<'a>, you actually don't know what 'a is (it's not necessarily the lifetime of structure itself!). Usually, lifetime as struct's generic parameter means that the struct is borrowed from something else, and it's lifetime is inferred when borrowed. Take Vec::iter() as example:

impl <T> Vec<T> {
    pub fn iter<'a>(&'a self) -> Iter<'a, T> {
        // implementation
    }
}

fn main() {   
   let v = vec![0, 0, 0, 0];
   let iter = v.iter();
   drop(v); // <- compile error because `iter` still borrowed!
   for i in iter {
       println!("{}", i)
   }
}

The lifetime parameter of Iter is the same as the vector, which makes sense, since we do borrow an iterator from that vector. However, your Graph structure is created from thin air. I found it mostly wrong to do so, though it sometime do compile.

Practically, I will write the structure as:

use std::rc::Rc;
pub struct Node {
  name: String,
  weight: i64,
  siblings: Vec<Rc<Node>>
}
pub struct Graph {
  nodes: Vec<Rc<Node>>
}
impl Drop for Graph {
   fn drop(&mut self) {
      // clear the siblings for each node to prevent memory leak
   }
}
// or if performance is concerned, go to raw pointers
pub struct Node {
  name: String,
  weight: i64,
  // Make sure you won't access siblings after the Graph is dropped!
  siblings: Vec<*const Node>
}
pub struct Graph {
  nodes: Vec<Box<Node>>
}

I would not go for raw pointers here. It would make creating a mutable reference to a node very dangerous, as the compiler assumes mutable references have unique access to the node, but the compiler won't prevent you from accessing the node through one of your raw pointers while the mutable reference exists, despite that being undefined behavior. I'm not quite sure, but the mere existence of these extra ways to access the node may be enough to cause UB.

Essentially you would have to do all access to nodes through raw pointers to avoid the aliasing requirement that references have. For example storing a box in the vector is not a good approach, instead you should store the pointer returned by Box::into_raw in the vector, and you would have to manually drop them.

It can be done, but you have to be aware of the extra requirements that Rust puts on references compared to C++

Hm, when I use Weak in Node, I cannot access the name in add_sibling without upgrading Weak to Rc and complicating the check for existing nodes.

So I guess implementing Drop for Graph would be the right way or am I missing something?

You aren't missing something, no. The destructor makes the rest of the code simpler.

I mean, even if you keep a node around, it's siblings vector will still be cleared.

Do I have to use drain(..) in the deconstructor or is there another way? When I just call clear to the Vecs drop doesn't get called for the Nodes.

Can you post the destructor?

Yes, sorry.


impl Drop for Node {
  fn drop(&mut self) {
    for sibling in self.siblings.drain(..) {
      drop(sibling);
    }
  }
}

impl Drop for Graph {
  fn drop(&mut self) {
    for node in self.nodes.drain(..) {
      drop(node);
    }
  }
}

The node variable in your Graph destructor is not of type Node. It's of type Rc<RefCell<Node>>, so your drop only decrements the reference count by one, thus it does not enter the destructor of the node itself if some other nodes keep it alive. You want something like this:

impl Drop for Graph {
  fn drop(&mut self) {
    for node in self.nodes.drain(..) {
      node.borrow_mut().siblings.clear();
    }
  }
}

Thank you very much for your time and help. I render my misunderstanding as solved.