I'm trying to learn Rust and ran across the following issue when doing a practice problem.
fn add_edge() fails to compile with the following error:
error: lifetime may not live long enough
  --> src\bin\test.rs:26:9
   |
16 | impl<'b> Graph<'b> {
   |      -- lifetime `'b` defined here
...
24 |     fn add_edge<'c>(&'c mut self, from: usize, to: usize, metadata: EdgeMetadata) {
   |                 -- lifetime `'c` defined here
25 |         self.edges_metadata.push(metadata);
26 |         self.node_map[from].push((to, self.edges_metadata.last().unwrap()));
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'c` must outlive `'b`
   |
   = help: consider adding the following bound: `'c: 'b`
How is add_edge related to the lifetime 'b if the only input to add_edge is an object with lifetime 'c?
Further, what is a performant and idiomatic fix to this problem? Unfortunately, adding the bound 'c: 'b as suggested leads to another error: now (afaik) each iteration of the for loop creates a mutable reference with lifetime 'b, which results in multiple simultaneous mutable references.
use std::io;
type NodeIndex = usize;
struct EdgeMetadata {
    cost: u32,
    length: u32,
}
struct Graph<'a> {
    // vector where element i contains a list of edges i -> j. Each edge is a tuple (j, metadata).
    node_map: Vec<Vec<(NodeIndex, &'a EdgeMetadata)>>,
    edges_metadata: Vec<EdgeMetadata>,
}
impl<'b> Graph<'b> {
    fn new(num_nodes: usize) -> Graph<'b> {
        Graph {
            node_map: vec![Vec::new(); num_nodes],
            edges_metadata: Vec::new(),
        }
    }
    fn add_edge<'c>(&'c mut self, from: usize, to: usize, metadata: EdgeMetadata) {
        self.edges_metadata.push(metadata);
        self.node_map[from].push((to, self.edges_metadata.last().unwrap()));
        self.node_map[to].push((from, self.edges_metadata.last().unwrap()));
    }
}
/**
Reads a graph from stdin and parses it into the Graph data structure.
The stdin format is specified here: https://dmoj.ca/problem/ccc23s4
*/
pub fn main() {
    let stdin = io::stdin();
    let mut buffer = String::new();
    stdin.read_line(&mut buffer).expect("Failed to read buffer");
    let mut buffer_iter = buffer.split_whitespace();
    let num_nodes: usize = buffer_iter.next().unwrap().parse().unwrap();
    let num_edges: usize = buffer_iter.next().unwrap().parse().unwrap();
    let mut graph = Graph::new(num_nodes);
    for _ in 0..num_edges {
        let mut buffer = String::new();
        stdin.read_line(&mut buffer).unwrap();
        let mut buffer_iter = buffer.split_whitespace();
        graph.add_edge(
            buffer_iter.next().unwrap().parse().unwrap(),
            buffer_iter.next().unwrap().parse().unwrap(),
            EdgeMetadata {
                length: buffer_iter.next().unwrap().parse().unwrap(),
                cost: buffer_iter.next().unwrap().parse().unwrap(),
            },
        );
    }
}
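For comparison, here is a minimal sketch of the index-based layout the thread recommends further down: node_map stores an index into edges_metadata instead of a reference, so Graph owns all of its data, needs no lifetime parameter, and add_edge compiles with a plain &mut self. The EdgeIndex alias and the sample edges in main are illustrative assumptions, not part of the original code.

```rust
type NodeIndex = usize;
type EdgeIndex = usize; // hypothetical alias: position of an edge in edges_metadata

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph {
    // element i lists the edges i -> j as (j, index into edges_metadata)
    node_map: Vec<Vec<(NodeIndex, EdgeIndex)>>,
    edges_metadata: Vec<EdgeMetadata>,
}

impl Graph {
    fn new(num_nodes: usize) -> Graph {
        Graph {
            node_map: vec![Vec::new(); num_nodes],
            edges_metadata: Vec::new(),
        }
    }

    fn add_edge(&mut self, from: NodeIndex, to: NodeIndex, metadata: EdgeMetadata) {
        // The new edge will occupy the slot at the current length.
        let edge = self.edges_metadata.len();
        self.edges_metadata.push(metadata);
        self.node_map[from].push((to, edge));
        self.node_map[to].push((from, edge));
    }
}

fn main() {
    let mut graph = Graph::new(3);
    graph.add_edge(0, 1, EdgeMetadata { cost: 5, length: 2 });
    graph.add_edge(1, 2, EdgeMetadata { cost: 1, length: 7 });
    // Second edge seen from node 1 is 1 -> 2.
    let (neighbor, edge) = graph.node_map[1][1];
    let meta = &graph.edges_metadata[edge];
    println!("1 -> {} costs {} (length {})", neighbor, meta.cost, meta.length);
}
```

Because the Vec may reallocate freely while indices stay valid, pushing to edges_metadata inside add_edge is no longer a borrow conflict.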
Thank you for the advice! I have updated the code to remove the self-references in Graph; however, I still get the same compile error. I would like to avoid simply storing the indices since (arguably) it seems "better" to store a direct reference rather than a number that could become outdated.
use std::io;
type NodeIndex = usize;
struct EdgeMetadata {
    cost: u32,
    length: u32,
}
struct Graph<'a> {
    // vector where element i contains a list of edges i -> j. Each edge is a tuple (j, metadata).
    node_map: Vec<Vec<(NodeIndex, &'a EdgeMetadata)>>
}
impl<'b> Graph<'b> {
    fn new(num_nodes: usize) -> Graph<'b> {
        Graph {
            node_map: vec![Vec::new(); num_nodes]
        }
    }
    fn add_edge<'c>(&'c mut self, from: usize, to: usize, metadata: &'c EdgeMetadata) {
        self.node_map[from].push((to, metadata));
        self.node_map[to].push((from, metadata));
    }
}
/**
Reads a graph from stdin and parses it into the Graph data structure.
The stdin format is specified here: https://dmoj.ca/problem/ccc23s4
*/
pub fn main() {
    let stdin = io::stdin();
    let mut buffer = String::new();
    stdin.read_line(&mut buffer).expect("Failed to read buffer");
    let mut buffer_iter = buffer.split_whitespace();
    let num_nodes: usize = buffer_iter.next().unwrap().parse().unwrap();
    let num_edges: usize = buffer_iter.next().unwrap().parse().unwrap();
    let mut graph = Graph::new(num_nodes);
    let mut edges_metadata: Vec<EdgeMetadata> = Vec::new();
    for _ in 0..num_edges {
        let mut buffer = String::new();
        stdin.read_line(&mut buffer).unwrap();
        let mut buffer_iter = buffer.split_whitespace();
        edges_metadata.push(EdgeMetadata {
            length: buffer_iter.next().unwrap().parse().unwrap(),
            cost: buffer_iter.next().unwrap().parse().unwrap(),
        });
        graph.add_edge(
            buffer_iter.next().unwrap().parse().unwrap(),
            buffer_iter.next().unwrap().parse().unwrap(),
            edges_metadata.last().unwrap()
        );
    }
}
Direct references can also become "outdated" (==invalid), which is exactly why you can't create self-referential types. If you moved a self-referential type, then the self-reference would still point to the previous (therefore now logically uninitialized) place, so it would become dangling. This problem is intrinsic and can't be solved in the way you think you can.
Indices at least result in a failed bounds check and a reliable panic instead of UB when you screw up.
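A small sketch of that difference, assuming a plain slice of edge costs: a stale index is detected by the bounds check, either as None from get or as a guaranteed panic from ordinary indexing. The cost_of helper and the sample data are hypothetical.

```rust
// Hypothetical helper: look up an edge cost by index without panicking.
fn cost_of(costs: &[u32], edge: usize) -> Option<u32> {
    // `get` performs the bounds check and reports failure as `None`.
    costs.get(edge).copied()
}

fn main() {
    let costs = vec![10u32, 20, 30];
    let stale_index = 5; // an index that no longer refers to a valid edge

    assert_eq!(cost_of(&costs, 1), Some(20));
    assert_eq!(cost_of(&costs, stale_index), None);

    // Plain indexing also checks bounds, but panics instead of returning None.
    let result = std::panic::catch_unwind(|| costs[stale_index]);
    assert!(result.is_err());
    println!("out-of-bounds access panicked as expected");
}
```

Either way the failure is deterministic; a dangling reference would instead read whatever happens to be at the old address.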
Your new code still has two problems:
The lifetime annotations on add_edge are completely bogus. You shouldn't randomly add lifetime annotations, that will not solve your problem. You should think logically about what each lifetime parameter means and how you want them to relate to each other.
Your data structure Graph<'a> contains references with lifetimes &'a, i.e., &'a EdgeMetadata. Thus, in the impl<'b> Graph<'b> (I don't get why you renamed the lifetime, btw), you should expect the lifetime of the argument to be pushed to be the same, i.e., 'b. It therefore doesn't make any sense to require the reference to the EdgeMetadata being added to live exactly as long as self (which is what the current annotations suggest).
If you fix that, a compiler error still remains, and it is exactly the same kind of reference-invalidation problem that Rust protects you from. If you store a reference to the last() element of a vector, then that vector can't be modified (e.g., pushed to) as long as the stored reference is around, because modifying it could cause the vector to reallocate, which would again leave the previously created references dangling.
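Both points can be sketched together, assuming the metadata can be read completely before the graph is built: add_edge takes &mut self with no extra lifetime and a &'a EdgeMetadata that matches the struct's own parameter, and every push to the metadata vector happens before the first reference into it is taken. The sample data in main is illustrative only.

```rust
type NodeIndex = usize;

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph<'a> {
    node_map: Vec<Vec<(NodeIndex, &'a EdgeMetadata)>>,
}

impl<'a> Graph<'a> {
    fn new(num_nodes: usize) -> Graph<'a> {
        Graph { node_map: vec![Vec::new(); num_nodes] }
    }

    // No lifetime on &mut self: the only requirement is that the metadata
    // is borrowed for 'a, the same lifetime the stored references have.
    fn add_edge(&mut self, from: NodeIndex, to: NodeIndex, metadata: &'a EdgeMetadata) {
        self.node_map[from].push((to, metadata));
        self.node_map[to].push((from, metadata));
    }
}

fn main() {
    // Phase 1: all pushes happen here, so no later reallocation can
    // invalidate references taken in phase 2.
    let metadata = vec![
        EdgeMetadata { cost: 3, length: 1 },
        EdgeMetadata { cost: 4, length: 2 },
    ];

    // Phase 2: only shared borrows of `metadata` from now on.
    let mut graph = Graph::new(3);
    graph.add_edge(0, 1, &metadata[0]);
    graph.add_edge(1, 2, &metadata[1]);

    // A metadata.push(...) at this point would be rejected by the borrow
    // checker, because `graph` still holds references into the vector.
    let total: u32 = graph.node_map[1].iter().map(|(_, m)| m.cost).sum();
    let longest = graph.node_map[1].iter().map(|(_, m)| m.length).max();
    println!("around node 1: total cost {}, longest edge {:?}", total, longest);
}
```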
With your feedback I have updated the code and it now works. I'm curious to know what you think of the new approach. It is more verbose but only uses indices for initializing the data structure. In my mind this seems advantageous since a) there is no risk of unexpected panics from bounds checks in later steps, and b) later code will be more readable since I can write node_map[i][j].1.cost instead of edges[node_map[i][j].1].cost.
Implementing the Clone trait is a great solution for any practical application, especially given how small EdgeMetadata is. However, in the spirit of the coding contest question, I am trying as much as possible to find solutions that require the least possible memory (and I'm also curious about what would be done in a hypothetical case with a very large EdgeMetadata struct).
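On the readability point, an index-based graph can still hand out references through a small accessor, so later code does not have to spell out edges[node_map[i][j].1]. The sketch below assumes a hypothetical neighbors method; it also checks that an index occupies exactly as much memory as a reference, so storing indices costs nothing extra.

```rust
use std::mem::size_of;

type NodeIndex = usize;

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph {
    node_map: Vec<Vec<(NodeIndex, usize)>>, // (neighbor, index into `edges`)
    edges: Vec<EdgeMetadata>,
}

impl Graph {
    // Hypothetical helper: yields (neighbor, &metadata) for every edge at `node`.
    fn neighbors(&self, node: NodeIndex) -> impl Iterator<Item = (NodeIndex, &EdgeMetadata)> + '_ {
        self.node_map[node]
            .iter()
            .map(move |&(next, edge)| (next, &self.edges[edge]))
    }
}

fn main() {
    let graph = Graph {
        node_map: vec![vec![(1, 0)], vec![(0, 0)]],
        edges: vec![EdgeMetadata { cost: 9, length: 4 }],
    };
    for (next, meta) in graph.neighbors(0) {
        println!("0 -> {}: cost {}, length {}", next, meta.cost, meta.length);
    }
    // An index is pointer-sized, just like a reference.
    assert_eq!(size_of::<usize>(), size_of::<&EdgeMetadata>());
}
```

The bounds check inside neighbors only fires if an index is wrong, which is a bug worth a panic anyway.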
use std::io;
type NodeIndex = usize;
struct EdgeMetadata {
    cost: u32,
    length: u32,
}
struct Graph<'a> {
    // vector where element i contains a list of edges i -> j. Each edge is a tuple (j, metadata).
    node_map: Vec<Vec<(NodeIndex, &'a EdgeMetadata)>>,
}
impl<'a> Graph<'a> {
    fn new(num_nodes: usize) -> Graph<'a> {
        Graph {
            node_map: vec![Vec::new(); num_nodes],
        }
    }
    fn add_edge(&mut self, from: usize, to: usize, metadata: &'a EdgeMetadata) {
        self.node_map[from].push((to, metadata));
        self.node_map[to].push((from, metadata));
    }
}
/**
Reads a graph from stdin and parses it into the Graph data structure.
The stdin format is specified here: https://dmoj.ca/problem/ccc23s4
*/
pub fn main() {
    let stdin = io::stdin();
    let mut buffer = String::new();
    stdin.read_line(&mut buffer).expect("Failed to read buffer");
    let mut buffer_iter = buffer.split_whitespace();
    let num_nodes: usize = buffer_iter.next().unwrap().parse().unwrap();
    let num_edges: usize = buffer_iter.next().unwrap().parse().unwrap();
    let mut edges_metadata: Vec<EdgeMetadata> = Vec::with_capacity(num_edges);
    let mut edges: Vec<(usize, usize)> = Vec::with_capacity(num_edges);
    for _ in 0..num_edges {
        let mut buffer = String::new();
        stdin.read_line(&mut buffer).unwrap();
        let mut buffer_iter = buffer.split_whitespace();
        edges.push((
            buffer_iter.next().unwrap().parse().unwrap(),
            buffer_iter.next().unwrap().parse().unwrap(),
        ));
        edges_metadata.push(EdgeMetadata {
            length: buffer_iter.next().unwrap().parse().unwrap(),
            cost: buffer_iter.next().unwrap().parse().unwrap(),
        });
    }
    let mut graph = Graph::new(num_nodes);
    for (i, (from, to)) in edges.iter().enumerate() {
        graph.add_edge(*from, *to, &edges_metadata[i]);
    }
    drop(edges);
}
The contest specifies that no input will contain more than 2000 edges. Considering an edge-only representation, that's roughly 32 KiB. You are going to have more memory overhead from standard I/O's internally allocated buffers, so in this case it's not really worth worrying about.
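The estimate can be checked with a quick calculation, assuming each edge is stored as four u32 fields (from, to, length, cost). That layout is an assumption chosen for illustration; the Edge struct and edge_table_bytes helper are hypothetical.

```rust
use std::mem::size_of;

// Assumed edge-only layout: four 4-byte fields, no padding needed.
struct Edge {
    from: u32,
    to: u32,
    length: u32,
    cost: u32,
}

const MAX_EDGES: usize = 2000; // upper bound from the contest statement

// Hypothetical helper: bytes needed to store `num_edges` edges contiguously.
fn edge_table_bytes(num_edges: usize) -> usize {
    num_edges * size_of::<Edge>()
}

fn main() {
    let bytes = edge_table_bytes(MAX_EDGES);
    // 2000 edges x 16 bytes = 32000 bytes, i.e. about 31.25 KiB.
    println!("{} edges x {} bytes = {} bytes", MAX_EDGES, size_of::<Edge>(), bytes);
}
```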
For read-only data, everything can be made cheaply cloneable by using reference counting (Rc or Arc, depending on whether you need thread-safety).
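A minimal sketch of that idea for the graph above, assuming single-threaded use: each EdgeMetadata is allocated once inside an Rc, and both endpoints hold a clone of the Rc, which copies a pointer and bumps a reference count rather than duplicating the (possibly large) struct. Arc would be the drop-in replacement if the graph had to be shared across threads.

```rust
use std::rc::Rc;

type NodeIndex = usize;

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph {
    node_map: Vec<Vec<(NodeIndex, Rc<EdgeMetadata>)>>,
}

impl Graph {
    fn new(num_nodes: usize) -> Graph {
        Graph { node_map: vec![Vec::new(); num_nodes] }
    }

    fn add_edge(&mut self, from: NodeIndex, to: NodeIndex, metadata: EdgeMetadata) {
        // One heap allocation per edge, shared by both adjacency lists.
        let shared = Rc::new(metadata);
        self.node_map[from].push((to, Rc::clone(&shared)));
        self.node_map[to].push((from, shared));
    }
}

fn main() {
    let mut graph = Graph::new(2);
    graph.add_edge(0, 1, EdgeMetadata { cost: 2, length: 8 });
    let (_, meta) = &graph.node_map[0][0];
    // Two owners (one per endpoint), one allocation.
    println!("cost {}, length {}, owners {}", meta.cost, meta.length, Rc::strong_count(meta));
}
```

Since the graph owns its Rc handles instead of borrowing, it carries no lifetime parameter and can be moved freely.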
You are still not using indices the way I intended. I meant: use indices for representing edges in the data structure. What you are doing is still using references, but iterating over the vector using indices. Don't do that; use zip instead.
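Applied to the last version of the code, the zip suggestion might look like the following sketch. The build_graph helper is hypothetical; Graph and EdgeMetadata are repeated from the code above so the block compiles on its own.

```rust
type NodeIndex = usize;

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph<'a> {
    node_map: Vec<Vec<(NodeIndex, &'a EdgeMetadata)>>,
}

impl<'a> Graph<'a> {
    fn new(num_nodes: usize) -> Graph<'a> {
        Graph { node_map: vec![Vec::new(); num_nodes] }
    }

    fn add_edge(&mut self, from: NodeIndex, to: NodeIndex, metadata: &'a EdgeMetadata) {
        self.node_map[from].push((to, metadata));
        self.node_map[to].push((from, metadata));
    }
}

// Hypothetical helper: the graph only borrows `edges_metadata`, not `edges`.
fn build_graph<'a>(
    num_nodes: usize,
    edges: &[(usize, usize)],
    edges_metadata: &'a [EdgeMetadata],
) -> Graph<'a> {
    let mut graph = Graph::new(num_nodes);
    // `zip` yields matching pairs and stops at the end of the shorter slice,
    // so there is no separate index to get wrong.
    for (&(from, to), metadata) in edges.iter().zip(edges_metadata) {
        graph.add_edge(from, to, metadata);
    }
    graph
}

fn main() {
    let edges: Vec<(usize, usize)> = vec![(0, 1), (1, 2)];
    let edges_metadata = vec![
        EdgeMetadata { cost: 1, length: 5 },
        EdgeMetadata { cost: 2, length: 6 },
    ];
    let graph = build_graph(3, &edges, &edges_metadata);
    drop(edges); // fine: the graph only borrows `edges_metadata`
    let (next, meta) = graph.node_map[1][1];
    println!("1 -> {}: cost {}, length {}", next, meta.cost, meta.length);
}
```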
This is good advice, and thank you for the tip about zip. I also saw you did some cool stuff with the thiserror crate. I don't think programming contests allow external crates, but I would use it otherwise.
I guess ultimately what surprises me is it doesn't seem possible to create a structure (e.g. Graph) where one object (e.g. Node Map) has a reference to another one of the structure's object (Edges). See graph below.
       Graph
      /     |
     /      |
    ↓       |
Node Map    |
    \       |
     \      |
      ↓     ↓
       Edges
This is surprising since as shown in the diagram there aren't any reference loops, yet this is still considered self-referencing? I understand that one can't have multiple mutable references, however if edges is read-only what's the fundamental limitation on creating such a struct?
Thinking about it, such a struct contains a permanent borrow of Edges (since Node Map contains references to Edges). Afaik, this means it's impossible to move Edges or Graph. Is there no way to make the compiler realize that Graph is the owner of both Edges and the borrowed reference to Edges, and as such it should be allowed to move Graph?
There absolutely is a loop; your diagram is inaccurate. Since a field is contained in a struct, if you move the struct, you move the field too (and if you have a reference to the field, you have a reference pointing inside its containing struct). Thus, if you move the struct, you can't have stuff pointing to any of its fields. You can't just pretend structs and fields are independent of each other.
There's nothing for the compiler to realize here, because the above reasoning is wrong. Again, if you move the graph, then any references that point inside it are going to be invalidated. It doesn't matter who owns those references, that's a red herring. A move moves the value to a new location, which means that its memory address (and consequently, addresses of all of its fields) will be different. Thus, pointers that would still point to the old place would be invalid.
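The address change is easy to observe. This sketch, using a simplified hypothetical Graph, moves a value from the stack into a Vec and compares the raw address of its edges field before and after; only the numeric addresses are compared, nothing is dereferenced.

```rust
// Simplified stand-in for the Graph discussed above.
struct Graph {
    node_map: Vec<Vec<usize>>,
    edges: Vec<u32>,
}

// Address of the `edges` field inside a particular Graph value.
fn field_address(graph: &Graph) -> usize {
    &graph.edges as *const Vec<u32> as usize
}

fn main() {
    let graph = Graph { node_map: Vec::new(), edges: Vec::new() };
    let before = field_address(&graph);

    // Move the whole graph (and therefore its fields) into a heap-allocated Vec.
    let moved = vec![graph];
    let after = field_address(&moved[0]);

    // A `&graph.edges` taken before the move would still point at `before`,
    // which no longer holds the field.
    assert_ne!(before, after);
    println!("edges field moved from {:#x} to {:#x} ({} node lists)", before, after, moved[0].node_map.len());
}
```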
If you want self-referential structures, you can only usefully achieve it in safe code via Rc or Arc and their Weak counterparts, but it's generally considered an anti-pattern.
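For completeness, a minimal sketch of the Rc/Weak variant, bearing in mind that the thread itself treats indices as the better default: Graph owns each EdgeMetadata through an Rc, node_map holds Weak handles to the same heap allocations, and because those allocations never move, the Graph can be moved freely. The edge_cost helper is hypothetical.

```rust
use std::rc::{Rc, Weak};

type NodeIndex = usize;

struct EdgeMetadata {
    cost: u32,
    length: u32,
}

struct Graph {
    edges: Vec<Rc<EdgeMetadata>>,                        // strong owners
    node_map: Vec<Vec<(NodeIndex, Weak<EdgeMetadata>)>>, // non-owning handles
}

impl Graph {
    fn new(num_nodes: usize) -> Graph {
        Graph { edges: Vec::new(), node_map: vec![Vec::new(); num_nodes] }
    }

    fn add_edge(&mut self, from: NodeIndex, to: NodeIndex, metadata: EdgeMetadata) {
        let strong = Rc::new(metadata);
        self.node_map[from].push((to, Rc::downgrade(&strong)));
        self.node_map[to].push((from, Rc::downgrade(&strong)));
        self.edges.push(strong);
    }

    // Hypothetical helper: Weak::upgrade returns None only if the owning Rc was dropped.
    fn edge_cost(&self, node: NodeIndex, k: usize) -> Option<u32> {
        self.node_map[node][k].1.upgrade().map(|m| m.cost)
    }
}

fn main() {
    let mut graph = Graph::new(2);
    graph.add_edge(0, 1, EdgeMetadata { cost: 6, length: 3 });
    let moved = graph; // moving the Graph does not invalidate the Weak handles
    println!("{:?} (length {})", moved.edge_cost(1, 0), moved.edges[0].length);
}
```

Every access now pays for an upgrade and a reference-count update, which is part of why this pattern is usually avoided when plain indices will do.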
Aha, thank you so much! Thinking of the struct fields as all being part of the same object was the trick. Great! I'll use indices and learn more about Rc or Arc.