Function that creates and returns two structs with one referencing the other

Hi,

it's my first post in this forum and I'm also very new to Rust. While fiddling around to become familiar with the language I've come across a lifetime problem that I can't get my head around.

The program is very simple. I have a Node and and Element struct, the Element is supposed to store a reference to a Node. I also have a generator element which creates a Node and and Element (which references the created node) and returns (moves) both in a tuple.

struct Node {
    x: f64,
    y: f64
}

struct Element<'a> {
    node: &'a Node
}

fn generate_element() -> (Element, Node) {
    let n = Node { x: 10.0, y: 42.0 };
    (Element { node: &n }, n)
}

fn main() {

    let (n, e) = generate_element();

}

However, when I try to compile this I get the following error message:

error[E0106]: missing lifetime specifier
  --> src/main.rs:10:27
   |
10 | fn generate_element() -> (Element, Node) {
   |                           ^^^^^^^ expected lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
   = help: consider giving it a 'static lifetime

If I try to add a lifetime parameter

fn generate_element<'a>() -> (Element<'a>, Node) {
    let n = Node { x: 10.0, y: 42.0 };
    (Element { node: &n }, n)
}

I get the following error

error[E0597]: `n` does not live long enough
  --> src/main.rs:12:23
   |
12 |     (Element { node: &n }, n)
   |                       ^ does not live long enough
13 | }
   | - borrowed value only lives until here
   |
note: borrowed value must be valid for the lifetime 'a as defined on the function body at 10:1...
  --> src/main.rs:10:1
   |
10 | / fn generate_element<'a>() -> (Element<'a>, Node) {
11 | |     let n = Node { x: 10.0, y: 42.0 };
12 | |     (Element { node: &n }, n)
13 | | }
   | |_^

Fair enough, it somehow makes sense for me that I should inform the compiler that the returned Node lives as long as Element, so I tried to annotate Node as well

fn generate_element<'a>() -> (Element<'a>, Node<'a>) {
    let n = Node { x: 10.0, y: 42.0 };
    (Element { node: &n }, n)
}

but then I get

error[E0107]: wrong number of lifetime parameters: expected 0, found 1
  --> src/main.rs:10:44
   |
10 | fn generate_element<'a>() -> (Element<'a>, Node<'a>) {
   |                                            ^^^^^^^^ unexpected lifetime parameter

This is somehow confusing for me. I may not have a good understanding of the lifetime parameters, but for me the lifetime of both the Element and the Node should be the same since I move both of them out of the function. The only thing that might be an issue is the destruction of the tuple when it goes out of scope (after the main function returns), since I believe that Node must outlive the Element. I tried to permute the returned tuple so that Node comes first etc. but it didn't solve the compiler error.

I also tried

fn generate_element<'a, 'b: 'a>() -> (Element<'a>, Node<'b>) {
    let n = Node { x: 10.0, y: 42.0 };
    (Element { node: &n }, n)
}

which, to my understanding, should assure the compiler that Node lives at least as long as Element, but it tells me that Node has an unexpected lifetime parameter.

I'm out of ideas and I would really appreciate if someone could help me with this issue or atleast clarify what the actual issue is here :wink:

A small update, I think I'm starting to understand the issue. I "misinterpreted" the move. The Node will get destroyed after the function returns, which does make sense since it lives on the stack, and only the values of n "move" to the object that it has been assigned to.

So in my second attempt I tried to wrap the Node in a Box so that it gets allocated on the heap and outlives the function. A simple test function confirmed that the Box will point to the same memory location after being moved out of the function.

fn node() -> Box<Node> {
    let n = Box::new(Node {x: 3.0, y: 4.0});

    println!("{:p}", &*n);

    n
}

fn main() {

    let a = node();

    println!("{:p}", &*a);
}

This will print the same memory address. However when using a Box in my generate_element it still won't work

fn generate_element() -> (Box<Node>, Element) {
    let n: Box::new(Node {x: 10.0, y: 42.0 });

    let e = Element { node: &n};

    (n, e)
}

I get the same error messages. I also tried to add a lifetime parameter, still the same behavior as with the first version of this function. I don't know how I can express that the data the Box points to will live as long as the Element. As far as I can tell the element will drop after the function returns and it's values will be moved into the variable that it's being assigned to.

Firstly you don't need to allocate a box in order to return node.

secondly why not create Element in generate_element's caller?

You problem is one of the more advanced with dealing with rust.
When using references your creating a stack of items on top of each other, rust itself does not have the ability to move around prior items in the stack.
rental is once such crate that helps overcome the problem.

3 Likes

If you really need to structure your code this way, you must not use Box but Rc.

Your problem and the solution to it is described in the Rust Book (Rc<T>, the Reference Counted Smart Pointer - The Rust Programming Language)

However I'm not sure what you are trying to achieve with this code? generate_element should take a &Node as a parameter and return an Element containing the node.

1 Like

Example using Rc:

struct Node {
    x: f64,
    y: f64
}

use std::rc::Rc;

struct Element {
    node: Rc<Node>
}


fn generate_element() -> (Rc<Node>, Element) {
    let n = Rc::new(Node { x: 10.0, y: 42.0 });

    let e = Element { node: n.clone()};

    (n, e)
}

fn main() {

    let (n, e) = generate_element();

}
2 Likes

As has been noted by others, this issue is known as self referential structs - it’s not allowed in Rust. If you search for this phrase, you’ll get plenty of reading material :slight_smile:.

The Box doesn’t help because the Box itself is still moved; it points to a stable address on the heap internally, but it itself can move around.

2 Likes

First of all thank you for your replies. I think I need to clarify what I'm actually trying to do here.

Basically I want to parse a Mesh file (e.g. Gmsh) and add an abstraction layer to the elements a mesh is constructed of. For example the meshes that I'm interested in usually consists of Nodes, Edges, Elements / Triangles.

A note is represented by it's coordinates, an edge points to two notes (or stores a reference to them), a triangle has three nodes and 3 edges (or references).

This is all part of a FEM solver which I already coded in Python (for most parts). Within python I'm using the advanced indexing mechanisms of numpy to do get the references right. For example:

nodes_to_xy = np.array([
    [ 0.0, 0.0],
    [ 1.0, 0.0],
    [ 0.0, 1.0],
    [-1.0, 0.0],
])

edges_to_nodes = np.array([
    [0, 1],
    [0, 2],
    [1, 2],
    [0, 3],
    [2, 3]
])

elements_to_nodes = np.array([
    [0, 1, 2],
    [0, 2, 3]
])

elements_to_edges = np.array([
    [0, 1, 2],
    [1, 3, 4]
])

# Get a cube with the coordinates of each node for all elements
val = nodes_to_xy[elements_to_nodes,:]

This for me has proven to be one of the fastest ways in python / numpy to do this kind of stuff, like calculating the area of triangles, or the mapping to the reference triangle etc.. However as you can imagine this can get quite ugly fast. For example suppose you want to get the x-coordinate of the last point within the last edge of each triangle

nodes_to_xy[edges_to_nodes[elements_to_edges[:,2],1],0]

While some might like this, an it certainly is very powerful, I myself find it quite confusing and hard to read. So I wanted to add some abstraction to this, in python however abstracting this, which usually results in stepping a little bit away from "numpy-only" results in a substantial performance hit.

Long story short: I wanted to abstract the relations between the Nodes and Edges and Elements and also the way I calculate stuff. Instead of doing the arithmetic directly with the complete numpy arrays (using the advanced indexing) I want to have a Element with a calc_area() -> f64 function for example. In hope that the zero-cost abstraction holds true and the performance penalty won't be significant :wink:

So in principle I wanted to have something like this:

struct Node {
    x: f64,
    y: f64
}

struct Edge {
    nodes: (&Node, &Node)
}

struct Elements {
    nodes: (&Node, &Node, &Node),
    edges: (&Edge, &Edge),
}

fn load_gmsh(filename: &str) -> (Vec<Node>, Vec<Edge>, Vec<Element>) { ... }

or instead of the load_gmsh function something wrapped into a Mesh struct

struct Mesh {
    nodes: Vec<Node>,
    edges: Vec<Edge>,
    elements: Vec<Element>
}

impl Mesh {
    fn load_gmsh(filename: &str) -> Mesh { ... }
}

I know this won't compile, but you get the idea. I wanted to avoid smart pointers due to the potential performance impact for very large meshes, which I usually have to deal with. But I fear there's no way around this. Especially since the nodes, edges and elements are being constructed at the same time and not after each other. For example read 3 nodes, add them to the nodes list, construct the edges of the 3 nodes push them on the edges list and construct the element and push it on the element list. This isn't exactly the way it works, but again, just so you get the idea.

I used C++ with libraries like armadillo or Eigen and wrapped it into Python modules using Swig in the past. But C++ is like juggling with sharp sword sometimes, and when I heard of Rust I wanted to give it a try. Though even this small example got me intro trouble real fast I still want to get into Rust since I somehow like the language and it's new ways of doing stuff. But as you can tell I'm having some troubles getting my project started. So if you have any idea or recommendations of how I could do the things in a "rusty" way I'm all in! :slight_smile:

Sorry for the long post, I really do appreciate all feedback and thanks again for your fast replies!

1 Like

Yeah, it's not going to work with references (I work in FEM and pondered something like this... if you ever publish code let us know). I don't think reference counts are helpfull for performance reasons, too.

What you can do is use indices. The Mesh owns Vecs of all entities, and the elements own indices into those Vecs. This should work out, but you're left with bounds checks. Write first and benchmark, but you could either hide access behind an API that then uses unchecked indexing (seems like you want to write a rich API anyways), or check out something like https://crates.io/crates/indexing/, but I'm not sure how applicable that really is.

How important is it to have references in elements and meshes? Performance should be much better if you are not using references at all but copy the data.

Apart from that it sounds like you need to work with raw pointers and therefore giving up a lot of compile time checks and guarantees offered by Rust.

#[derive(Debug)]
struct Element {
    node: * mut Node
}

fn generate_element() -> (Element, * mut Node) {
    let n = Box::into_raw(Box::new(Node { x: 10.0, y: 42.0 }));
    (Element { node: n }, n)
}

I'll add, since I haven't seen it mentioned explicitly on this thread, that putting references in data structure is very, very hard in rust, and almost always there wrong thing.

I'm curious what makes you think Rc would perform poorly? That would seem to be the clean, safe, rusty solution here.

1 Like

I think this is really overstating things, particularly the "almost always the wrong thing" part. It's true that dealing with references requires more discipline to arrange lifetimes appropriately, and this is particularly true when holding mutable references. But, your statement is too extreme and might be misconstrued by some, especially Rust beginners.

4 Likes

Oh, nothing special really, other than knowing how large FEM models can get an how often you need to access entities. This also leads to wanting to use concurrent computations, so you're at an Arc rather than an Rc, and I somewhat hazily remember reading about that having quite a bit of an impact.

But note though that this is all a lot of handwaving from someone who doesn't know any better, so if you have the chance of trying and measuring, that would certainly be better :slight_smile:

IMO, the biggest cost of Rc is the memory indirection (and potentially memory bloat if there are many instances of it) - i.e. more cache misses on access. The refcount maintenance should be in the noise unless you're doing virtually nothing but cloning and dropping the Rc. Access to the data is just a memory read.

Arc is the same thing except its refcount management is more costly than Rc.

1 Like

The structs I showed here don't tell the whole truth. In reality there would be some more data stored in those, like boundary conditions, material parameters (or references to a material). In general since this is supposed to be a FEM Solver every byte and micro second counts when you are targeting degrees of freedom in the order of > 10^6. For example I'd love to access memory in a vector for example without bound checks, since I basically get a guarantee from my mesher that everything will be fine and the access only happens inside my routines such that the API itself does not expose direct access to the data.

Maybe Rust is just not the right tool for it? I guess it can be bend to perform the way I want it to be, but if I have to fiddle around with such things it might make more sense to go with Fortran or C++ instead.

Anyway, I'm not giving up. I'll try to get some performance data with smart pointers. I'll let you know once I have them :wink:

1 Like

So in this case I'd say go for it, hide the unsafe unchecked indexing behind the API and you're fine. That's sort of the point of rust, hiding the unsafety behind a (hopefully) low number of abstractions. I'd say rust is exactly the right tool for this, just needs some figuring out the details :slight_smile:

Do you have an example of where a reference would be used (and actually useful) in a data structure, which would be suitable for a beginning programmer? I've never encountered such a thing.

My impression has been that trying to do this is what causes people to give up on rust.

Does "data structure" refer to some specific use case? I can't quite tell if you're using this term to describe some concrete/specific scenario. At its simplest, an iterator would hold a reference to the container it's iterating.

The way I prefer to think of this is purely in terms of ownership and lifetime. Who owns the data? Does ownership need to be shared? Is mutable access required? What is the lifetime of the values being stored?

There are many types that are fully generic - they don't particularly care what you put in them, with Vec being perhaps the classic example. As far as it's concerned, it owns the T values you give it. You can put non-Copy types in there and then it's truly owning the value. You can put Copy types in there, and then it owns a copy. Similarly, you can put immutable references in there and it just owns copies of those references. But in the end, it doesn't really care. There's some special handling in it for DSTs and ensuring it has the right variance, but otherwise it doesn't really care.

So, there's nothing inherently wrong with references. Quite the contrary - users should be encouraged to make use of them when possible and/or appropriate (it may take some Rust experience to spot the possible/appropriate cases). But I think it all boils down to figuring out the ownership story - the rest will kind of fall into place.

I can't speak definitively for what causes people to give up on Rust, but from what I've observed on this forum, it's usually that they jump straight into the deep end of the pool before they have the basics down. In particular, everyone's "beloved" example of implementing a linked list in Rust comes up fairly frequently.

Don't get me wrong - references will (likely) throw someone into the ownership pit pretty quickly, certainly sooner than not using them. But it's not the references themselves really, it's more that ownership needs to be thought about, and part of that thinking may reveal/confirm that references are not the right thing in that particular design/circumstance.

1 Like

That makes sense @vitalyd. I was not thinking of iterators, which are a good example of wanting references in a data structure. Normally I think of "persistent" data structures, which usually want to open their own data, or at have an assurance that their data won't disappear. Data structures holding references require you to explicitly define lifetimes, which is pretty hard, and I at least struggled to find adequate documentation, since most of our documentation is geared towards learning to avoid specifying lifetimes.

1 Like