Function that creates and returns two structs with one referencing the other

Something I haven't seen mentioned yet: the actual reason why you are not allowed to return the two structs as separate entities.
When you return them, the calling function becomes the new Owner, and can decide, for example, to drop only one of them. There is no way to enforce (or express) the contract of dropping either both of them or none of them, so rustc conservatively assumes that it could happen, so doesn't allow it.

As @vitalyd mentions, this is a case of "self referential structs", which are known-hard with Rust's safety guarantees. ("hard", not impossible, :wink:). Rust tries to statically account for all the possible edge cases, which is hard. It "works" in C because it (probably) just segfaults if you run into them :yum:

In this case, you may be able to write a workaround by tying all lifetimes to the overarching lifetime of the entire model. (Effectively saying: "this is part of the 'model Model, any Node or Element lives as long as the entire thing").
I'm not sure if this is workable in your situation. It would add a lot of lifetime annotations to your whole API, and probably become very hard to do mutation.

1 Like

OK, quick follow-up I hacked up some (probably pretty ugly) code to compare the solutions using a.) smart pointers and b.) raw pointers. Here are my results:

A couple of things to note first:

  • Might be pretty ugly code
  • I'm certain there is a better way to do this
  • I didn't include Edge
  • I compiled and compared the --release versions
  • This "benchmark" might not tell the whole truth
  • I know that calc_area doesn't calculate the area, but since all node vectors are linearly dependent the area would always be zero => boring :wink:

Version a.) Smart-Pointers

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    x: f64,
    y: f64
}

impl Node {
    fn distance_to_origin(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}


struct Element {
    nodes: (Rc<Node>, Rc<Node>, Rc<Node>)
}

struct Mesh {
    nodes: Vec<Rc<Node>>,
    elements: Vec<Element>
}

impl Mesh {
    fn load_mesh() -> Mesh {
        let n_nodes = 10000000;
        let mut nodes = Vec::with_capacity(n_nodes);

        for val in 0..n_nodes {
            nodes.push(
                Rc::new(Node {x: val as f64 / 10000.0, y: val as f64 / 1000.0})
            );
        }

        let mut elements = Vec::with_capacity(n_nodes);
        for ((n1, n2), n3) in nodes.iter().zip(nodes.iter().skip(1)).zip(nodes.iter().skip(2)) {
            elements.push(
                Element {nodes: (Rc::clone(n1), Rc::clone(n2), Rc::clone(n3))}
            );
        }

        Mesh {nodes, elements}
    }

    fn calc_area(&self) -> Vec<f64> {
        self.elements.iter()
            .map(|element| {
                element.nodes.0.distance_to_origin() +
                    element.nodes.1.distance_to_origin() +
                    element.nodes.2.distance_to_origin()
            })
            .collect()
    }
}

fn main() {
    for i in 1..40 {
        let mesh = Mesh::load_mesh();

        let b = mesh.calc_area();

        println!("{:?}", b[i]);
    }

}

Version b.) Raw-Pointers

#[derive(Debug)]
struct Node {
    x: f64,
    y: f64
}

impl Node {
    fn distance_to_origin(&self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}


struct Element {
    nodes: (*const Node, *const Node, *const Node)
}

struct Mesh {
    nodes: Vec<Node>,
    elements: Vec<Element>
}

impl Mesh {
    fn load_mesh() -> Mesh {
        let n_nodes = 10000000;
        let mut nodes = Vec::with_capacity(n_nodes);

        for val in 0..n_nodes {
            nodes.push(
                Node {x: val as f64 / 10000.0, y: val as f64 / 1000.0}
            );
        }

        let mut elements = Vec::with_capacity(n_nodes);
        for ((n1, n2), n3) in nodes.iter().zip(nodes.iter().skip(1)).zip(nodes.iter().skip(2)) {
            elements.push(
                Element {nodes: (n1, n2, n3)}
            );
        }

        Mesh {nodes, elements}
    }

    fn calc_area(&self) -> Vec<f64> {
            self.elements.iter()
                .map(|element| {
                    (*element.nodes.0).distance_to_origin() +
                        (*element.nodes.1).distance_to_origin() +
                        (*element.nodes.2).distance_to_origin()
                })
                .collect()
        }
}

fn main() {
    for i in 1..40 {
        let mesh = Mesh::load_mesh();

        let b = mesh.calc_area();

        println!("{:?}", b[i]);
    }
}

Results:

Version a.)

real    0m31.885s
user    0m25.936s
sys     0m5.944s

Version b.)

real    0m10.835s
user    0m7.194s
sys     0m3.639s

So basically as expected, and I'm quite happy with the result. Since it's really easy to do the unsafe stuff I think I'm going with it.

Some additional things I found out:

  • Using Vec::with_capacity instead of Vec::new has had a huge impact on performance!
  • Using Vec::get_unchecked didn't have an impact at all compared to using iterators (which I expected) BUT also using the "bracket operator" which is strange. Maybe the compiler did optimize that part so that it omits the bound check?

So in summary I think I'll go with the unsafe version. I hope you found this "benchmark" interesting though its maybe not so exciting and basically as expected :wink:

The issue is even more fundamental than this - the reference is (or would be, rather) invalid right after the value is moved into the caller (or anywhere for that matter).

1 Like

How much of the time is spent in setup and tear down though? Iā€™d time calc_area only. The current benchmark has all the heap allocation and deallocation of the Rc included (whereas the Vec is one allocation and one deallocation with values copied into it) which I donā€™t think would be the type of operations youā€™d be concerned with?

1 Like

The issue is even more fundamental than this - the reference is (or would be, rather) invalid right after the value is moved into the caller (or anywhere for that matter).

Yeah, to relate it back to the original example, the Node moves from the generate_element stack frame to the main stack frame, but the Element is left pointing to the dead value in the generate_element stack frame.

1 Like

@vitalyd @quadrupleslap: thanks for improving my understanding, seems I still have some lifetime-learning to do! (haha, a "lifetime" of learning :stuck_out_tongue: )


edit: to also contribute something to the original topic:

I hadn't seen this trick before, so I tried it in the playground; is it intentional that it advances with overlap?

1,2,3
2,3,4
3,4,5
4,5,6

This way, you seem to be iterating over each corner between each overlapping triplet of nodes, I would have expected that you'd want to iterate over each triangle, i.e.

1,2,3
4,5,6
7,8,9

Which was your intention?
If you want the latter, have a look at the (experimental) Iterator::step_by
Also, to zip multiple things without the ((n1,n2),n3) nesting, the itertools crate provides izip!, which does exactly what you need for arbitrary numbers of zips.


yup! It does all the allocation in one chunk up-front, instead of doing the incremental-doubling-dance repeatedly every few dozen inserts (exponential doubling algorithm in RawVec)
Have had some great speedups with this myself :slight_smile:

Yup, the compiler is really good at optimising out the bounds checks :smiley:

1 Like

You got a point there. However, the allocation is after all still a part of the code, though it's influence on the overall execution time will diminish in the actual code. Never the less the unsafe code using raw pointers still seems to be the better alternative in this case. I mean sure, it might be unsafe, but as I said I basically get the guarantee from the mesher that everything will be ok instead of the compiler, so why not use this "faster" approach? :slight_smile:

It was just an example, but indeed in the real application nodes will "overlap" i.e. elements share nodes, thus the number of elements is not n_elements = 3 * n_nodes .

I was aware of both solutions, however I haven't had time to look at nightly Rust yet (still very very new to Rust) and I wanted to compare / present "vanilla" solutions and thus excluded the izip package. Aside from the ugly syntax I'm confident that my approach didn't result in a huge performance impact, if at all.

It is! I'm really satisfied with the results :slight_smile:

Just for good measure I hacked up the same code in C++, again might not be the cleanest and fastest solution but it would be a solution that I'd come up with my limited C++ skills. I know I could have avoided some type declarations in the initialization lists but I wanted to make it look like the Rust code I already presented here :joy: I also tried to inline for example the distance method, also threw in noexcept etc. but it did have no impact. After all gcc is a nice compiler too :wink:

#include <cmath>
#include <vector>
#include <iostream>

using std::vector;
using std::size_t;
using std::cout;
using std::endl;

struct Node {
 double x;
 double y;

 double distance_to_origin() const {
   return sqrt(x*x + y*y);
 }
};

struct Element {
 const Node * const nodes[3];
};

struct Mesh {
 vector<Node> nodes;
 vector<Element> elements;

 static Mesh load_mesh() {
   const size_t n_nodes = 10000000;

   vector<Node> nodes;
   nodes.reserve(n_nodes);

   for (size_t val = 0; val < n_nodes; ++val) {
     nodes.push_back(Node {val / 10000.0, val / 1000.0});
   }

   vector<Element> elements;
   elements.reserve(n_nodes);

   for (size_t i = 0; i < n_nodes-2; ++i) {
     elements.push_back(Element {{&nodes[i], &nodes[i+1], &nodes[i+2]}});
   }


   return Mesh {move(nodes), move(elements)};
 }

 vector<double> calc_area() const {
   vector<double> areas;
   areas.reserve(elements.size());
   for (const auto &element: elements) {
     areas.push_back(
                     element.nodes[0]->distance_to_origin() +
                     element.nodes[1]->distance_to_origin() +
                     element.nodes[2]->distance_to_origin()
                     );
   }

   return areas;
 }
};

int main(int argc, char *argv[]) {
 for (size_t i = 1; i < 40; ++i) {
   Mesh mesh = Mesh::load_mesh();

   auto b = mesh.calc_area();

   cout << b[i] << endl;
 }

 return 0;
}

with these execution times

real    0m12.004s
user    0m8.476s
sys     0m3.528s

I'm really really happy with the Rust solution. For me it's much more polished, easier to write and on top of that faster!

And sure enough while coding this I ran into some Segmentation Faults :joy: I forgot to move the values into the Mesh.

This led to a question though. In C++ I have basically no guarantee that data gets moved when I'm using move since it's just a rvalue cast and the structure I'm dealing with might not have a suitable move constructor or move assignment operation defined. For vectors however I think it's safe to assume that everything will work fine. But whats with Rust? Does Rust guarantee that the data within a struct will actually move (i.e. data allocated on the heap)? I mean the Rust documentation basically just says that the ownership will "move", for me this does not necessary mean that the data will also move (not regarding Copy types here). I guess this can be reduced to the question of how smart pointers will behave on move?

Let's say I have a structure that owns or indirectly owns data on the heap for example a Vec. Do I get the guarantee that when I move this struct raw pointers into data of the "old" struct are still valid for the new-moved-to struct?

Might be a confusing question I admit :wink:

Anyway, I'd like to thank you all for the nice discussion so far!

1 Like

I believe this is what is known as a triangle strip

1 Like

Sure, I have no reason for using vanilla other than being lazy :slight_smile: I'm a big fan of the way Rust deals with libraries resp. crates. Especially with a little bit of C++ background :joy:

This is supposed to be all internal stuff not really accessible to the API. In fact I think I'll go with Rc first and use raw pointers later on when I really need the extra bit of performance :slight_smile: Since it's not part of the API it could be refactored quite easily :slight_smile:

I do, learned a lot in this thread already :slight_smile:

1 Like

A move in Rust can be thought of as a memcpy of the bits comprising the value being moved (leaving compiler optimizations aside). In fact, a move vs a Copy type can be thought of as performing the same operation - the difference is the compiler will not allow you to touch the moved-from value, whereas it's fine to continue using the copied-from value.

Taking Vec as an example, you can assume that the moved-to Vec is pointing to the same heap allocation as the moved-from Vec (a memcpy of the Vec itself is moving 3 words: heap ptr, capacity, len). So if you just move them, raw pointers to the heap storage should stay valid. But, things will go south real quick if the Vec reallocs its heap storage (say to accommodate an increase in occupancy). One way to account for that is to add an additional layer of indirection - Vec<Box<T>>. This will move the boxes around, but each Box continues to point to the same heap addr; a raw ptr pointing to the same location will be valid unless the Box is destroyed.

Vec comes with its own guarantees: Vec in std::vec - Rust

Not sure if this answers your question. There's been some discussions of having a marker trait to denote types that maintain a "stable address", but no such thing exists in std today (crates like rental define their own such concept).

2 Likes

Ah yes, trying to save on dependencies. I'd say itertools is very "vanilla", even if it is technically an external dependency. Cargo's excellence really blurs the line between "libstd" and "external crate", and this is intentional!
No-nightly is very understandable, no criticism there :slight_smile:

I believe this is what Vitaly means; by excluding the allocation, which will be fairly consistent between "real" and benchmark, you magnify the effect of the thing you are optimising, thus getting clearer results.

This is a perfectly fine justification for unsafe, just make sure you document this assumption, and maybe throw in a few tests and/or asserts, to make it explicit :wink:

Listen to Vitaly, he knows what he's talking about :slight_smile:

Yeah - I wanted to see what the "access" penalty is (i.e. potential increase in cache misses) for heavy Rc usage. I presume the real application would run the setup, but then spend the rest of its time doing operations over the data (i.e. access) and that's the part that would be important to keep as efficient as possible. But, if setup and teardown is important for the app, then measuring that seems right.

Ha - only sometimes (and possibly never if you ask my wife!) :slight_smile:

Move semantics just mean that ownership of an object moves from one function to another. It does not mean that values are moved around in memory. In contrast to C++, the Rust compiler guarantees that a variable is not accessed after it was moved. So in a sense it is not important that the moved from object is cleared as there is no way to access its content.

Smart pointers behave as expected.

fn receiver(p: Box<i8>) {
  // heap memorypointed to by p is the same memory pointed to by v
  println!("{}", p);
}  // heap memory pointed to by p gets deallocated here.

fn sender() {
  let v = Box::new(5);
  receiver(v);
  // can not access v as it was moved to receiver
} // nothing happens even though v exits scope
1 Like