Strange RVO, or: How do I learn to stop worrying and love `move'?


#1

I’m confused about move semantics. Take this hypothetical code, for instance:

use std::ops::Add;

struct Array([f32;20]);

impl Add<Array> for Array {
    type Output = [f32;20];
    fn add(self, rhs: Array) -> Self::Output {
        let mut v:[f32;20] = [0.0;20];
        for i in 0..20 {
            v[i] = (self.0)[i] + (rhs.0)[i];
        }
        return v;
    }
}

fn main() {
    let a = Array([0.0; 20]);
    let b = Array([1.0; 20]);
    let c = a + b;

   println!("{}", c[2]);
}

After a user in #rust looked at the generated IR (I cannot really read assembly or IR, so I outsourced), he or she came to the conclusion that a temporary variable for the rvalue was being allocated and there was a (needless) copy during the move to c (if anyone wants to verify this and point to me in the assembly what’s actually going on, I would appreciate it). From my understanding of Eigen, a C++ library for linear algebra, one can overload the move-assignment operator to only perform the allocation of the resulting vector once.[1] Now of course, there is no Move trait to overload in Rust, so this behavior can’t be changed.

LLVM does, however, aggressively optimize returns in many cases and performs this RVO frequently. That’s fortuitous and desirable.

So the question is: when can I expect LLVM to do this kind of optimization? When can I not?

[1] http://eigen.tuxfamily.org/dox/TopicInsideEigenExample.html


#2

This is usually called NRVO (“Named Return Value Optimization”), because v in <Add for Array>::add could be using the return pointer.
I thought we had it, but I just looked at the relevant code and it’s not there.
There is this issue about it https://github.com/rust-lang/rfcs/issues/788 - and I vaguely recall explaining to the original poster how to implement simple NRVO in the compiler.
I’m not sure, there might be subtle issues around unwinding.

Half an hour later, I read the entire post again and I finally understood what the “move” in the title referred to.
However, I still don’t see what move operators would do to this example: the simplest possible move assignment in C++ that actually does something, is a Rust move.
The IR for the + operation, after cleaning it up a bit, looks like this:

  call void <Array as Add>::add([20 x float]* c, Array* arg, Array* arg3)

RVO is already in effect on the caller side, but the callee doesn’t make use of it for v, and LLVM doesn’t optimize away the copy from v to c after inlining, not even on the latest nightly with -C opt-level=3.

Minor stylistic nits: since the RHS is the same as the LHS, you can write impl Add for Array, there generally are spaces after : and ; (as in, let mut v: [f32; 20] = [0.0; 20];), the parens around self.0 and rhs.0 aren’t necessary, and the trailing return v; can be just v.


#3

Thank you for the reply! I was actually a little confused and didn’t realize that even in c++, move semantics are only useful in the case of dynamically allocated memory. There’s nothing to be done in this case because the array was allocated on the stack, so move cannot help anyone here.

But that last bit about the callee not utilizing RVO: is this low hanging fruit yet to be claimed in the compiler?


#4

A quick check with opt shows that this specific setup probably shows a problem that could be solved with a better pass setup for LLVM. Passing our unoptimized IR through opt -O2 twice gives code that only uses a single array that holds the computed sum.

That said, applying (N)RVO ourselves without having to rely on LLVM would be nice, but I didn’t really think about how to do that properly. Should be easier once we have a MIR.

Another thing that I noticed was that the optimized code we currently get not only has to copy v to c but also copies b to rhs (it could forward the memset for a though).

Just eliminating those copies by passing a and b directly into the add function (because they’re read-only making a copy is unnecessary) makes things worse though. For whatever reason…


#5

Oh, one thing, you can get better results here by avoiding the 0 initialization of v, like this:

impl Add<Array> for Array {
    type Output = [f32;20];
    fn add(self, rhs: Array) -> Self::Output {
        let Array(mut v) = self;
        for i in 0..20 {
            v[i] += (rhs.0)[i];
        }
        return v;
    }
}

Using that definition of add() lets your example compile down to a array of 1.0f32 values and the code to print it.


#6

In my tests it seems Rustc doesn’t yet perform RVO, it’s a sufficiently important optimization.


#7

The distinction between NRVO and RVO is significant: the “N” stands for “named”, i.e. let x; x = f(); x, as opposed to simply f() (which does have RVO in Rust, at the ABI level).

You can follow https://github.com/rust-lang/rust/issues/32966 to get updates on our NRVO progress.