Can I trust Rust to optimize move semantics?


#1

I haven’t found any clear documentation on this subject, so I’d like to understand it better. There are many places in typical Rust code where structs are “moved”, e.g. when you return a struct from a “constructor” (such as a static “new” function) or when you push a struct into a vector. However, I don’t quite understand what moving means in terms of the under-the-hood implementation and performance. In some posts I read that a move always means a “shallow” memory copy, i.e. it’s only a semantic move. But that is not great when you work with big structs, because such a copy can be quite expensive. In C++, when I see a function that accepts a reference to a class, I can be sure that no unnecessary copying will happen, but I don’t have the same confidence when looking at a similar Rust function that accepts a struct by value. I’m sure Rust can be smart enough to optimize moves, but I’m wondering whether there is any guarantee that it does so.
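A minimal sketch of the situations described above, assuming a hypothetical 4 KiB `Big` struct: `Big::new` returns by value, `push` moves into the vector, `consume` takes ownership, and `checksum` borrows, which, like a C++ reference parameter, never copies the struct itself. At the language level each move is a bitwise copy of `size_of::<Big>()` bytes; whether that copy survives into the machine code is up to the optimizer.

```rust
// Hypothetical large struct: 512 * 8 = 4096 bytes.
struct Big {
    data: [u64; 512],
}

impl Big {
    // "Constructor": the new value is returned (moved) to the caller.
    fn new(seed: u64) -> Big {
        let mut data = [0u64; 512];
        for (i, x) in data.iter_mut().enumerate() {
            *x = seed + i as u64;
        }
        Big { data }
    }
}

// Takes ownership: semantically the whole struct is moved into `b`.
fn consume(b: Big) -> u64 {
    b.data.iter().sum()
}

// Borrows: only a pointer is passed, much like a C++ `const Big&`.
fn checksum(b: &Big) -> u64 {
    b.data.iter().sum()
}

fn main() {
    let mut v: Vec<Big> = Vec::new();
    let b = Big::new(1);               // moved out of `new`
    let by_ref = checksum(&b);         // borrowed; `b` is still usable
    v.push(b);                         // moved into the vector's heap buffer
    let by_val = consume(Big::new(1)); // moved into `consume`
    assert_eq!(by_ref, by_val);
    assert_eq!(v.len(), 1);
    println!("size_of::<Big>() = {} bytes", std::mem::size_of::<Big>());
}
```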


#2

Rust doesn’t guarantee anything in particular here, but thanks to the LLVM backend it features the usual gamut of optimizations you would expect in, e.g., C++. If you pass a large struct (bigger than a ptr or two?) to a function it will probably be passed by reference anyway. Functions may be inlined to the point that no function call (and therefore no move) may occur at all. Or possibly your big struct might not be constructed at all due to constant propagation and folding. Similarly returning something by-value might be converted into something more efficient automatically (such as LLVM passing in an out-pointer to the returning function to construct the value in the caller’s frame).

So basically you can trust Rust to be in the same ballpark as Clang (same backend). Probably a bit less, since our compiler is a lot less mature, we haven’t spent a lot of resources on optimization, and we don’t have great perf regression tooling.
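To make the out-pointer idea concrete, here is a hand-written analogue, assuming a hypothetical `Big` type: `make_big` returns by value, while `make_big_into` takes caller-provided storage and fills it in place, which is roughly what a by-value return looks like once the compiler lowers it to a hidden return slot. This only illustrates the calling convention; normally you write the by-value version and let the compiler choose.

```rust
// Hypothetical large struct used for illustration.
#[derive(PartialEq, Debug)]
struct Big {
    data: [u32; 256],
}

// Idiomatic: return by value and let the compiler pick the ABI.
fn make_big(fill: u32) -> Big {
    Big { data: [fill; 256] }
}

// Manual "out-pointer" version: the caller owns the storage
// and the callee writes directly into it.
fn make_big_into(out: &mut Big, fill: u32) {
    for x in out.data.iter_mut() {
        *x = fill;
    }
}

fn main() {
    let a = make_big(7);

    let mut b = Big { data: [0; 256] }; // caller-allocated slot
    make_big_into(&mut b, 7);

    assert_eq!(a, b);
    println!("both constructions produced the same value");
}
```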


#3

Some of the optimizations you’ve mentioned belong to the frontend (rustc), not LLVM.

If you pass a large struct (bigger than a ptr or two?) to a function it will probably be passed by reference anyway.

The threshold is bigger than one pointer (with some nuances). This is done in the frontend, and it is done suboptimally compared to Clang (see issue #22891, for example).

Functions may be inlined to the point that no function call (and therefore no move) may occur at all.

Inlining is done in LLVM.
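Because inlining happens in LLVM, the source code can only give hints. A minimal sketch, assuming a hypothetical small `Point` type: `#[inline]` on a constructor makes its body available for inlining across crates and suggests inlining to LLVM; once inlined, the move out of `new` typically becomes direct initialization in the caller. It is a hint, not a guarantee.

```rust
// Hypothetical small type used for illustration.
#[derive(Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // Inlining hint; LLVM still decides whether to inline.
    #[inline]
    fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }

    // Consumes `self` and returns a new value; both moves can
    // disappear if this call and `new` are inlined.
    #[inline]
    fn translated(self, dx: f64, dy: f64) -> Point {
        Point::new(self.x + dx, self.y + dy)
    }
}

fn main() {
    let p = Point::new(1.0, 2.0).translated(0.5, -1.0);
    assert_eq!(p, Point::new(1.5, 1.0));
    println!("{:?}", p);
}
```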

Or possibly your big struct might not be constructed at all due to constant propagation and folding.

AFAIK, it’s done in LLVM if the value isn’t needed at compile time.
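The “needed at compile time” distinction can be illustrated with a `const fn`, assuming a hypothetical `Table` type: when the result is bound to a `const`, rustc’s own constant evaluator computes it during compilation, whereas the same call in ordinary runtime code is left to LLVM, which may or may not fold it away.

```rust
// Hypothetical lookup table built by a `const fn`.
struct Table {
    squares: [u32; 16],
}

const fn build_table() -> Table {
    let mut squares = [0u32; 16];
    let mut i = 0;
    // `for` loops aren't allowed in `const fn`, so use `while`.
    while i < 16 {
        squares[i] = (i * i) as u32;
        i += 1;
    }
    Table { squares }
}

// Needed at compile time: evaluated by rustc itself.
const TABLE: Table = build_table();

fn main() {
    // Ordinary runtime call: any constant folding here is up to LLVM.
    let runtime = build_table();
    assert_eq!(runtime.squares, TABLE.squares);
    println!("15^2 = {}", TABLE.squares[15]);
}
```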

Similarly returning something by-value might be converted into something more efficient automatically (such as LLVM passing in an out-pointer to the returning function to construct the value in the caller’s frame).

It’s done in the frontend and, again, differently than in Clang.

One more thing to note is that rustc, unlike Clang, doesn’t currently have NRVO (named return value optimization, which also belongs to the frontend), so if you return something big and named, it will be copied.
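A minimal sketch of the pattern in question, assuming a hypothetical `Big` type: both functions return the same value, but `make_named` works on a named local first. Without NRVO that local lives in its own stack slot and is copied into the return slot at the end, whereas `make_direct` can construct the result straight into the return slot. Whether a copy actually appears in the machine code depends on the compiler version and optimization level, so check the generated assembly rather than assuming either way.

```rust
// Hypothetical large struct used for illustration.
#[derive(PartialEq, Debug)]
struct Big {
    header: u64,
    payload: [u8; 1024],
}

// Named return value: `b` is built, modified, then returned.
fn make_named(header: u64) -> Big {
    let mut b = Big { header: 0, payload: [0xAB; 1024] };
    b.header = header; // work done on the named local
    b // without NRVO, this may copy `b` into the return slot
}

// Direct construction in the tail expression.
fn make_direct(header: u64) -> Big {
    Big { header, payload: [0xAB; 1024] }
}

fn main() {
    let a = make_named(5);
    let b = make_direct(5);
    assert_eq!(a, b);
    println!("header = {}, first byte = {:#x}", a.header, a.payload[0]);
}
```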

Edit: Even if the frontend does something suboptimal, LLVM can still try to optimize it and may succeed.


#4

No. ;p In general, compared to C code, Rust-generated machine code tends to contain a lot of mov instructions that move structure contents around; the lack of NRVO and related optimizations is probably a big part of this. I just opened librustc_resolve.dylib in IDA, literally clicked at a random location in the address-space visualization, and found this:

__text:000000000002FC85        mov     rdx, [rbp-1A0h]
__text:000000000002FC8C        mov     [rbp-70h], rdx
__text:000000000002FC90        mov     rdx, [rbp-1A8h]
__text:000000000002FC97        mov     [rbp-78h], rdx
__text:000000000002FC9B        mov     rdx, [rbp-1B8h]
__text:000000000002FCA2        mov     rsi, [rbp-1B0h]
__text:000000000002FCA9        mov     [rbp-80h], rsi
__text:000000000002FCAD        mov     [rbp-88h], rdx
__text:000000000002FCB4        lea     rdi, [rbp-78h]
__text:000000000002FCB8        mov     [rbp-190h], r12
__text:000000000002FCBF        mov     [rbp-198h], r12
__text:000000000002FCC6        mov     [rbp-1A0h], r12
__text:000000000002FCCD        mov     [rbp-1A8h], r12
__text:000000000002FCD4        mov     [rbp-1B0h], r12
__text:000000000002FCDB        mov     [rbp-1B8h], r12
__text:000000000002FCE2        mov     rdx, [rbp-60h]
__text:000000000002FCE6        mov     [rbp-30h], rdx
__text:000000000002FCEA        mov     rdx, [rbp-68h]
__text:000000000002FCEE        mov     [rbp-38h], rdx
__text:000000000002FCF2        mov     rdx, [rbp-70h]
__text:000000000002FCF6        mov     [rbp-40h], rdx
__text:000000000002FCFA        mov     rdx, [rbp-78h]
__text:000000000002FCFE        mov     [rbp-48h], rdx
__text:000000000002FD02        mov     rdx, [rbp-88h]
__text:000000000002FD09        mov     rsi, [rbp-80h]
__text:000000000002FD0D        mov     [rbp-50h], rsi
__text:000000000002FD11        mov     [rbp-58h], rdx
__text:000000000002FD15        mov     [rbp-60h], r12
__text:000000000002FD19        mov     [rbp-68h], r12
__text:000000000002FD1D        mov     [rbp-70h], r12
__text:000000000002FD21        mov     [rbp-78h], r12
__text:000000000002FD25        mov     [rbp-80h], r12
__text:000000000002FD29        mov     [rbp-88h], r12

This seems to be a combination of moving a structure from one stack location to another and zeroing the original. This is an optimized build. Hitting upon this at random was only partly luck; the code is full of this kind of thing. The planned drop improvements should at least alleviate the zeroing.
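The overwriting of the original in the listing above is, as I understand it, how rustc at the time marked a moved-from value so that it would not be dropped twice; the planned improvement was to track this with separate drop flags instead. A minimal sketch of a case that needs such tracking, assuming a hypothetical `Noisy` type that counts its drops: whether `v` is moved depends on a runtime flag, so at the end of `maybe_consume` the compiler must know whether to run `v`’s destructor.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Noisy` values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type with a destructor, so moves affect drop behavior.
struct Noisy(u32);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn consume(n: Noisy) -> u32 {
    n.0 // `n` is dropped at the end of `consume`
}

// Whether `v` is moved is only known at runtime, so the compiler
// must track it to decide whether to drop `v` at the end of scope.
fn maybe_consume(flag: bool) -> u32 {
    let v = Noisy(42);
    if flag {
        consume(v)
    } else {
        0
    } // if `v` was not moved, it is dropped here
}

fn main() {
    let before = DROPS.load(Ordering::SeqCst);
    maybe_consume(true);
    maybe_consume(false);
    // Each `Noisy` is dropped exactly once, whichever path was taken.
    assert_eq!(DROPS.load(Ordering::SeqCst) - before, 2);
}
```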

I have no idea whether such code significantly affects runtime performance or not, but it sure seems to affect code size, which is an important concern for me. And of course code bloat tends to decrease performance via cache unfriendliness.

Aside from drop and NRVO, I don’t think it’s really about Rust’s backend being worse than any C compiler’s; C and C++ code is usually much less likely than Rust code to pass large structures by value, so rustc needs to be better at dealing with them to generate code of equal quality.
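When large values really do need to be passed around by value, one common workaround (not something proposed in this thread) is to put them behind a `Box`, so each move copies only a pointer while the payload stays put on the heap. A minimal sketch, assuming a hypothetical 4 KiB `Big` type. Note that `Box::new` itself may still build the value on the stack and then copy it to the heap, so this shifts the cost rather than guaranteeing it disappears.

```rust
// Hypothetical large struct: 4096 bytes.
struct Big {
    data: [u8; 4096],
}

// Takes ownership of a boxed `Big`: moving the `Box` copies one pointer.
fn total(b: Box<Big>) -> u64 {
    b.data.iter().map(|&x| u64::from(x)).sum()
}

fn main() {
    let b = Box::new(Big { data: [1; 4096] });
    let mut v = Vec::new();
    v.push(b); // moves the pointer, not 4096 bytes
    let b = v.pop().unwrap();

    // A `Box` of a sized type is a single thin pointer.
    assert_eq!(std::mem::size_of::<Box<Big>>(), std::mem::size_of::<usize>());
    assert_eq!(total(b), 4096);
}
```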