I haven’t found any clear documentation on this subject, so I’d like to understand it better. There are many places in typical Rust code where structs are “moved”, e.g. when you return a struct from a “constructor” (a static “new” function) or when you push a struct into a vector. However, I don’t quite understand what moving means in terms of the under-the-hood implementation and performance. In some posts I’ve read that a move always means a “shallow” memory copy, i.e. it’s only semantically a move. That isn’t great when you work with big structs, because such a copy can be quite expensive. In C++, when I see a function that accepts a reference to a class, I can be sure that no unnecessary copying will happen, but I don’t have the same confidence when looking at a similar Rust function that accepts a struct by value. I’m sure Rust can be smart enough to optimize moves, but is there any guarantee that it does so?
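For concreteness, here is a minimal sketch of the situations described above (the struct name and size are invented): a value is moved out of a constructor and then into a vector, and each move is, semantically, a shallow copy plus invalidation of the source.

```rust
// Hypothetical 8 KB struct, big enough that a memcpy would be noticeable.
struct Big {
    data: [u64; 1000],
}

impl Big {
    // Returning by value from a "constructor" is semantically a move.
    fn new() -> Big {
        Big { data: [0; 1000] }
    }
}

fn main() {
    let b = Big::new(); // moved out of `new`
    let mut v = Vec::new();
    v.push(b); // `b` is moved into the vector
    // `b` can no longer be used here; the compiler rejects any access to it.
    assert_eq!(v[0].data.len(), 1000);
}
```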
Rust isn’t defined to do anything in particular but it features the usual gamut of optimizations you would expect in e.g. C++ thanks to the LLVM backend. If you pass a large struct (bigger than a ptr or two?) to a function it will probably be passed by reference anyway. Functions may be inlined to the point that no function call (and therefore no move) may occur at all. Or possibly your big struct might not be constructed at all due to constant propagation and folding. Similarly returning something by-value might be converted into something more efficient automatically (such as LLVM passing in an out-pointer to the returning function to construct the value in the caller’s frame).
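The out-pointer transformation mentioned above can be sketched by hand (this is only an illustration of the idea, not what rustc literally emits; `Big` and `init_big` are invented names):

```rust
use std::mem::MaybeUninit;

struct Big {
    data: [u64; 1000],
}

// What "return by value" conceptually lowers to: the caller provides
// storage, and the callee constructs the value directly into it,
// avoiding a copy out of the callee's stack frame.
fn init_big(out: &mut MaybeUninit<Big>) {
    out.write(Big { data: [7; 1000] });
}

fn main() {
    let mut slot = MaybeUninit::<Big>::uninit();
    init_big(&mut slot);
    // Safe: `init_big` fully initialized the slot.
    let big = unsafe { slot.assume_init() };
    assert_eq!(big.data[999], 7);
}
```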
So basically you can trust Rust to be in the same ballpark as Clang (same backend). Probably a bit less, since our compiler is a lot more immature, we haven’t spent a lot of resources on optimization, and we don’t have great perf-regression tooling.
Some of the optimizations you’ve mentioned belong to the frontend (rustc), not LLVM.
> If you pass a large struct (bigger than a ptr or two?) to a function it will probably be passed by reference anyway.
Bigger than one pointer (with some nuances); it is done in the frontend, and it is done suboptimally compared to Clang (see issue #22891 for example).
> Functions may be inlined to the point that no function call (and therefore no move) may occur at all.
Inlining is done in LLVM.
> Or possibly your big struct might not be constructed at all due to constant propagation and folding.
AFAIK, it’s done in LLVM if the value isn’t needed at compile time.
> Similarly returning something by-value might be converted into something more efficient automatically (such as LLVM passing in an out-pointer to the returning function to construct the value in the caller’s frame).
It’s done in the frontend and, again, differently than in Clang.
One more thing to note: unlike Clang, rustc currently doesn’t have NRVO (named return value optimization; it belongs to the frontend too), so if you return something big and named, it will be copied.
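A sketch of the difference (function and struct names invented): returning a named local versus returning an unnamed temporary. At the time of this thread, the first form could cost an extra copy because rustc lacks NRVO:

```rust
struct Big {
    data: [u64; 1000],
}

// Named return value: `result` lives in this function's frame and may be
// memcpy'd into the caller's slot on return, since rustc has no NRVO.
fn build_named() -> Big {
    let mut result = Big { data: [0; 1000] };
    result.data[0] = 42;
    result
}

// Unnamed temporary: eligible for construction directly in the caller's
// return slot (plain RVO), so no extra copy is needed.
fn build_temp() -> Big {
    Big { data: [42; 1000] }
}

fn main() {
    assert_eq!(build_named().data[0], 42);
    assert_eq!(build_temp().data[0], 42);
}
```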
Edit: even if the frontend does something suboptimal, LLVM can still try to optimize it, and it may succeed.
No. ;p In general - lack of NRVO and related optimizations is probably a big part of this - compared to C code, Rust-generated machine code tends to contain a lot of movs shuffling structure contents around. I just opened librustc_resolve.dylib in IDA, literally clicked at a random location in the address-space visualization, and found this:
```
__text:000000000002FC85 mov rdx, [rbp-1A0h]
__text:000000000002FC8C mov [rbp-70h], rdx
__text:000000000002FC90 mov rdx, [rbp-1A8h]
__text:000000000002FC97 mov [rbp-78h], rdx
__text:000000000002FC9B mov rdx, [rbp-1B8h]
__text:000000000002FCA2 mov rsi, [rbp-1B0h]
__text:000000000002FCA9 mov [rbp-80h], rsi
__text:000000000002FCAD mov [rbp-88h], rdx
__text:000000000002FCB4 lea rdi, [rbp-78h]
__text:000000000002FCB8 mov [rbp-190h], r12
__text:000000000002FCBF mov [rbp-198h], r12
__text:000000000002FCC6 mov [rbp-1A0h], r12
__text:000000000002FCCD mov [rbp-1A8h], r12
__text:000000000002FCD4 mov [rbp-1B0h], r12
__text:000000000002FCDB mov [rbp-1B8h], r12
__text:000000000002FCE2 mov rdx, [rbp-60h]
__text:000000000002FCE6 mov [rbp-30h], rdx
__text:000000000002FCEA mov rdx, [rbp-68h]
__text:000000000002FCEE mov [rbp-38h], rdx
__text:000000000002FCF2 mov rdx, [rbp-70h]
__text:000000000002FCF6 mov [rbp-40h], rdx
__text:000000000002FCFA mov rdx, [rbp-78h]
__text:000000000002FCFE mov [rbp-48h], rdx
__text:000000000002FD02 mov rdx, [rbp-88h]
__text:000000000002FD09 mov rsi, [rbp-80h]
__text:000000000002FD0D mov [rbp-50h], rsi
__text:000000000002FD11 mov [rbp-58h], rdx
__text:000000000002FD15 mov [rbp-60h], r12
__text:000000000002FD19 mov [rbp-68h], r12
__text:000000000002FD1D mov [rbp-70h], r12
__text:000000000002FD21 mov [rbp-78h], r12
__text:000000000002FD25 mov [rbp-80h], r12
__text:000000000002FD29 mov [rbp-88h], r12
```
Seems to be a combination of moving a structure from one stack location to another and zeroing the original. This is an optimized build. My hitting upon this randomly was only partly luck; the code is filled with this type of stuff. The planned drop improvements should alleviate the zeroing, at least.
I have no idea whether such code significantly affects runtime performance or not, but it sure seems to affect code size, which is an important concern for me. And of course code bloat tends to decrease performance via cache unfriendliness.
Aside from drop and NRVO, I don’t think it’s really about Rust’s backend being worse than any C compiler’s; compared to Rust, C and C++ code is usually much less likely to pass large structures by value, so rustc needs to be better at dealing with them to generate code of equal quality.
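For what it’s worth, if you want the C/C++-style guarantee today, you can pass large structs by reference explicitly; a borrowed `&Big` is always pointer-sized regardless of optimization level (a sketch with invented names):

```rust
struct Big {
    data: [u64; 1000],
}

// Borrowing guarantees that only a pointer crosses the call boundary;
// the 8 KB payload is never copied, with or without optimizations.
fn sum(big: &Big) -> u64 {
    big.data.iter().sum()
}

fn main() {
    let big = Big { data: [1; 1000] };
    assert_eq!(sum(&big), 1000);
    // `big` is still usable here, because it was only borrowed.
    assert_eq!(big.data.len(), 1000);
}
```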