My understanding is that Rust will create a copy of my_large_struct at the time ownership is transferred to the consume_large_struct function.
Abstractly — in the language semantics — the data is copied. Concretely, the optimizer may eliminate the copy because there is nothing in the program that actually demands it.
Tried using Rust playground to compile to assembly but not understanding the output.
My favorite assembly comprehension trick is to ignore everything but the call instructions and what's near them. In the (non-optimized, "debug"!) output for this program we see:
playground::main:
movq %rsp, %r11
subq $77824, %r11
.LBB15_1:
subq $4096, %rsp
movq $0, (%rsp)
cmpq %r11, %rsp
jne .LBB15_1
subq $2184, %rsp
leaq 40008(%rsp), %rdi
xorl %esi, %esi
movl $40000, %edx
callq memset@PLT
leaq 8(%rsp), %rdi
leaq 40008(%rsp), %rsi
movl $40000, %edx
callq memcpy@PLT
leaq 8(%rsp), %rdi
callq playground::consume_large_struct
addq $80008, %rsp
retq
which contains calls to memset (for zero initialization), memcpy (for copying to a new stack location; this is the main thing the optimizer would almost always eliminate), and consume_large_struct.
It also helps to know that lea ("load effective address") instructions mean "given a designation of some place in memory, store its actual address in a register", so, for example, leaq 8(%rsp), %rdi means "add 8 to the current value of the stack pointer and store that in rdi". (I didn't learn that by extensive study; I just did a web search for the opcode.) This is how consume_large_struct is given the address of the large argument it should read. We can see that there are two offsets appearing in the code, 8 and 40008, which will likely be the stack-relative addresses of the two copies of the data.
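To connect the listing back to source code, here is a sketch of a program that would plausibly produce output of this shape. The original program isn't shown here, so the type name LargeStruct, its field layout, and the return value are my assumptions, chosen only so that the struct is 40000 bytes like the memset and memcpy sizes above.

```rust
// Hypothetical reconstruction: the type name, field layout, and return
// value are assumptions made to match the 40000-byte sizes in the listing.
struct LargeStruct {
    data: [u32; 10_000], // 10_000 elements * 4 bytes = 40_000 bytes
}

#[inline(never)] // ask the compiler to keep this as a real call
fn consume_large_struct(s: LargeStruct) -> u32 {
    s.data[0]
}

fn main() {
    // Zero initialization: corresponds to the memset in the debug build.
    let my_large_struct = LargeStruct { data: [0; 10_000] };
    // Moving into the callee: corresponds to the memcpy to a second
    // stack slot in the debug build, followed by the call itself.
    let first = consume_large_struct(my_large_struct);
    println!("{}", first);
}
```

In an optimized build I would expect the second stack slot and its memcpy to disappear, but that is up to the optimizer and not guaranteed by the language.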
As a general rule (for any machine code program, not just one compiled from Rust), small values are passed in registers, that being the most efficient possible option, and large ones are passed as pointers. The exact rules depend on the choice of calling convention on the platform/architecture.
So, just because something is “by value” in the Rust language semantics doesn't mean there won't be indirection involved in the actual execution of the code.
The invalidation is indeed useless in this example. The point of it is that there are other types where it is useful behavior: usually to prevent a use-after-free of a type containing a pointer, but more generally for anything that stops being valid after some point (e.g. a file handle that is closed, or a transaction that is committed). The move semantics ensure that the code implementing that type doesn't have to worry about the handle being closed, or whatever, more than exactly once, as long as it has ownership.
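As a minimal sketch of the "committed exactly once" case: the Transaction type below is hypothetical, made up for illustration and not taken from any real library. Because commit takes self by value, calling it moves the transaction, and the compiler rejects any later use.

```rust
// Hypothetical type for illustration only; not from any real library.
struct Transaction {
    log: Vec<String>,
}

impl Transaction {
    fn new() -> Self {
        Transaction { log: Vec::new() }
    }

    fn record(&mut self, entry: &str) {
        self.log.push(entry.to_string());
    }

    // Takes `self` by value, so committing consumes the transaction.
    // Returns the number of entries that were committed.
    fn commit(self) -> usize {
        self.log.len()
    }
}

fn main() {
    let mut tx = Transaction::new();
    tx.record("debit");
    tx.record("credit");
    let n = tx.commit();
    println!("committed {} entries", n);
    // tx.record("late"); // does not compile: `tx` was moved by `commit`
}
```

The implementation of commit never has to check for "already committed", since no caller can still hold the value afterwards.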
For types where this is not useful, such as most “plain old data” that doesn't contain any heap allocations, you implement Copy to opt out of the invalidation.
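For instance, with a small made-up Point type (my own example, not from the question), deriving Copy means that passing it by value leaves the original usable:

```rust
// Hypothetical plain-old-data type; `Copy` requires `Clone` as well.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn consume_point(p: Point) -> i32 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let sum = consume_point(p); // `p` is copied, not invalidated
    // Still allowed, because `Point` is `Copy`.
    println!("{} {:?}", sum, p);
}
```

Without the Copy derive, the last println! would be rejected as a use of a moved value.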