Behind the scenes, how does Rust move structs?

kpreid · August 31, 2023, 5:16pm

My understanding is that Rust will create a copy of my_large_struct at the time ownership is transferred to the consume_large_struct function.

Abstractly — in the language semantics — the data is copied. Concretely, the optimizer may eliminate the copy because there is nothing in the program that actually demands it.

Tried using Rust playground to compile to assembly but not understanding the output.

My favorite assembly comprehension trick is to ignore everything but the call instructions and what's near them. In the (non-optimized, "debug"!) output for this program we see

playground::main:
	movq	%rsp, %r11
	subq	$77824, %r11

.LBB15_1:
	subq	$4096, %rsp
	movq	$0, (%rsp)
	cmpq	%r11, %rsp
	jne	.LBB15_1
	subq	$2184, %rsp
	leaq	40008(%rsp), %rdi
	xorl	%esi, %esi
	movl	$40000, %edx
	callq	memset@PLT
	leaq	8(%rsp), %rdi
	leaq	40008(%rsp), %rsi
	movl	$40000, %edx
	callq	memcpy@PLT
	leaq	8(%rsp), %rdi
	callq	playground::consume_large_struct
	addq	$80008, %rsp
	retq

which contains calls to memset (for zero initialization), memcpy (for copying to a new stack location — this is the main thing the optimizer would almost always eliminate), and consume_large_struct.
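For reference, a program of roughly this shape would give a listing like the one above. This is a hypothetical reconstruction, not the original poster's code: the names `my_large_struct` and `consume_large_struct` come from the question, but the `LargeStruct` definition, its `[u8; 40_000]` field, and the return value are assumptions chosen to match the 40,000-byte `memset` and `memcpy`.

```rust
// Assumed layout: 40,000 bytes, matching the length passed to memset/memcpy.
struct LargeStruct {
    data: [u8; 40_000],
}

// Takes ownership of the struct. Returning a sum (an assumption made for
// this sketch) ensures the argument is actually read.
fn consume_large_struct(s: LargeStruct) -> usize {
    s.data.iter().map(|&b| b as usize).sum()
}

fn main() {
    // Zero initialization: this is where a memset would come from.
    let my_large_struct = LargeStruct { data: [0; 40_000] };
    // The move: abstractly a copy, which a debug build may emit as a memcpy.
    let total = consume_large_struct(my_large_struct);
    println!("sum = {total}");
}
```

Whether a given compiler version emits exactly the same instructions is not guaranteed; only the overall pattern (initialize, copy, call) should be expected.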

It also helps to know that lea ("load effective address") instructions mean "given a designation of some place in memory, store its actual address in a register" and so, for example, leaq 8(%rsp), %rdi means "add 8 to the current value of the stack pointer and store that in rdi". (I didn't learn that by extensive study; I just did a web search for the opcode.) This is how consume_large_struct is given the address of the large argument it should read. We can see that there are two offsets appearing in the code, 8 and 40008, which will likely be the stack-relative addresses of the two copies of the data.
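The numbers in the listing can be checked against that reading. The sketch below only does arithmetic on constants copied from the assembly; treating the two offsets as the start of two back-to-back copies is an interpretation, not something the listing states.

```rust
// All constants are read off the assembly listing above.
const STRUCT_SIZE: usize = 40_000;   // length passed to memset and memcpy
const FIRST_OFFSET: usize = 8;       // leaq 8(%rsp), ...
const SECOND_OFFSET: usize = 40_008; // leaq 40008(%rsp), ...

// Total stack space reserved: the probe loop plus the final subq.
fn frame_size() -> usize {
    77_824 + 2_184
}

fn main() {
    // The probe loop moves the stack pointer one 4096-byte page at a time.
    assert_eq!(77_824 / 4096, 19);
    // The two offsets are exactly one struct apart...
    assert_eq!(SECOND_OFFSET - FIRST_OFFSET, STRUCT_SIZE);
    // ...and 8 bytes plus two structs fill the whole frame,
    // matching the `addq $80008, %rsp` at the end.
    assert_eq!(FIRST_OFFSET + 2 * STRUCT_SIZE, frame_size());
    println!("frame = {} bytes", frame_size());
}
```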

As a general rule (for any machine code program, not just one compiled from Rust), small values are passed in registers, that being the most efficient possible option, and large ones are passed as pointers. The exact rules depend on the choice of calling convention on the platform/architecture.

So, just because something is “by value” in the Rust language semantics doesn't mean there won't be indirection involved in the actual execution of the code.

The invalidation is indeed useless in this example. The point of it is that there are other types where it is useful behavior: usually to prevent a use-after-free of a type containing a pointer, but more generally for anything that stops being valid after some point (e.g. a file handle that is closed, or a transaction that is committed). The move semantics ensure that the code implementing that type doesn't have to worry about the handle being closed, or whatever, more than exactly once, as long as it has ownership.
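A minimal sketch of that "exactly once" property, using a made-up `Handle` type as a stand-in for a file handle or transaction. The type, the `consume` function, and the shared counter are all hypothetical; the counter exists only so the number of "closes" can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical resource: "closing" it just bumps a shared counter.
struct Handle {
    closes: Rc<Cell<u32>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.closes.set(self.closes.get() + 1);
    }
}

// Takes ownership; the handle is dropped (closed) when this returns.
fn consume(_h: Handle) {}

fn count_closes() -> u32 {
    let closes = Rc::new(Cell::new(0));
    let h = Handle { closes: Rc::clone(&closes) };
    consume(h);
    // `h` has been moved: using it here would be a compile error, and
    // no second drop runs for it when this scope ends.
    closes.get()
}

fn main() {
    println!("closed {} time(s)", count_closes());
}
```

Because the moved-from variable is invalidated, `Drop` runs once, inside `consume`, and `count_closes` returns 1.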

For types where this is not useful, such as most “plain old data” that doesn't contain any heap allocations, you implement Copy to opt out of the invalidation.
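As a small illustration, here is a hypothetical `Point` type (not from the original question) that derives `Copy`, so passing it by value leaves the original variable usable:

```rust
// `Copy` requires `Clone`, so both are derived together.
#[derive(Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Takes its argument by value, just like a consuming function would.
fn consume_point(p: Point) -> i32 {
    p.x + p.y
}

fn use_twice() -> i32 {
    let p = Point { x: 1, y: 2 };
    let first = consume_point(p);
    // Without `Copy` this second use would fail with "use of moved value".
    let second = consume_point(p);
    first + second
}

fn main() {
    println!("{}", use_twice());
}
```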
