Behind the scenes, how does Rust move structs?

Imagine you have this example:

struct LargeStruct {
    data: [i32; 10000],  // A large array to simulate a "large" struct
}

// A function that takes ownership of a LargeStruct
fn consume_large_struct(s: LargeStruct) {
    // Ownership of the struct is transferred here.
    let sum: i32 = s.data.iter().sum();
    println!("The sum of all elements in the data array is: {}", sum);
}

fn main() {
    // Initialize a LargeStruct instance on the stack
    let my_large_struct = LargeStruct { data: [0; 10000] };

    // Pass the LargeStruct into a function, transferring its ownership
    consume_large_struct(my_large_struct);

    // The next line would result in a compile-time error, as my_large_struct has been moved
    // println!("Data: {:?}", my_large_struct.data);
}

My understanding is that Rust will create a copy of my_large_struct at the time ownership is transferred to the consume_large_struct function. Can someone verify this? I tried using the Rust Playground to compile to assembly, but I'm not understanding the output.

I always thought Rust used references behind the scenes, but then I was working through the logic of moving a struct's ownership and realized that an extra pointer or runtime check or something would be needed in this scenario, instead of directly referencing the correct place on the stack. That hurts performance, which is the whole point of ownership.

I've heard the way to handle this is to Box it and pass in a pointer. However, in that scenario I don't understand the purpose of invalidating the previous struct if technically the two are copies of each other. I'm guessing it's for consistency, and because single ownership works well for values stored on the heap?



My understanding is that Rust will create a copy of my_large_struct at the time ownership is transferred to the consume_large_struct function.

Abstractly — in the language semantics — the data is copied. Concretely, the optimizer may eliminate the copy because there is nothing in the program that actually demands it.

I tried using the Rust Playground to compile to assembly, but I'm not understanding the output.

My favorite assembly comprehension trick is to ignore everything but the call instructions and what's near them. In the (non-optimized, "debug"!) output for this program we see

	movq	%rsp, %r11
	subq	$77824, %r11

	subq	$4096, %rsp
	movq	$0, (%rsp)
	cmpq	%r11, %rsp
	jne	.LBB15_1
	subq	$2184, %rsp
	leaq	40008(%rsp), %rdi
	xorl	%esi, %esi
	movl	$40000, %edx
	callq	memset@PLT
	leaq	8(%rsp), %rdi
	leaq	40008(%rsp), %rsi
	movl	$40000, %edx
	callq	memcpy@PLT
	leaq	8(%rsp), %rdi
	callq	playground::consume_large_struct
	addq	$80008, %rsp

which contains calls to memset (for zero initialization), memcpy (for copying to a new stack location — this is the main thing the optimizer would almost always eliminate), and consume_large_struct.

It also helps to know that lea ("load effective address") instructions are "given a designation of some place in memory, store its actual address in a register", and so, for example, leaq 8(%rsp), %rdi means "add 8 to the current value of the stack pointer and store that in rdi". (I didn't learn that by extensive study; I just did a web search for the opcode.) This is how consume_large_struct is given the address of the large argument it should read. We can see that there are two offsets appearing in the code, 8 and 40008, which will likely be the stack-relative addresses of the two copies of the data.

As a general rule (for any machine-code program, not just one compiled from Rust), small values are passed in registers, that being the most efficient option, and large ones are passed as pointers. The exact rules depend on the choice of calling convention on the platform/architecture.

So, just because something is “by value” in the Rust language semantics doesn't mean there won't be indirection involved in the actual execution of the code.

The invalidation is indeed useless in this example. The point of it is that there are other types where it is useful behavior: usually to prevent a use-after-free of a type containing a pointer, but more generally for anything that stops being valid after some point (e.g. a file handle that is closed, or a transaction that is committed). Move semantics ensure that the code implementing such a type doesn't have to worry about the handle being closed, or whatever, more than exactly once, as long as it has ownership.

For types where this is not useful, such as most “plain old data” that doesn't contain any heap allocations, you implement Copy to opt out of the invalidation.
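For illustration, here's a minimal sketch of opting out (the `Point` type is mine, not from the original example): deriving `Copy` means passing the value by value duplicates it, and the original stays usable.

```rust
// A plain-old-data type: deriving Copy (and Clone, which Copy requires)
// opts out of move invalidation, so passing it by value duplicates it
// and the original stays usable.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn consume(p: Point) -> i32 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let sum = consume(p); // `p` is copied here, not moved
    // Without `Copy`, the next line would be a compile error ("value moved").
    assert_eq!(p, Point { x: 1, y: 2 });
    assert_eq!(sum, 3);
    println!("still usable: {:?}", p);
}
```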


Good answer!

Note that when I compile in release mode, that memcpy does indeed go away.


Taking a step back: There's no guarantee that the compiler will optimize out what are moves/memcpys by Rust's semantics, but in many cases it will.

In general this isn't something I think about until it becomes a problem, e.g. benchmarking has indicated a ton of copying things around or a type is obviously humongous but needs to be passed around a lot. For example, I generally don't let it sway my API choices such that I change something best served by taking by value into something that takes by &mut.

Instead, if and when it becomes a problem, you can Box up large types (or the fields of large types) to decrease their on-stack size.
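A sketch of what that boxing looks like (the `LargeBoxed` name is mine): the struct on the stack shrinks to a single pointer, so moving it moves 8 bytes instead of 40 kB.

```rust
// Boxing the large field keeps the struct itself small on the stack;
// moving it then moves only the pointer, not the 40 kB payload.
struct LargeBoxed {
    data: Box<[i32; 10000]>, // heap-allocated payload
}

fn consume(s: LargeBoxed) -> i32 {
    s.data.iter().sum()
}

fn main() {
    let s = LargeBoxed { data: Box::new([1; 10000]) };
    // A Box of a sized type is a thin pointer, so the struct is pointer-sized.
    assert_eq!(std::mem::size_of::<LargeBoxed>(), std::mem::size_of::<usize>());
    assert_eq!(consume(s), 10000);
}
```

One caveat (discussed further down the thread): `Box::new([1; 10000])` may still construct the array on the stack first and then copy it to the heap, particularly in debug builds, so boxing helps with passing values around but is not a guaranteed fix for stack-overflow-at-construction.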


I'm going to strongly disagree on this one. I believe it is one of the largest issues in Rust today that there may be hidden copies.

  • When working in embedded (my hobby), Box is not a thing (unless you are on a high-end target that supports alloc and you are willing to use allocations).
  • When doing hard real-time (my day job) unexpected copies and allocations are obviously unacceptable also.

My take is that you should always design as if the compiler did not eliminate any copies, and be pleasantly surprised when it actually does.

I believe it is also best if any library code follows these principles too, in order to make the code maximally useful for all downstream uses.


Sorry but that's completely unfeasible. You can't just expect everyone (including library authors) to completely ignore 50 years of advancements in compiler optimizations. If we had to, we would still be writing assembly and bit-packing every last integer by hand.

Embedded is pretty much a special case.

That's not exactly a high bar. Even AVRs (which can have as little as 2kB of RAM!) support dynamic allocation.


Allocations may be a problem; copies are much less of a problem. You seem to forget that on modern CPUs the time needed to access one single byte from RAM is equivalent to the time needed to copy a couple of kilobytes around!

That's one extreme POV. Another, less extreme POV is to just accept the fact that Rust generates many times more copies than C does, and yet in real-world programs, saving one "cold" reference (where the CPU would have to travel to actual RAM) per few kilobytes of copies is not that hard to achieve.

Simply because less pointer chasing, and thus fewer potential cache misses, is natural for Rust's program design.

I think there are a few other problems:

  • If you want to construct a huge object that doesn't fit on the stack, you can't yet use the Box MaybeUninit methods stably.
  • Even if those methods were stable, there are some things you can't do with them, such as construct a future that's only defined by an async fn.
  • Relying on compiler optimizations doesn't work in debug mode.

Hard real-time systems don't like timing surprises. They're often created using C compilers which act like glorified assemblers instead of modern optimizing compilers to help repeatability of timing measurements. I wouldn't want to force general-purpose compilers and libraries to be real-time friendly.


True that.

Also true. Although there are many other languages that are and have been used. The most predictable systems I ever worked on were written in LUCOL; they were flight control systems. The LUCOL compiler would report exactly how much time your module would take to run, or how long a complete program composed of many modules would take to run on the target. That way you could be assured your code would not exceed its time quotas. C cannot do that. I have never seen any other language that could.

One often hears C described that way. I'm not so sure. For sure C is a high-level language, what with its support of structured programming, its data structures, its portability, etc. Whilst not as sophisticated as, say, Rust, it ends up being optimised by the same backend code generators, GCC or LLVM. It is subject to many of the same uncertainties. C compilers may or may not inline code, may or may not vectorise loops, may or may not replace multiplications and divisions with other operations, etc. At the end of the line, LLVM does not know if it is generating code from Rust or C sources. GCC and Clang certainly count as "modern optimising compilers" for C as much as for Rust. Finally, C programmers are often surprised that modern compilers don't treat C as a glorified assembler, and that the code they get generated is not what they expect from that assumption. ("Damn optimiser broke my code", they say.)

Rust can certainly be used for timing-critical work. For example, Cliff Biffle generates VGA graphics from a microcontroller using Rust: Rewriting m4vgalib in Rust - Cliffle

The thing is being able to stay away from things that result in unpredictable timing: memory allocation/freeing, garbage collection, system calls, etc. In C that means staying away from malloc and such. In Rust that means staying away from most of std, which leaves you in a C-like world.

Quite so. If you mean compilers for languages that depend on memory allocation and/or garbage collection and such. Rust is fine though, as much as C is anyway.


How can it do that? Consider one single operation: a counter that increments. Just a simple:

    inc WORD PTR[CounterName]

This operation may take 0.2 ns. Or it may take 100 ns. You couldn't predict that. That's how the Itanic died: it was designed around compiler-constructed scheduling of instructions and was counting on predictable instruction performance, and that's just not something we can have today.

What does LUCOL do about that? Assume that every operation that touches memory needs 100 ns, and waste 99.8% of the CPU's power?

If you don't have hardware which can give you such guarantees then you cannot create such a language. And we don't have such hardware. Not for the last 30 years, at least.

C started that way. But then the C committee tried to turn it into a real programming language. It failed, of course; that's not really possible to do. But it came close enough to success that now we have people who believe C is a real programming language and people who believe it's a portable assembler, and since we couldn't separate the wheat from the chaff, it's better to treat C like PL/I: a dead language which, nonetheless, is used by some developers because they don't know anything else.

Yes. Real-time devs used to avoid these.

Arm Cortex-M processors don't have this problem. Static RAM, no cache, no branch prediction, shallow pipeline, etc. Very widely used.

Hiring people with embedded experience is a mixed blessing. I have to keep an eye on their work for a time and demonstrate to them what real compilers do. The ones that can make the adjustment tend to come out great. The ones that don't...


It was a different world. The world of processors like the Motorola 68000, Intel 8086, and other embedded-systems processors, including custom in-house designs. Caches did not exist, there were no branch predictors, and so on. Processors ran in lock step with their memory. Vendors provided data books that specified exactly how many clock cycles every operation took. LUCOL only needed to generate code, find the longest path through a module, and add up the time of each operation on it. And likewise for compositions of modules.

If missing timing deadlines is of critical importance then of course one cannot use hardware that does not provide such timing guarantees. Or yes, one does indeed have to assume the worst case and sacrifice the potential performance. No way out of that.

Sure we do. Again, if timing deadlines are of critical importance then you use one of the many embedded-systems controllers that provide such predictability. Or you create your own, a RISC-V design implemented in an FPGA, say. Or, heck, move timing-critical functionality into hardware altogether.

C most certainly was a high-level programming language. As much as ALGOL, FORTRAN, Pascal, PL/M, Coral, and others. The definition of C's syntax and semantics has changed very little in many decades. I have not seen the standards committee trying to turn it into anything very different. Wisely, they do not.

You seem to be ignoring historical context here. C has hardly changed so it must still be a "high level" programming language. More likely the definition of "high level" has changed.

I'm not about to write C off as a dead language while our entire industry is based on it. From operating systems built in C (BSD, Linux, likely much of Windows still, countless embedded operating systems...) to a myriad of libraries and tools used everywhere. If you want to program embedded systems, C is one of very few choices.

I'm very curious to know what your pick of "real" languages is. What do you even mean by "real"?


As was already said:

Rust wasn't designed for embedded. It found use there, surprisingly enough, and Rust developers try to help as much as they can… but they couldn't just take a language designed for "big" systems and turn it on its ear with an "our embedded guys are suffering" scream. The majority of developers would scream.

Ideally we would want an entirely different language for embedded, but since today's embedded CPUs are close enough in power to the minicomputers of the past (the CPU that drives your charger is more powerful than the Apollo Guidance Computer, apparently), Rust works for embedded adequately enough.

No, we don't. Worse: such hardware will never return. It's just physically impossible. This:

Is just physically not possible to achieve if your CPU has a working frequency measured in gigahertz.

A more practical way is to split your system in two: one part runs on a CPU which is slow yet predictable and whose job is to prevent the explosion of your rocket engine, by hard-shutting it down if there is no other choice.

And move 99% of the logic onto a modern CPU which is 1000 times faster but has unpredictable timings.

C was never a high-level language in the first place. Initially it was what these days would be called bytecode: BCPL, then B, then C. Its specification was in terms of registers and memory, not in terms of an abstract machine.

Then, later, the C committee tried, and tried very hard, to turn C into a high-level language. It failed, ultimately, since there are still, to this very day, many unanswered questions about how certain constructions should work, but it succeeded just enough for people to believe that C is a high-level language.

FORTRAN is almost in the same boat as C, but even its very first somewhat portable version, FORTRAN 66, is not defined in terms of memory and registers.

Most other languages were influenced by ALGOL from the beginning and thus were built around the idea that your target is the abstract machine in the language specification, not the real hardware.

Sure, actual implementations often offered proprietary, non-portable extensions, but these were, by their very nature, add-ons, not the language core.

And you seem to do the same.

Nope. On the first "true" high-level language, ALGOL: "ALGOL was used mostly by research computer scientists in the United States and in Europe. Its use in commercial applications was hindered by the absence of standard input/output facilities in its description and the lack of interest in the language by large computer vendors other than Burroughs Corporation. ALGOL 60 did however become the standard for the publication of algorithms and had a profound effect on future language development."

IOW: originally, high-level languages weren't even used to write programs which would run on actual hardware! They were designed to separate algorithms from hardware, to make it possible to write programs without thinking about the hardware!

Since its very inception. Only that, of course, caused a conflict: people who were coding in low-level languages first tried to portray high-level languages as "unfit for writing actual programs" and then, later, embraced C because it was a not-quite-high-level language and was defined in terms of registers and memory cells.

It even worked, for some time, but at some point it became impossible to keep up the illusion that C is just a portable assembler. But that's not what killed it. What killed it is the adamant refusal of a large part of the community to accept the fact that high-level languages are fundamentally different from low-level languages.

That's not the first language which was used as the "basis for the entire industry". And not the first to be retired. It won't completely disappear: z/OS is written in PL/X and is still widely used.

But the sooner we would stop using C and C++ the better.

I don't know what you mean. I was talking about "real high-level" languages. And by that I meant the original definition: a programming language with strong abstraction from the details of the computer.

It should be possible to understand it without talking about memory, registers, and other such things.

“Normal”, safe Rust is definitely such a language. unsafe Rust… it's somewhat closer to C. People are trying to turn it into a full-blown high-level language, but that's hard.

Still, going from C/C++ to Rust would be a step in the right direction, since in most programs the use of unsafe Rust is limited.

I don't get why that's a surprise. Rust's low-level capabilities are pretty much exactly the same as those of C. You seem to be fixated on moves, but nothing prevents you from doing the same thing in C and observing that it also does or does not emit copies based on optimization settings.

C's semantics are also defined in terms of the C abstract machine, not in terms of any particular piece of hardware.


Perhaps not tiny embedded systems as such. But Rust has always been advertised as a "systems programming language", which puts it in the same group as C, Pascal, Ada, Coral, and others in my mind. Also note that the Rust devs pulled out their implementation of green threads (or whatever they were called) as it did not fit the "systems programming" target (replacing it with async support). So, in a way, Rust has ensured its usefulness in embedded systems for a long time now. There is also a group working on "parity with C" so as to ensure Rust is usable in such systems.

No what? I have some on my desk as we speak.

That is a valid approach that many are employing.

No. I don't know much about BCPL or B, although I understand they provided inspiration for some C features. C was specifically designed and built to enable rewriting Unix in a portable, high-level language so as to promote the adoption of Unix. It always compiled to real machine code.

I really cannot see this. In what way did they try? What new feature(s) in C are you referring to?

Hmm... So Tony Hoare didn't write an ALGOL compiler for his employer's machines? Did I imagine using ALGOL on ICL 2960s back in 1976? Sure it was designed to be used. The only problem was (apart from the unstandardised IO) that performance suffered, as ALGOL tried to be a safe language. Computers were slow; people continued to use assembler or the unsafe FORTRAN.

Nothing has killed C yet. It is still massively used. Often nothing else, or nothing much better, can do what C does. Ada was a contender. Rust is an even better contender today.

Exactly that? Which "real high-level languages" do you mean? A few examples perhaps...

I can only agree with that.


Yes, some people try to pretend that it's true. But that's not how K&R C was presented or developed (early compilers would refuse to create more than three register variables in one function, e.g.).

The C committee did a super job in its attempt to turn C into a high-level language, but it ultimately failed: when all major compilers miscompile certain programs which are 100% compliant with the specification, and after 20 years of discussion we still have no idea whether it's a bug in the specification or in the compilers… I don't think I can buy the argument that "C's semantics is also defined in terms of the C abstract machine".

And when one of the authors of the language quite explicitly writes that "the committee has created an unreal language that no one can or will actually use", you know for sure that the language Dennis Ritchie created and the committee's C are different languages.

Similarly to how Rust that Graydon wanted is not the same as Rust we have got.

Please read the actual RFC. Green threads were removed without any plans to create async support. The reasons Rust removed green threads are more or less the same reasons that prompted Java to do the same. And I don't know anyone who tries to push the story that Java did that in an attempt to support embedded development.

Yes, but there was never a plan to support embedded development. It just sorta happened: people tried to do it, then they asked if certain things could be changed, and things snowballed from there.

It's not possible to create something with predictable timings and a core frequency measured in gigahertz.

It would be really interesting to see how that device pulls off that stunt. I guess you may go the Cell route and make global RAM accessible only via DMA, but then you are just pushing the problem onto the developer.

It was developed to port Unix Version 3 (back then written in assembler) from the PDP-11 to some other system. Nothing more, nothing less.

No. The important part is the path from CPL to BCPL to B.

CPL was a classic high-level language (first published in 1963, first working compiler in 1970, heh).

Then BCPL went one step toward being a non-language: it removed many CPL facilities but had a virtual machine (years before Java or p-code, heh), which made it, technically, "a high-level language" (one may tell what a program is doing by looking at the bytecode, which is independent of hardware).

Then, finally, B became "a language without specification": it removed the virtual machine, and thus its behavior stopped being predictable.

After that, C lost any chance to become a language (high-level or otherwise): it no longer had a virtual machine which might isolate it from hardware, and it had no definition in terms of hardware either! The best name for it is autocode, but that term is rarely used today.

But as long as one rigorously tested code on different systems, it was possible to write code which was "practically portable" and fast.

But C was never a language, let alone a high-level language, until the C committee's attempt to, somehow, create a single specification for all these dozens of different and incompatible autocodes.

Undefined behavior. Because C had no spec and no way to actually describe what programs written in this "kinda sorta maybe let's pretend it's a language" autocode were doing, they couldn't actually define what C programs do.

And their way out was undefined behavior: a refusal to define how certain "bad" C programs work.

Other high-level languages do that too (e.g. Pascal doesn't define what happens to a program which tries to access a pointer after a call to Dispose), but none of them have, literally, hundreds of these.

But the C committee had no choice: the C non-language was already there, programs written in the C non-language were already there, and they couldn't write a sane specification which would declare the majority of those programs incorrect! Thus they declared them "conditionally correct" and called it a day.

No, but I think you imagine that 1976 is, somehow, earlier than 1968 or, maybe, even 1960.

He did. After the language was designed, published, and discussed.

Sure. And he even added the billion-dollar mistake to it. One simplification as a concession to what a real computer can do. And he still apologizes for it to this very day.

Compare that to C, which included a dozen such mistakes and whose developers never apologized for them.


guys, let's discuss struct moves please.

@mczarnek it's semantically correct, but the implementation doesn't require copying the data all the time. in short, for large structs, both the return value and the arguments are allocated on the caller's stack frame, and a pointer is passed to the callee. (the optimizer may eliminate most of the unnecessary data copies, especially when inlining is possible.)

if the struct doesn't implement Drop, you won't see any difference between moved arguments and reference arguments in the generated assembly. you can check this playground link:

in a release build, it's just a single call to memset or memcpy. even in debug mode, it's not too complicated at all.
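since the link above is elided, here's a sketch of the kind of comparison it presumably shows (the `Big`, `sum_by_move`, and `sum_by_ref` names are mine): a non-Drop struct passed by move and by reference. with optimizations, the two functions typically lower to near-identical code, because under the ABI the large move argument is already passed as a pointer.

```rust
// A large struct with no Drop impl, passed both by move and by reference.
struct Big {
    data: [u64; 4096], // 32 kB payload
}

// Takes ownership; under the hood the caller passes a pointer to its stack copy.
fn sum_by_move(b: Big) -> u64 {
    b.data.iter().sum()
}

// Takes a reference; also a pointer under the hood.
fn sum_by_ref(b: &Big) -> u64 {
    b.data.iter().sum()
}

fn main() {
    let b = Big { data: [3; 4096] };
    let by_ref = sum_by_ref(&b);
    let by_move = sum_by_move(b); // `b` is moved here and unusable afterwards
    assert_eq!(by_ref, by_move);
    println!("both sums: {}", by_move);
}
```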

when the type does implement Drop, you'll see the difference between pass-by-move and pass-by-reference, but the difference is small. the values are still allocated on the caller's stack; it's just that when the value is passed by reference, the caller will call drop, but when the value is passed by move, it's the callee who calls drop.

note that drop-ing a value and deallocating its memory are related but separate operations.
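to make the "who calls drop" point observable, here's a small sketch (the `Noisy` type and `DROPS` counter are illustrative, not from the thread) that counts destructor runs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times a Noisy value has been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Takes by move: this function is responsible for dropping `n`.
fn take(n: Noisy) {
    drop(n); // explicit for clarity; it would happen at end of scope anyway
}

// Takes by reference: the caller still owns the value and drops it later.
fn borrow(_n: &Noisy) {}

fn main() {
    let a = Noisy;
    take(a); // dropped inside `take`
    assert_eq!(DROPS.load(Ordering::SeqCst), 1);

    let b = Noisy;
    borrow(&b);
    assert_eq!(DROPS.load(Ordering::SeqCst), 1); // not dropped yet
    drop(b); // caller drops it
    assert_eq!(DROPS.load(Ordering::SeqCst), 2);
}
```

in both cases the stack slot itself is simply reclaimed when the owning frame returns; the destructor call is the part that moves between caller and callee.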

check this playground link:


Is this true for Box::new if the function returning a large type can panic? That is, is the optimizer allowed to elide the memcpy from the stack to the heap in Box::new(f()) if f can panic? What about Box::try_new?