What prevents moving of variables from being optimized away?

In the following example (Rust Playground):

```rust
use std::time::SystemTime;

#[derive(Debug)]
struct NonCopyU64(u64);

fn main() {
    let i = NonCopyU64(SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap().as_secs());

    // 0 move
    println!("{:?}", i);

    // 1 move
    // let j = i;
    // println!("{:?}", j);

    // 2 moves
    // let j = i;
    // let k = j;
    // println!("{:?}", k);
}
```

I thought `let j = i; println!("{:?}", j);` should be equivalent to `println!("{:?}", i);`, but the former generated more machine code (in release mode).

Then I guessed that `let j = i; let k = j; println!("{:?}", k);` would generate even more code, but it turned out it didn't, which left me wondering: what makes the difference?

Side note: if `i` were a plain `u64` instead of a `NonCopyU64`, two assignments would still generate the same code as one, but one would still differ from none.

I think this relates to rust-lang/rust issue #50519: "println!() prevents optimization by capturing pointers".

From what I can tell, the version with the move perhaps optimizes even better.

It’s “more” machine code because the change happens to introduce the need to use the callee-saved register `%rbx`, which has to be saved at the start of `main` and restored at its end. But otherwise, the code on the right (the 1-move version, vs. the 0-move version on the left) does better IMO, in that it manages to reuse the stack location `16(%rsp)` first for (part of) the `SystemTime` value (as far as I can tell) and later for the number to be printed.

As a result, the address `16(%rsp)` only has to be computed once (`leaq 16(%rsp)`), and that address is kept in `%rbx` across the call to `SystemTime::duration_since`.

As far as moving the actual integer `i` itself goes, the two versions of the code do exactly the same amount of work. Both receive the value from the `Result` on the stack at `24(%rsp)` or `32(%rsp)`, respectively (whose address was passed to `SystemTime::duration_since` in `%rdi` as the location for its return value); move that value into `%rax` (`movq 32(%rsp), %rax` or `movq 40(%rsp), %rax`, respectively, so apparently it sits at an 8-byte offset from the start of the whole `Result<Duration, SystemTimeError>`); and then move it back to the stack, to where the argument to `<playground::NonCopyU64 as core::fmt::Debug>::fmt` is supposed to be.[1]


This is in line with the issue @Bruecki linked above: sometimes moving the value out of its original location before passing it to `println!` can be advantageous, because the original address no longer needs to be preserved. The way to force the move that's mentioned in the issue discussion, a block expression like `println!("{:?}", {i})` (where `{i}` moves the value of `i` into a temporary), results in the same assembly as the version with the `let j = i` step.
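A minimal sketch of the `{i}` trick, reusing `NonCopyU64` from the question with a constant in place of the timestamp. It only shows the source-level pattern; whether it changes the generated assembly depends on the compiler version and optimization level.

```rust
#[derive(Debug)]
struct NonCopyU64(u64);

fn main() {
    let i = NonCopyU64(42);
    // `{ i }` is a block expression that evaluates to `i` by value, so `i`
    // is moved into a temporary before the formatting machinery takes a
    // reference. Only the temporary's address is ever handed out.
    println!("{:?}", { i });
    // `i` has been moved, so `println!("{:?}", i);` here would not compile.
}
```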

As far as I know, the crucial implementation detail here is that the formatting code used by `println!` converts the values to be printed (which are accessed by reference, so here a `&NonCopyU64`) into a `&dyn Debug` (or at least something comparable). The pointer to the printed value is therefore ultimately handed to an opaque function, which could inspect “where” the value lives and expect consistent results as long as the value was never moved.
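To make the “opaque function” point concrete, here is a sketch in which the hypothetical `opaque_print` stands in for the formatting machinery (it is not how `core::fmt` is actually implemented). It receives only a `&dyn Debug`, and `#[inline(never)]` is merely a hint that mimics an out-of-line call. Nothing stops such a function from reading the address it was given.

```rust
use std::fmt::Debug;

#[derive(Debug)]
struct NonCopyU64(u64);

/// Hypothetical stand-in for the formatting machinery (not the real
/// `core::fmt` internals): an out-of-line function that only receives a
/// `&dyn Debug`. `#[inline(never)]` is a hint to keep the call opaque.
#[inline(never)]
fn opaque_print(value: &dyn Debug) -> usize {
    println!("{:?}", value);
    // Nothing prevents the callee from looking at *where* the value lives.
    // Casting to `*const ()` drops the vtable part of the fat pointer.
    value as *const _ as *const () as usize
}

fn main() {
    let i = NonCopyU64(7);
    let seen = opaque_print(&i);
    // The callee observed exactly the address of `i` in `main`.
    assert_eq!(seen, &i as *const NonCopyU64 as usize);
}
```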


  1. Of course, one could question further optimization decisions by LLVM here. For example, could the integer somehow be left in the same place inside the `Result<Duration, SystemTimeError>` where it originally came from? And could LLVM be smart enough to duplicate the `leaq 16(%rsp)` calculation to avoid using `%rbx`, assuming that's a worthwhile trade-off to begin with?

So by writing an explicit move, I accidentally nudged the compiler into choosing the better-optimized code, which it otherwise could not have chosen on its own?

Thanks for the detailed explanation. I guess I should take some time to get more familiar with x86 assembly before inspecting more problems like this.

Actually, my original question was: "Do Rust's move semantics actually cause the compiled code to move values around on the stack?"

From what we've seen, the answer would be: it depends, and even when values do get moved, that need not hurt performance.

While I don't really have an explanation for why it made the exact difference it made above: in general, when you inspect the address of some variable `i` (by taking a pointer to it and then looking at the address), whether or not a move is introduced can change the observable behavior in relevant ways.

For example (just an example; I don't think this is closely related to whatever property was actually relevant for your code here): if you inspect the address of the same variable twice, you can expect to get the same result. Whereas if you inspect `&i`, then move it with `let j = i` and inspect the address `&j` points to, the two are allowed to be different. (In this context, “allowed to be different” means that (a) there is a difference in the first place, so the assembly can differ, and (b) merely allowing a difference should generally only enable more optimization, never less.)
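A small sketch of that example, using a hypothetical helper `address` that turns a reference into an integer. The first comparison is guaranteed; the second is deliberately only printed, since whether `&j` equals the old `&i` is up to the compiler.

```rust
#[derive(Debug)]
struct NonCopyU64(u64);

/// Hypothetical helper: the address a reference points to, as an integer.
fn address<T>(r: &T) -> usize {
    r as *const T as usize
}

fn main() {
    let i = NonCopyU64(1);
    let a1 = address(&i);
    let a2 = address(&i);
    // Same variable, never moved in between: the addresses must match.
    assert_eq!(a1, a2);

    let j = i; // move `i` into `j`
    let b = address(&j);
    // No guarantee either way: the compiler may reuse `i`'s slot or pick a new one.
    println!("&i = {:#x}, &j = {:#x}, same slot: {}", a1, b, a1 == b);
}
```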

The code in question here didn't actually inspect pointer addresses, but it called through dynamic function pointers, so the optimizer had no way of knowing that the called functions didn't inspect them.


As an argument the other way: removing moves is “only” an optimization too, so additional moves can of course end up being overhead as well (though that overhead is rarely a concern for types as small as a single integer, and should hopefully rarely arise at all in the very simple case of moves between two stack variables of the same function).

The thing that almost never happens, but that the compiler has to worry about anyway, is that the `Debug` implementation for a type might look at the exact pointer value you passed. For example, you could write a `Debug` implementation that uses a `static` to do something different if the address of the passed-in integer is the same as last time.
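Here is a sketch of such a deliberately silly implementation. The type `AddrSensitive` and the static `LAST_ADDR` are hypothetical; the point is only that the output depends on where the value lives, not just on its contents.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical type whose `Debug` output depends on its address.
struct AddrSensitive(u64);

// Address of the value formatted most recently (0 = none yet).
static LAST_ADDR: AtomicUsize = AtomicUsize::new(0);

impl fmt::Debug for AddrSensitive {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let addr = self as *const Self as usize;
        // Store the new address and get back the previous one.
        let prev = LAST_ADDR.swap(addr, Ordering::Relaxed);
        if prev == addr {
            write!(f, "AddrSensitive({}) [same place as last time]", self.0)
        } else {
            write!(f, "AddrSensitive({})", self.0)
        }
    }
}

fn main() {
    let i = AddrSensitive(1);
    println!("{:?}", i);
    // Same variable, not moved: this one takes the "same place" branch.
    println!("{:?}", i);
    let j = i;
    // Whether this takes that branch depends on where the compiler puts `j`.
    println!("{:?}", j);
}
```

This is exactly the kind of behavior the optimizer cannot rule out when the `Debug` call is opaque to it.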

So unless it can prove that silliness like that isn't happening, it can't in general say "well, those are the same, so I'll just reuse the address". This is one reason why passing `i32` is generally better than passing `&i32`: the address of a by-value parameter is clearly "meaningless" in a way that the address behind a reference isn't.
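A sketch of the `i32` vs `&i32` point, with hypothetical functions `addr_seen_by_ref` and `addr_seen_by_value`: through a reference the callee learns where the caller's variable lives, whereas a by-value parameter only has an address of its own.

```rust
/// Hypothetical: the callee gets a reference, so it learns the caller's address.
#[inline(never)]
fn addr_seen_by_ref(x: &i32) -> usize {
    x as *const i32 as usize
}

/// Hypothetical: the callee gets its own copy; the only address it can see is
/// that of its parameter, which says nothing about the caller's storage.
#[inline(never)]
fn addr_seen_by_value(x: i32) -> usize {
    &x as *const i32 as usize
}

fn main() {
    let n: i32 = 42;
    let caller_addr = &n as *const i32 as usize;

    // By reference: the callee observes exactly where `n` lives.
    assert_eq!(addr_seen_by_ref(&n), caller_addr);

    // By value: the callee only sees the address of its own parameter.
    println!("caller: {:#x}, by-value callee: {:#x}", caller_addr, addr_seen_by_value(n));
}
```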

You mean that whenever a reference to a variable is passed to a function that isn't inlined, this fact alone prevents the variable's stack slot from being reused when the value is assigned to another variable? That hurts.