Needless memory juggling added after inline assembly

While practicing excessive optimisation, I wrote a function that combines two u16 numbers into a u32 using asm!, with the first argument filling the higher word (source at the end).
Knowing that asm! can expand into machine code that never needs to touch RAM, I was surprised to see

mov dword ptr [rsp - 4], eax
mov eax, dword ptr [rsp - 4]

at the end of the compiled function, with eax carrying the result. I don't see where the rsp - 4 pointer leads, so my hypothesis was that the compiler plans to use the result from memory later, which I disproved with a simple test function that adds the result to itself.

So my questions are: why does it write the result to memory, and, if that's justified, why does it read it back in the very next instruction?

Here on Godbolt are the unoptimised (if you're interested in how bad it gets without using assembly), optimised, and test functions.
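
Roughly, the shape of the function is something like this (a simplified sketch, not the exact source from the Godbolt link):

use std::arch::asm;

// Simplified sketch: combine two u16s into a u32, first argument in the high word.
fn asmun(hi: u16, lo: u16) -> u32 {
    let r: u32;
    unsafe {
        asm!(
            // shift `hi` into the upper 16 bits, then merge `lo` into the lower 16
            "shl {x:e}, 16",
            "or {x:e}, {y:e}",
            x = inout(reg) hi as u32 => r,
            y = in(reg) lo as u32,
        );
    }
    r // at opt-level 0 this is stored to r's stack slot and read right back
}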

Because [rsp - 4] is where the r variable lives. Your asm code writes into it, and then it's returned from the function.

Not sure what surprises you. It's just a simple, literal translation of your source into assembly.

Note that your function includes lots of code to handle overflow that may never happen (but the compiler doesn't remove that code because you asked it not to perform any optimizations).
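
(The original source isn't shown here, but as an illustration of where such code comes from: in a debug build, plain arithmetic like the hypothetical version below is compiled with overflow checks and a branch to a panic handler, even when the values can never actually overflow.)

// Hypothetical non-asm equivalent, for illustration only.
fn plain(hi: u16, lo: u16) -> u32 {
    // With overflow checks on (the debug-build default), `*` and `+`
    // lower to checked arithmetic plus a conditional jump to a panic,
    // even though this expression can never exceed u32::MAX.
    hi as u32 * 0x1_0000 + lo as u32
}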


Now I see. For some reason I thought rustc has optimisation enabled by default, and I was looking for reasons why it would refuse to optimise that stuff out. Shame on me.
Another purely educational question, then: if optimisation was off the whole time, why doesn't let a = asmun(1, 1) allocate memory, instead keeping the value of a in EAX until its later use?

Do you mean heap allocation? That's always explicit in Rust, though the optimizer sometimes removes heap allocations that you requested.

No, I mean stack allocation. r in asmun gets explicitly allocated on the stack by the compiler, and so does b in main. But why does a never get placed on the stack, if the optimization level is zero?

This is a side effect of using LLVM. LLVM needs SSA form, but that's a disastrous nightmare for compiler front ends to produce directly. Instead they generate IR which intentionally overuses the stack, including redundant loads and stores. LLVM includes a pass, run regardless of optimization level, which converts that to SSA. Even explicit stack use often gets eliminated by that pass.
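
If you want to see that for yourself, one way (a minimal sketch; asmun_plain is just a stand-in for the asm! version) is to dump the IR rustc hands to LLVM and compare which locals get an alloca:

// Build with: rustc --emit=llvm-ir -C opt-level=0 main.rs
// then look in main.ll for `alloca` instructions and store/load pairs.
fn asmun_plain(hi: u16, lo: u16) -> u32 {
    ((hi as u32) << 16) | (lo as u32)
}

fn main() {
    let a = asmun_plain(1, 1); // compare: does `a` get its own alloca?
    let b = a + a;             // ...and does `b`?
    println!("{b}");
}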


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.