asm! macro with FPU operations on x86


#1

I am trying to understand the inline assembly feature in Rust, and as an example I want to write a function that does floating-point arithmetic on the x86 FPU, something as simple as a square-root function. The instruction format is sqrtsd %xmmR, %xmmR. So I created a function:

#![feature(asm)]

fn sqrt_asm(i: f64) -> f64 {
  let ret: f64;
  unsafe { asm!("sqrtsd $0, $0"
              : "=r"(ret)
              : "0"(i)
              : ) }
  return ret;
}

fn main() {
  println!("{}", sqrt_asm(15.2));
}

I see that the f64 input and output are passed in xmm0, which is good. But it is not clear to me how to make asm! take the variable and write the output in the same xmm register.

Does anyone have experience with asm! and FPU instructions?


#2

After several blind tries I was able to get my function working:

fn sqrt_asm(i: f64) -> f64 {
  let ret: f64;
  unsafe { asm!("sqrtsd %xmm0, %xmm0"
              : "={xmm0}"(ret)
              : "{xmm0}"(i)
              : ) }
  return ret;
}

But I am not sure it is the best way to implement this kind of function. In particular:

  • will it still work if the compiler decides to use a register other than xmm0 to pass function arguments?
  • is there a way to avoid hardcoding the xmm0 register? I would like the compiler to be free to choose any available xmm register.

And here is the disassembly of the function:

0000000000006160 <_ZN4sqrt8sqrt_asm17h7dee47c5a631820dE>:
    6160:       f2 0f 11 44 24 e8       movsd  %xmm0,-0x18(%rsp)
    6166:       f2 0f 10 44 24 e8       movsd  -0x18(%rsp),%xmm0
    616c:       f2 0f 51 c0             sqrtsd %xmm0,%xmm0
    6170:       f2 0f 11 44 24 f0       movsd  %xmm0,-0x10(%rsp)
    6176:       f2 0f 10 44 24 f0       movsd  -0x10(%rsp),%xmm0
    617c:       c3                      retq   
    617d:       0f 1f 00                nopl   (%rax)

0000000000006180 <_ZN4sqrt4main17h1453d063b515cb9cE>:
    6180:       48 81 ec 88 00 00 00    sub    $0x88,%rsp
    6187:       f2 0f 10 05 b1 75 03    movsd  0x375b1(%rip),%xmm0        # 3d740 <_fini+0x24>
    618e:       00 
    618f:       48 8b 35 7a a1 24 00    mov    0x24a17a(%rip),%rsi        # 250310 <_ZN4sqrt4main15__STATIC_FMTSTR17h31b80c74e0c53704E>
    6196:       48 8b 15 7b a1 24 00    mov    0x24a17b(%rip),%rdx        # 250318 <_ZN4sqrt4main15__STATIC_FMTSTR17h31b80c74e0c53704E+0x8>
    619d:       48 89 54 24 18          mov    %rdx,0x18(%rsp)
    61a2:       48 89 74 24 10          mov    %rsi,0x10(%rsp)
    61a7:       e8 b4 ff ff ff          callq  6160 <_ZN4sqrt8sqrt_asm17h7dee47c5a631820dE>
    61ac:       f2 0f 11 44 24 70       movsd  %xmm0,0x70(%rsp)
    61b2:       48 8d 7c 24 78          lea    0x78(%rsp),%rdi

It is not clear to me what the compiler is doing with the moves at 6160, 6166, 6170 and 6176. Is there a way to avoid them?
Ideally, the compiler should just inline this one-operand function.


#3

According to the documentation (scroll down to x86) you can specify an XMM register using =x


#4

Are you compiling with cargo build --release? or rustc -O? If not, you’ll see a lot of stack setup like this.

I think x will give the SSE constraint you want, and you don’t even have to force it to be the same register.

fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        asm!("sqrtsd $1, $0"
              : "=x"(ret)
              : "x"(i)
              : )
    }
    return ret;
}

This still gives me just:

_ZN8rust_out8sqrt_asm17hf6e6ae11ad179d42E:
	.cfi_startproc
	#APP
	sqrtsd	%xmm0, %xmm0
	#NO_APP
	retq

You should also be able to write “xm” for the input, allowing either register or memory, but it seems LLVM always chooses memory in this case. :(
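
For anyone reading this on a newer toolchain: the constraint-string syntax used in this thread was later replaced, and on current stable Rust the macro lives at std::arch::asm! with a different operand syntax. As far as I know, the rough counterpart of the “x” constraint there is the xmm_reg register class. Below is a minimal sketch of the same function under that assumption. It targets x86_64 only; the non-x86_64 fallback is just an illustrative stand-in so that the example builds elsewhere. Note that the new macro defaults to Intel syntax, so the destination comes first, the reverse of the AT&T order used above.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Square root via the SSE2 `sqrtsd` instruction.
/// The compiler is free to pick any XMM registers for input and output.
#[cfg(target_arch = "x86_64")]
#[inline]
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    // SAFETY: `sqrtsd` only reads one XMM register and writes another;
    // it touches neither memory nor the stack.
    unsafe {
        asm!(
            // Intel syntax (the default): destination first, then source.
            "sqrtsd {out}, {inp}",
            // `lateout` lets the output share a register with the input,
            // like `sqrtsd %xmm0, %xmm0` in the listings above.
            out = lateout(xmm_reg) ret,
            inp = in(xmm_reg) i,
            options(pure, nomem, nostack),
        );
    }
    ret
}

/// Illustrative fallback (an assumption, not part of the thread) so the
/// sketch still compiles on targets without `sqrtsd`.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
fn sqrt_asm(i: f64) -> f64 {
    i.sqrt()
}

fn main() {
    println!("{}", sqrt_asm(15.2));
}
```

No #![feature(asm)] attribute is needed with this form, and rustc -O should still be used when inspecting the generated code.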


#5

Thank you for your help folks.

According to the documentation (scroll down to x86) you can specify an XMM register using =x

Thank you. I did not realize that the Rust/LLVM inline assembly format is compatible with GCC's. GCC seems to have nice documentation.

Are you compiling with cargo build --release? or rustc -O?

D’oh, I was using plain rustc; adding -O turns on a lot of optimizations. Now sqrtsd is inlined exactly as I expected.

it seems LLVM always chooses memory in this case

In this case LLVM avoids loading the input argument from memory into xmm0, which eliminates one operation.

Ok, here is the final example that works exactly as I expect:

#![feature(asm)]

#[inline]
fn sqrt_asm(i: f64) -> f64 {
  let ret: f64;
  unsafe { asm!("sqrtsd $1, $0"
              : "=x"(ret)
              : "xm"(i)
              : ) }
  return ret;
}

fn main() {
  println!("{}", sqrt_asm(15.2));
}

and then rustc -O sqrt.rs


#6

Using memory can be an advantage in some cases, but not always. When I tested this, I was actually forcing it not to inline, so I could see the function by itself.

#[inline(never)]
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        asm!("sqrtsd $1, $0"
              : "=x"(ret)
              : "xm"(i)
              : )
    }
    return ret;
}

_ZN8rust_out8sqrt_asm17hf6e6ae11ad179d42E:
	.cfi_startproc
	movsd	%xmm0, -8(%rsp)
	#APP
	sqrtsd	-8(%rsp), %xmm0
	#NO_APP
	retq

It should be perfectly happy to keep this in the register, as it did when I wrote “x” alone, but instead it’s spilling to the stack first to get a memory operand.
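
A side note for newer toolchains: to my knowledge the current std::arch::asm! macro has no direct counterpart to the “m” or “xm” constraints, so the register-or-memory choice is not left to LLVM there. If a memory operand is really wanted, one way is to pass a pointer in a general-purpose register and dereference it in the template. The sketch below assumes x86_64; sqrt_asm_mem is a hypothetical name, and the non-x86_64 fallback is only there so that the example builds elsewhere.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: `sqrtsd` with the input read directly from memory.
#[cfg(target_arch = "x86_64")]
fn sqrt_asm_mem(i: &f64) -> f64 {
    let ret: f64;
    let ptr: *const f64 = i;
    // SAFETY: `ptr` comes from a valid reference, and the instruction only
    // reads 8 bytes through it; nothing is written to memory.
    unsafe {
        asm!(
            // Intel syntax: destination register, then the memory source.
            "sqrtsd {out}, qword ptr [{ptr}]",
            out = lateout(xmm_reg) ret,
            ptr = in(reg) ptr,
            // `readonly` rather than `nomem`, because memory is read.
            options(pure, readonly, nostack),
        );
    }
    ret
}

/// Illustrative fallback (an assumption) for other targets.
#[cfg(not(target_arch = "x86_64"))]
fn sqrt_asm_mem(i: &f64) -> f64 {
    i.sqrt()
}

fn main() {
    let x = 15.2_f64;
    println!("{}", sqrt_asm_mem(&x));
}
```

When the value is already in a register, the plain register form avoids the spill shown above, so this variant only makes sense when the input really lives in memory.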


#7

Agreed, in your case the move to memory looks like an LLVM bug.