anatol
February 15, 2017, 8:23pm
1
I am trying to understand the inline assembly feature in Rust, and as an example I want to write a function that manipulates floating-point values on the x86 FPU. Something as simple as a square-root function. The instruction format is sqrtsd %xmmR, %xmmR. So I created a function:
#![feature(asm)]
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe { asm!("sqrtsd $0, $0"
                  : "=r"(ret)
                  : "0"(i)
                  : ) }
    return ret;
}
fn main() {
    println!("{}", sqrt_asm(15.2));
}
I see that the input and output double parameters are passed in xmm0, which is good. But it is not clear to me how to make asm! accept a variable and write the output to the same xmm register.
Does anyone have experience with asm! and FPU instructions?
anatol
February 15, 2017, 10:40pm
2
After several blind tries I was able to get my function working:
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe { asm!("sqrtsd %xmm0, %xmm0"
                  : "={xmm0}"(ret)
                  : "{xmm0}"(i)
                  : ) }
    return ret;
}
But I am not sure this is the best way to implement this kind of function. In particular:
- Is it going to work if the compiler decides to use a register other than xmm0 to pass function arguments?
- Is there a way to avoid hardcoding the xmm0 register? I wish the compiler had the freedom to choose any available xmm register.
And here is the function's assembly code:
0000000000006160 <_ZN4sqrt8sqrt_asm17h7dee47c5a631820dE>:
6160: f2 0f 11 44 24 e8 movsd %xmm0,-0x18(%rsp)
6166: f2 0f 10 44 24 e8 movsd -0x18(%rsp),%xmm0
616c: f2 0f 51 c0 sqrtsd %xmm0,%xmm0
6170: f2 0f 11 44 24 f0 movsd %xmm0,-0x10(%rsp)
6176: f2 0f 10 44 24 f0 movsd -0x10(%rsp),%xmm0
617c: c3 retq
617d: 0f 1f 00 nopl (%rax)
0000000000006180 <_ZN4sqrt4main17h1453d063b515cb9cE>:
6180: 48 81 ec 88 00 00 00 sub $0x88,%rsp
6187: f2 0f 10 05 b1 75 03 movsd 0x375b1(%rip),%xmm0 # 3d740 <_fini+0x24>
618e: 00
618f: 48 8b 35 7a a1 24 00 mov 0x24a17a(%rip),%rsi # 250310 <_ZN4sqrt4main15__STATIC_FMTSTR17h31b80c74e0c53704E>
6196: 48 8b 15 7b a1 24 00 mov 0x24a17b(%rip),%rdx # 250318 <_ZN4sqrt4main15__STATIC_FMTSTR17h31b80c74e0c53704E+0x8>
619d: 48 89 54 24 18 mov %rdx,0x18(%rsp)
61a2: 48 89 74 24 10 mov %rsi,0x10(%rsp)
61a7: e8 b4 ff ff ff callq 6160 <_ZN4sqrt8sqrt_asm17h7dee47c5a631820dE>
61ac: f2 0f 11 44 24 70 movsd %xmm0,0x70(%rsp)
61b2: 48 8d 7c 24 78 lea 0x78(%rsp),%rdi
It is not clear to me what the compiler is doing with the moves at 6160, 6166, 6170, and 6176. Is there a way to avoid them?
And ideally the compiler should just inline this one-operand function.
According to the documentation (scroll down to x86) you can specify an XMM register using =x
cuviper
February 15, 2017, 10:58pm
4
Are you compiling with cargo build --release? Or rustc -O? If not, you'll see a lot of stack setup like this.
I think x will give the SSE constraint you want, and you don't even have to force it to be the same register.
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        asm!("sqrtsd $1, $0"
             : "=x"(ret)
             : "x"(i)
             : )
    }
    return ret;
}
This still gives me just:
_ZN8rust_out8sqrt_asm17hf6e6ae11ad179d42E:
.cfi_startproc
#APP
sqrtsd %xmm0, %xmm0
#NO_APP
retq
You should also be able to write "xm" for the input, allowing either register or memory, but it seems LLVM always chooses memory in this case.
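For anyone reading this on a newer toolchain: the snippets in this thread use the old unstable asm! syntax with GCC-style constraints. Below is a minimal sketch of the same idea with the stabilized std::arch::asm! macro (Rust 1.59 and later), assuming an x86_64 target. The xmm_reg register class plays the role of the x constraint, so no register is hardcoded. The fallback for other architectures is my own addition so that the sketch compiles everywhere; it is not part of the original discussion.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Square root via the SSE2 `sqrtsd` instruction.
/// The compiler picks the XMM registers itself (`xmm_reg` class).
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        // The stabilized macro defaults to Intel syntax: destination first.
        asm!(
            "sqrtsd {out}, {inp}",
            // `lateout` lets the allocator reuse the input register for the
            // output, which is safe here because there is one instruction.
            out = lateout(xmm_reg) ret,
            inp = in(xmm_reg) i,
            // No memory access and no stack use; the result depends only
            // on the input, so the compiler may deduplicate or drop calls.
            options(pure, nomem, nostack),
        );
    }
    ret
}

/// Assumption: on other targets simply defer to the standard library.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn sqrt_asm(i: f64) -> f64 {
    i.sqrt()
}
```

Usage is the same as before, for example println!("{}", sqrt_asm(15.2)); inside main.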
anatol
February 15, 2017, 11:27pm
5
Thank you for your help folks.
> According to the documentation (scroll down to x86) you can specify an XMM register using =x
Thank you. I did not realize that the Rust/LLVM inline assembly format is compatible with GCC's. GCC seems to have nice documentation.
> Are you compiling with cargo build --release? or rustc -O?
D'oh, I was using plain rustc, and adding -O turns on a lot of optimizations. Now sqrtsd is inlined exactly as I expect.
> it seems LLVM always chooses memory in this case
In this case LLVM avoids loading the input argument from memory into xmm0, which eliminates one operation.
Ok, here is the final example that works exactly as I expect:
#![feature(asm)]
#[inline]
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe { asm!("sqrtsd $1, $0"
                  : "=x"(ret)
                  : "xm"(i)
                  : ) }
    return ret;
}
fn main() {
    println!("{}", sqrt_asm(15.2));
}
and then rustc -O sqrt.rs
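A quick way to sanity-check a function like this is to compare it against the standard library's f64::sqrt, which on x86_64 is normally lowered to the same sqrtsd instruction. The sketch below is written with the stabilized std::arch::asm! macro (Rust 1.59 and later) rather than the unstable syntax used in this thread, and assumes an x86_64 target. The helper name matches_std and the non-x86_64 fallback are hypothetical additions for illustration only.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

#[cfg(target_arch = "x86_64")]
#[inline]
pub fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        // Intel syntax: destination operand first.
        asm!(
            "sqrtsd {out}, {inp}",
            out = lateout(xmm_reg) ret,
            inp = in(xmm_reg) i,
            options(pure, nomem, nostack),
        );
    }
    ret
}

/// Assumption: fall back to the standard library on other targets.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn sqrt_asm(i: f64) -> f64 {
    i.sqrt()
}

/// Hypothetical helper: true when the asm version and `f64::sqrt`
/// agree bit for bit on a non-negative input.
pub fn matches_std(x: f64) -> bool {
    sqrt_asm(x).to_bits() == x.sqrt().to_bits()
}
```

Only non-negative inputs are compared bit for bit here, because the exact bit pattern of a NaN result is not something I would rely on.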
cuviper
February 15, 2017, 11:56pm
6
Using memory can be an advantage in some cases, but not always. When I tested this, I was actually forcing it not to inline, so I could see the function by itself.
#[inline(never)]
fn sqrt_asm(i: f64) -> f64 {
    let ret: f64;
    unsafe {
        asm!("sqrtsd $1, $0"
             : "=x"(ret)
             : "xm"(i)
             : )
    }
    return ret;
}
_ZN8rust_out8sqrt_asm17hf6e6ae11ad179d42E:
.cfi_startproc
movsd %xmm0, -8(%rsp)
#APP
sqrtsd -8(%rsp), %xmm0
#NO_APP
retq
It should be perfectly happy to keep this in the register, as it did when I wrote "x" alone, but instead it's spilling to the stack first to get a memory operand.
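As far as I know, the stabilized std::arch::asm! macro (Rust 1.59 and later) has no direct counterpart of the m constraint, so the register-or-memory choice discussed here cannot be left to the compiler. If a memory operand is really wanted, one option is to pass the address in a general-purpose register and spell out the memory reference by hand. This is a minimal sketch under that assumption for an x86_64 target; the function name sqrt_from_mem and the non-x86_64 fallback are hypothetical.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Square root of the value behind `p`, read directly from memory
/// by `sqrtsd` instead of being loaded into a register first.
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn sqrt_from_mem(p: &f64) -> f64 {
    let ret: f64;
    unsafe {
        asm!(
            // Intel syntax: 64-bit memory source operand at address {ptr}.
            "sqrtsd {out}, qword ptr [{ptr}]",
            out = out(xmm_reg) ret,
            ptr = in(reg) p as *const f64,
            // The block reads memory but never writes it.
            options(pure, readonly, nostack),
        );
    }
    ret
}

/// Assumption: fall back to the standard library on other targets.
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn sqrt_from_mem(p: &f64) -> f64 {
    p.sqrt()
}
```

This only pays off when the value already lives in memory; for a value that is already in an XMM register, the plain register form avoids the spill shown above.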
anatol
February 16, 2017, 12:08am
7
Agreed, in your case the move to memory looks like an LLVM bug.