Inline asm reloads input register between calls

I have some Rust code (see playground) that uses the asm! macro to write to an x86 port:

#![feature(asm)]

fn out_e9(value: u8) {
    unsafe {
        asm!("out 0xe9, al", in("al") value, options(nomem, nostack, preserves_flags));
    }
}


fn main() {
    out_e9(b'h');
    out_e9(b'e');
    out_e9(b'l');
    out_e9(b'l');
    out_e9(b'o');
}

The code generated on nightly looks like this:

	movb	$104, %al  // h
	outb	%al, $233
	movb	$101, %al  // e
	outb	%al, $233
	movb	$108, %al  // l
	outb	%al, $233
	movb	$108, %al  // l - this one is redundant
	outb	%al, $233
	movb	$111, %al  // o
	outb	%al, $233
	retq

With Clang (see Godbolt), I get this:

static void out_e9(char value) {
    __asm__ __volatile__("outb %%al, $0xe9" : : "al"(value));
}

int main() {
    out_e9('h');
    out_e9('e');
    out_e9('l');
    out_e9('l');
    out_e9('o');
    return 0;
}

which compiles to:

        movb    $104, %al  // h
        outb    %al, $233
        movb    $101, %al  // e
        outb    %al, $233
        movb    $108, %al  // l
        outb    %al, $233
        outb    %al, $233
        movb    $111, %al  // o
        outb    %al, $233
        xorl    %eax, %eax
        retq

Note that the code generated by the Rust compiler loads the same value into the al register twice in a row, whereas Clang does not (and neither does GCC).
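To make the difference between the two listings concrete, here is a toy Rust model of the effect visible in Clang's output. The `Insn` type and the `remove_redundant_loads` function are hypothetical and are not part of rustc or LLVM. The model tracks which value is known to be in al and drops a `movb $imm, %al` that would load the same value again. Like the asm blocks above, it assumes `outb` only reads al and that nothing between the calls overwrites it.

```rust
/// Toy model of the instructions in the listings above
/// (hypothetical types, not a real compiler IR).
#[derive(Clone, Debug, PartialEq)]
enum Insn {
    /// `movb $imm, %al`
    MovImmAl(u8),
    /// `outb %al, $port`: reads %al but does not modify it
    OutAl(u8),
    /// any instruction that may overwrite %al, e.g. `xorl %eax, %eax`
    ClobberAl,
}

/// Drops each `movb $imm, %al` when %al is already known to hold `imm`.
fn remove_redundant_loads(code: &[Insn]) -> Vec<Insn> {
    // Value currently held in %al, if it is known.
    let mut known_al: Option<u8> = None;
    let mut out = Vec::new();
    for insn in code {
        match insn {
            Insn::MovImmAl(v) => {
                if known_al == Some(*v) {
                    continue; // redundant: %al already holds this value
                }
                known_al = Some(*v);
            }
            Insn::OutAl(_) => {} // outb only reads %al
            Insn::ClobberAl => known_al = None,
        }
        out.push(insn.clone());
    }
    out
}

fn main() {
    // The sequence rustc emits for "hello": one mov + one out per byte.
    let rustc: Vec<Insn> = b"hello"
        .iter()
        .flat_map(|&b| vec![Insn::MovImmAl(b), Insn::OutAl(0xe9)])
        .collect();
    let clang_like = remove_redundant_loads(&rustc);
    for insn in &clang_like {
        println!("{:?}", insn);
    }
    // Only the second `movb $108, %al` (the repeated 'l') is removed.
    assert_eq!(clang_like.len(), rustc.len() - 1);
}
```

The `ClobberAl` case is what makes this a sketch rather than a rule: the reload is only removable because the asm block does not declare al as clobbered.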

Is there a way to get Rust inline asm to behave the same way as Clang and avoid the redundant mov?

I loaded them both in Godbolt and tried to make them as similar as possible:

  • C (clang 12.0.1): Compiler Explorer
  • Rust (rustc 1.57.0-nightly (5ecc8ad84 2021-09-19)): Compiler Explorer
    • note: it sometimes gets stuck showing cached errors, so I had to click the refresh button within the page ("clear cache & recompile").

The LLVM IR shows some differences:

// clang
  tail call void asm sideeffect "outb %al, $$0xe9", "{ax}l,~{dirflag},~{fpsr},~{flags}"(i8 108) #1, !srcloc !2

// rustc
  tail call void asm sideeffect "outb %al, $$0xe9", "{al}"(i8 108) #1, !srcloc !2

Thanks, the LLVM IR produced by Clang makes this clearer, and I think the original C code is wrong. Clang's __asm__ does not accept literal register names the way rustc does, so "al" asks the compiler to choose between constraint a (the a register) and l (any integer register). [In Clang, "a" constraints appear to work the same as in GCC inline asm, and "l" is an LLVM constraint.]
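A simplified Rust model of that reading of "al" follows. The `allowed_regs` helper and the four-register file are hypothetical simplifications, not LLVM's real register classes. Letters inside one GCC-style constraint string are combined, so the operand may be placed in any register that any of the letters allows. "a" pins it to al. "al" also admits other byte registers, which is why a template that hardcodes `%al` only works when the allocator happens to pick al.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified register file: only the four legacy byte registers.
const BYTE_REGS: [&str; 4] = ["al", "bl", "cl", "dl"];

/// Registers permitted by a single constraint letter (simplified model).
fn letter_class(letter: char) -> BTreeSet<&'static str> {
    match letter {
        // 'a': the a register only (its low byte here)
        'a' => ["al"].iter().copied().collect(),
        // 'l': any general-purpose register (simplified to BYTE_REGS)
        'l' => BYTE_REGS.iter().copied().collect(),
        // other letters are outside this sketch
        _ => BTreeSet::new(),
    }
}

/// Letters in one constraint string are combined: the operand may go in
/// any register allowed by any of the letters.
fn allowed_regs(constraint: &str) -> BTreeSet<&'static str> {
    constraint.chars().flat_map(letter_class).collect()
}

fn main() {
    let a = allowed_regs("a");
    let al = allowed_regs("al");
    println!("\"a\"  -> {:?}", a);
    println!("\"al\" -> {:?}", al);
    // "a" matches the hardcoded `outb %al` template.
    assert_eq!(a.len(), 1);
    // "al" also allows bl, so the template is correct only by luck.
    assert!(al.contains("bl"));
}
```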

Using llvm_asm! instead of asm!, I can write LLVM constraints directly:

  • llvm_asm!( ... "{ax}l" ... ) matches __asm__( ... "al" ...). Both of these generate correct machine code by accident, because they reduce to the "l" constraint.
  • llvm_asm!( ... "{ax}" ... ) matches __asm__( ... "a" ... ) and asm!( ... in("al") ... ). This inline asm is more correct (the outb instruction can only use al, not any other integer register), but it causes both Clang and rustc to generate the redundant reload in the machine code.

I wonder:

  1. Why LLVM generates different machine code for "{ax}" and "l", even though it correctly chooses the al register for both
  2. Whether LLVM's register allocator understands the outb instruction. It does not: I can cause a compiler error by writing a more complicated function that causes "al" to select a register other than al.

It looks like a pure LLVM issue. Presumably code generation is not as smart as it could be when physical register assignments are used.

A minimal reproduction on Compiler Explorer:

// LLVM IR
define void @hello() {
  call void asm sideeffect "nop", "{al}"(i8 120)
  call void asm sideeffect "nop", "{al}"(i8 120)
  ret void
}
// Asm
hello:                                  # @hello
        movb    $120, %al
        nop
        movb    $120, %al
        nop
        retq