ASM: Problem with (reg) -> rdx and "mul {c}"

I am new to Rust and assembly, but for the fun of it I tried to write the factorial function in inline assembly.

My program ended in an endless loop.
Looking at the generated assembly I found the following problem:

n = in(reg) n64, // gets translated to in("rcx") n64
c = in(reg) c, // gets translated to in("rdx") c
out("rax") result, // need rax because it is used in Intel mul

Later I put a value in rax and multiply with "mul {c}", which should compute rax = rax * rdx.
BUT on Intel x64 the result is stored in rdx:rax (as a 128-bit result), so rdx, which holds {c}, is now set to 0, causing an endless loop.

Why is (reg) choosing a register which is overwritten by the multiplication?

Is this a bug or expected behavior?
See 'fn factorial_asm_c'.

Full code here:

use num_format::Locale;
use num_format::ToFormattedString;
use std::arch::asm;
fn main() {
    // WARNING running will end in an endless loop, factorial_asm (.exe) must be terminated via process manager

    // let max = u64::MAX;
    let start = 1;
    let end = 25;

    let mut n = start;
    loop {
        let f = factorial(n);
        let fs = f.to_formatted_string(&Locale::de);
        println!("\nFactorial RUST of {} is {}", n, fs);

        let fa = factorial_asm(n);
        let fas = fa.to_formatted_string(&Locale::de);
        print!("Factorial ASM  of {} is {}", n, fas);

        if fa != f {
            println!(" - false");
        } else {
            println!("");
        }

        let fa = factorial_asm_c(n);
        let fas = fa.to_formatted_string(&Locale::de);
        println!("Factorial ASMc of {} is {}", n, fas);

        n += 1;
        //if f > max as u128 { break;}
        if n > end {
            break;
        }
    }
}

/// calculates n!
/// results are correct between [0..=34]
/// 35 and above provide a wrong number (overflow)
pub fn factorial(n: usize) -> u128 {
    // (1..=n).product()
    let mut result: u128 = 1;
    let mut c: u128 = 2;
    while c <= n as u128 {
        result = result * c;
        c += 1;
    }
    result
}

/// calculates n!
/// results are correct between [0..=20]
/// 21 and above provide a wrong number (overflow)
pub fn factorial_asm(n: usize) -> u128 {
    let mut result: u64;
    let n64 = n as u64;
    let c: u64 = 2; // not necessary, could be "mov r15, 2"

    unsafe {
        asm!(
            "mov rax, 1",
            "2:",
            "cmp r15, {n}",
            "jg 99f",
            "mul r15",
            "inc r15",
            "jmp 2b",
            "99:",
            n = in(reg) n64,
            in("r15") c,
            out("rax") result,
            options(pure, nomem, nostack),
        );
    }

    result as u128
}

/// calculates n!
/// results are correct between [0..=20]
/// 21 and above provide a wrong number (overflow)
pub fn factorial_asm_c(n: usize) -> u128 {
    let mut result: u64;
    let n64 = n as u64;
    let c: u64 = 2;

    unsafe {
        asm!(
            "mov rax, 1",
            "2:",
            "cmp {c}, {n}",
            "jg 99f",
            "mul {c}",
            "inc {c}",
            "jmp 2b",
            "99:",
            n = in(reg) n64,
            c = in(reg) c,
            out("rax") result,
            // res = out(reg) result, // would also go in rax
            options(pure, nomem, nostack),
        );

        // compiles to
        // movq	$1, %rax
        // .Ltmp1010:
        //     cmpq	%rcx, %rdx // {c} is put in rdx
        //     jg	.Ltmp1011
        //     mulq	%rdx // rax = rax * rdx, result is in rax, rdx (for 128-bit results), {c} now is set to 0 causing an endless loop
        //     incq	%rdx
        //     jmp	.Ltmp1010
        // .Ltmp1011:
        
    }

    result as u128
}

The behavior you describe is expected. From the Rust reference - Rules for inline assembly (emphasis mine):

  • The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.
    • This effectively means that the compiler must treat the asm! as a black box and only take the interface specification into account, not the instructions themselves.
      Runtime code patching is allowed, via target-specific mechanisms.

Also note this:

Any registers not specified as outputs must have the same value upon exiting the asm block as they had on entry, otherwise behavior is undefined.
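To see why rdx ends up as 0 in the question's loop, here is a plain-Rust model of what the one-operand "mul" does on x86-64. The function name mul_model is made up for illustration; it only mimics the register effects and is not how the compiler or CPU implements anything.

```rust
/// Hypothetical helper: models x86-64 `mul src`.
/// Returns the values of (rax, rdx) after the instruction,
/// i.e. the low and high 64 bits of the 128-bit product rax * src.
fn mul_model(rax: u64, src: u64) -> (u64, u64) {
    let wide = (rax as u128) * (src as u128);
    let low = wide as u64; // written to rax
    let high = (wide >> 64) as u64; // written to rdx, whatever rdx held before
    (low, high)
}
```

As long as the product fits in 64 bits, the high half is 0, so a counter kept in rdx is reset to 0 by every "mul".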

You should let the compiler know that you clobber rdx with the mul operation (the compiler will not read that mul instruction and infer that rdx is clobbered in the asm block). Take a look here for an example.
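A minimal sketch of what declaring the clobber could look like, assuming an x86-64 target. The function name factorial_asm_clobber is made up for illustration. It also differs from the question's code in two ways: the counter is an inout operand because the asm modifies it, and the comparison uses the unsigned "ja" instead of "jg".

```rust
/// Hypothetical example: computes n! with inline assembly.
/// Exact for n in 0..=20; larger n wraps around in 64 bits.
#[cfg(target_arch = "x86_64")]
pub fn factorial_asm_clobber(n: u64) -> u64 {
    use std::arch::asm;

    let result: u64;
    unsafe {
        asm!(
            "2:",
            "cmp {c}, {n}", // while c <= n
            "ja 3f",        // unsigned compare: exit once c > n
            "mul {c}",      // rdx:rax = rax * c, so rdx is overwritten
            "inc {c}",
            "jmp 2b",
            "3:",
            n = in(reg) n,
            // The counter is modified, so it must be inout, not in.
            c = inout(reg) 2u64 => _,
            // rax starts at 1 and carries the result out.
            inout("rax") 1u64 => result,
            // Tell the compiler that rdx is clobbered; this also keeps
            // {n} and {c} from being allocated to rdx.
            out("rdx") _,
            options(pure, nomem, nostack),
        );
    }
    result
}

/// Fallback for other targets so the example still compiles there.
#[cfg(not(target_arch = "x86_64"))]
pub fn factorial_asm_clobber(n: u64) -> u64 {
    (2..=n).product()
}
```

Calling factorial_asm_clobber(5) should give 120. Because rax and rdx are named explicitly, the (reg) operands are placed in other registers.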


Thanks for your quick and elaborate reply.

My mistakes seem to sum up like this:

  • using (reg) to provide a "system-independent" ASM compilation, but then using mul, which seems to be Intel-specific
  • assuming (reg) would then use registers not likely to be overwritten
  • assuming the often-used ABI where the lower registers are caller-saved

I rewrote it like this now:

/// calculates n!
/// results are correct between [0..=20]
/// 21 and above provide a wrong number (overflow)
pub fn factorial_asm_x86_64(n: usize) -> usize {
    let result: usize;

    unsafe {
        asm!(
            "mov rax, 1", // init result with 1
            "mov rcx, 2", // init counter with 2, for 2..=n
            "2:", // start multiply
            "cmp rcx, rdi", // while c <= n
            "jg 3f", // exit if c > n else multiply
            "mul rcx",
            "inc rcx",
            "jmp 2b",
            "3:", // exit
            in("rdi") n, // n (usize is 64 bits on x86_64)
            out("rcx") _, // used as counter
            out("rdx") _, // overwritten by mul
            out("rax") result,
            options(pure, nomem, nostack),
        );
    }

    result
}

Interestingly, the ASM code takes twice as long as the Rust code due to overhead on calling.
