I'm trying to implement a function (a * b) % n
for 64-bit values. x64 architecture has support for this computation, it's basically two instructions: mul
and div
and some register shuffling. I expected u128
data type to support it with proper optimizations. Rust compiler utilizes mul
for 64*64 -> 128-bit multiplication, but it calls some function for division instead of using a single div
instruction (I guess this function does 128-bit%128bit->128bit, but I really only need 128-bit%64-bit->64-bit). Is it possible to rewrite this code better? I understand that division is a very slow instruction and single function call won't hurt (if it uses div
inside, of course), but may be I could write it even better anyway. Compiler could be smart enough to understand that n128 is not really 128-bit value and I casted it only to satisfy type checker, so it doesn't have to perform a real 128-bit division. I'm using https://rust.godbolt.org/ with -O flag to check assembly output.
pub fn mulmod(a: u64, b: u64, n: u64) -> u64 {
let a128 = a as u128;
let b128 = b as u128;
let n128 = n as u128;
let c128 = a128 * b128 % n128;
return c128 as u64;
}
Here's generated assembly for reference:
example::mulmod:
pushq %rax
testq %rdx, %rdx
je .LBB0_2
movq %rdx, %r8
movq %rsi, %rax
mulq %rdi
xorl %ecx, %ecx
movq %rax, %rdi
movq %rdx, %rsi
movq %r8, %rdx
callq __umodti3@PLT
popq %rcx
retq
.LBB0_2:
leaq .Lpanic_loc.2(%rip), %rdi
callq core::panicking::panic@PLT
ud2
and I'm talking about __umodti3
call.