Why does the `mul_add` method produce a more accurate result with better performance?

photino · May 29, 2015, 3:17pm

The docs for `mul_add` say: "Fused multiply-add. Computes (self * a) + b with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add."

How is this method implemented? And how much better performance can it produce?

killercup · May 29, 2015, 3:32pm

It's implemented using LLVM intrinsics, specifically `llvm.fma.f64`, which is documented in the LLVM Language Reference.
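
To make the "only one rounding error" part concrete, here is a minimal sketch rather than anything from the standard library docs. The helper names (`separate`, `fused`, `demo_inputs`) and the input values are assumptions picked for illustration: the exact product is 1 - 2^-60, which does not fit in an `f64`, so the separate multiply rounds it to 1.0 and loses the small term before the add. It also assumes ordinary IEEE 754 double arithmetic on the target.

```rust
/// Multiply, round to f64, then add and round again (two roundings).
fn separate(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused multiply-add: `a * b + c` is rounded only once, at the end.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

/// Hypothetical inputs for this illustration: the exact product
/// `a * b = 1 - 2^-60` is not representable as an f64.
fn demo_inputs() -> (f64, f64, f64) {
    let eps = 1.0 / (1u64 << 30) as f64; // exactly 2^-30
    (1.0 + eps, 1.0 - eps, -1.0)
}

fn main() {
    let (a, b, c) = demo_inputs();
    // The intermediate product rounds to 1.0, so the small term is lost.
    println!("separate: {:e}", separate(a, b, c));
    // The single rounding keeps the exact answer, -2^-60.
    println!("fused:    {:e}", fused(a, b, c));
}
```

This only shows the accuracy side. It says nothing about speed, which depends on whether the target has a hardware FMA instruction.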

huon · May 29, 2015, 11:52pm

That statement in the docs isn't always true: `mul_add` produces a more accurate result that may be faster than a separate multiplication and addition. It's not actually faster on many platforms, since getting high performance basically requires a dedicated FMA CPU instruction. Without one, it calls the libc `fma` function, which is quite a lot slower (it does a lot more work):

test fma      ... bench:       211 ns/iter (+/- 7)
test separate ... bench:         1 ns/iter (+/- 0)

// Needs a nightly compiler: the built-in benchmark harness (`test` crate) is unstable.
#![feature(test)]
extern crate test;

#[bench]
fn fma(b: &mut test::Bencher) {
    let x = 1.5_f64;
    let y = 2.123466;
    let z = -987654.23456;

    b.iter(|| test::black_box(x).mul_add(test::black_box(y),
                                         test::black_box(z)))
}

#[bench]
fn separate(b: &mut test::Bencher) {
    let x = 1.5_f64;
    let y = 2.123466;
    let z = -987654.23456;

    b.iter(|| test::black_box(x) * test::black_box(y) + test::black_box(z))
}

vks · June 5, 2015, 12:00pm

I think we should fix the docs then. On my machine, which does have FMA as far as I know, it is still slower:

test fma      ... bench:        29 ns/iter (+/- 1)
test separate ... bench:        23 ns/iter (+/- 1)
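
One thing worth checking with numbers like these is whether the binary was actually compiled to use the FMA instruction. This is a possible explanation, not a diagnosis of the results above. A CPU that supports FMA is not enough on its own. The compiler also has to be allowed to emit the instruction, for example with `-C target-cpu=native` or `-C target-feature=+fma`. The sketch below is x86-only, and the function names are made up for illustration. It compares the compile-time setting with what the CPU reports at run time.

```rust
/// True if this binary was compiled with the `fma` target feature enabled,
/// which lets `mul_add` be lowered to a hardware instruction.
fn compiled_with_fma() -> bool {
    cfg!(target_feature = "fma")
}

/// True if the CPU running this binary reports FMA support (x86/x86_64).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_fma() -> bool {
    std::is_x86_feature_detected!("fma")
}

/// Other architectures are not checked in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_fma() -> bool {
    false
}

/// Summarize the combination in plain words.
fn describe() -> &'static str {
    match (compiled_with_fma(), cpu_has_fma()) {
        (true, _) => "fma enabled at compile time: mul_add can use the instruction",
        (false, true) => "CPU has fma, but it was not enabled at compile time",
        (false, false) => "no fma detected (or not an x86 target)",
    }
}

fn main() {
    println!("{}", describe());
}
```

If this prints the second message, rebuilding with one of the flags above and re-running the benchmark would show whether the libc fallback was responsible for the slowdown.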
