Fused multiply-add. Computes (self * a) + b with only one rounding error. This produces a more accurate result with better performance than a separate multiplication operation followed by an add.
How is this method implemented? And how much better performance can it produce?
That statement in the docs isn't always true: it produces a more accurate result that may be faster than separate multiplication/addition. It's not actually faster on many platforms, since getting high performance basically requires having a specific fma CPU instruction. If not it calls the libc function, which is quite a lot slower (it does a lot more work):
test fma ... bench: 211 ns/iter (+/- 7)
test separate ... bench: 1 ns/iter (+/- 0)
#![feature(test)]
extern crate test;
#[bench]
fn fma(b: &mut test::Bencher) {
let x = 1.5_f64;
let y = 2.123466;
let z = -987654.23456;
b.iter(|| test::black_box(x).mul_add(test::black_box(y),
test::black_box(z)))
}
#[bench]
fn separate(b: &mut test::Bencher) {
let x = 1.5_f64;
let y = 2.123466;
let z = -987654.23456;
b.iter(|| test::black_box(x) * test::black_box(y) + test::black_box(z))
}