For a scientific computing project I make extensive use of quadruple-precision floats (f128), and these computations are the hot parts of the code. In Rust, the f128 crate provides a wrapper around GCC's quadmath library (libquadmath) for partially hardware-accelerated quad-precision computations.
The low-level operations on f128 numbers in C (__addtf3, etc.) are wrapped using the Wrapper type in f128.c of the f128 crate. Calling the wrapper functions (f128_add, etc.) adds an overhead of roughly a factor of 1.5 to 2, as can be seen from a flamegraph of the benchmark.
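To make the source of this overhead concrete, here is a minimal sketch of wrappers with the same prototypes as the ones declared in test.c. It is a hypothetical reconstruction, not the actual contents of the crate's f128.c, and it assumes GCC on x86-64, where __float128 arithmetic is lowered to libgcc soft-float routines such as __addtf3:

```c
/* Hypothetical out-of-line wrappers matching the prototypes in test.c;
   NOT the actual source of the f128 crate. Assumes GCC on x86-64. */
typedef __float128 f128;

typedef union _Wrapper {
    f128 value;
    unsigned __int128 dat;
    char dat_alt[16];
} __attribute__((aligned(16))) Wrapper;

/* Widen a double to binary128 (GCC typically emits a call to __extenddftf2). */
Wrapper f64_to_f128(double x) {
    Wrapper w;
    w.value = (f128)x;
    return w;
}

/* Each wrapper reads both operands through pointers, performs a single
   soft-float operation (__addtf3, __subtf3, __multf3, __divtf3) and returns
   the result by value. */
Wrapper f128_add(Wrapper *a, Wrapper *b) {
    Wrapper w;
    w.value = a->value + b->value;
    return w;
}

Wrapper f128_sub(Wrapper *a, Wrapper *b) {
    Wrapper w;
    w.value = a->value - b->value;
    return w;
}

Wrapper f128_mul(Wrapper *a, Wrapper *b) {
    Wrapper w;
    w.value = a->value * b->value;
    return w;
}

Wrapper f128_div(Wrapper *a, Wrapper *b) {
    Wrapper w;
    w.value = a->value / b->value;
    return w;
}
```

When these functions live in a separately compiled object and the program is linked without LTO, the caller cannot inline them, so every arithmetic operation likely pays for an extra call and for passing the operands through memory, on top of the soft-float routine itself.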
In C/C++, this substantial loss in performance can be mitigated by compiling the C library with LTO:
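```sh
# Compile the wrapper library with LTO (the -l options have no effect here, since -c skips the link step).
gcc -O3 -flto -lgfortran -lquadmath -Bstatic -c f128.c
# gcc-ar (rather than plain ar) loads the LTO plugin, so the archive index covers the LTO symbols.
gcc-ar crf libf128.a f128.o
# Link with -flto so GCC can optimize (and inline) across test.c and f128.o.
g++ -O3 test.c libf128.a -flto -lquadmath -o test
```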
where test.c is the following benchmark program:
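```c
#include <quadmath.h>
#include <stdio.h>

typedef __float128 f128;

typedef union _Wrapper {
    f128 value;
    unsigned __int128 dat;
    char dat_alt[16];
} __attribute__ ((aligned (16))) Wrapper;

/* The wrappers are defined in f128.c, which is compiled as C, so they need
   C linkage when this file is compiled with g++. */
#ifdef __cplusplus
extern "C" {
#endif
Wrapper f64_to_f128(double);
void f128_to_str(Wrapper, int, char*, const char*);
Wrapper f128_add(Wrapper*, Wrapper*);
Wrapper f128_sub(Wrapper*, Wrapper*);
Wrapper f128_mul(Wrapper*, Wrapper*);
Wrapper f128_div(Wrapper*, Wrapper*);
#ifdef __cplusplus
}
#endif

int main() {
    Wrapper a = f64_to_f128(2.);
    Wrapper b = f64_to_f128(3.);
    Wrapper c = f64_to_f128(4.);
    Wrapper d = f64_to_f128(5.);
    for (long int i = 0; i < 10000000L; i++) {
        a = f128_add(&a, &b);
        a = f128_sub(&a, &c);
        a = f128_mul(&a, &d);
        a = f128_div(&a, &c);
    }
    /* A Wrapper cannot be passed to printf("%f"); format the __float128
       value with libquadmath instead. */
    char buf[64];
    quadmath_snprintf(buf, sizeof buf, "%Qf", a.value);
    printf("%s\n", buf);
    return 0;
}
```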
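The speedup from -flto most likely comes from GCC inlining the small wrapper bodies into the caller's loop across the two object files. As a minimal sketch of the same effect without LTO, assuming the wrapper definitions can be made visible to the caller (for example as static inline functions in a header), the following C code runs the benchmark loop with hypothetical inline variants (f128_add_inl, etc.) and a hypothetical bench_loop helper. It assumes GCC on x86-64, where __float128 arithmetic needs only libgcc, not libquadmath:

```c
/* Hypothetical header-style variant: with the definitions visible in the
   caller's translation unit, GCC can inline them at -O2/-O3, which is what
   -flto achieves across separately compiled files. Assumes GCC on x86-64. */
typedef __float128 f128;

typedef union _Wrapper {
    f128 value;
    unsigned __int128 dat;
    char dat_alt[16];
} __attribute__((aligned(16))) Wrapper;

static inline Wrapper f128_add_inl(const Wrapper *a, const Wrapper *b) {
    Wrapper w;
    w.value = a->value + b->value;
    return w;
}

static inline Wrapper f128_sub_inl(const Wrapper *a, const Wrapper *b) {
    Wrapper w;
    w.value = a->value - b->value;
    return w;
}

static inline Wrapper f128_mul_inl(const Wrapper *a, const Wrapper *b) {
    Wrapper w;
    w.value = a->value * b->value;
    return w;
}

static inline Wrapper f128_div_inl(const Wrapper *a, const Wrapper *b) {
    Wrapper w;
    w.value = a->value / b->value;
    return w;
}

/* Same loop body as the benchmark in test.c, parameterised by the iteration
   count; returns the final value narrowed to double for inspection. */
double bench_loop(long n) {
    Wrapper a = {.value = 2}, b = {.value = 3}, c = {.value = 4}, d = {.value = 5};
    for (long i = 0; i < n; i++) {
        a = f128_add_inl(&a, &b);
        a = f128_sub_inl(&a, &c);
        a = f128_mul_inl(&a, &d);
        a = f128_div_inl(&a, &c);
    }
    return (double)a.value;
}
```

With these starting values the loop computes a(n+1) = (a(n) + 3 - 4) * 5/4, whose closed form is a(n) = 5 - 3 * (5/4)^n, giving 1.25 and 0.3125 after one and two iterations. Its magnitude exceeds the largest finite binary128 value (about 1.19e4932) after roughly 5.1e4 iterations, so most of the 10^7 benchmark iterations operate on -inf, where the soft-float routines may take a cheaper special-case path. Starting from the fixed point a = 5 keeps every intermediate value finite.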
I am trying to achieve a similar performance boost in Rust, but I am struggling and wondering whether it is even possible, since Rust compiles with LLVM while the quadmath extension requires g++ rather than clang.
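I tried adding .flag("-flto") to the build script of the f128 crate, but that causes linking errors (presumably because the LLVM-based toolchain cannot read GCC's LTO information). Additionally passing .flag("-ffat-lto-objects") does restore compilation, but only because fat LTO objects also contain regular machine code, which the linker can fall back to, so no link-time optimization actually takes place.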
Does anyone know a solution?