Subtle floating-point differences between C library and its Rust re-write


In an effort to spare myself and downstream users some effort, I'm idly working on rewriting the ERFA C library in Rust. This project lends itself to a rewrite as it has no dependencies and is relatively simple. Despite its simplicity, it is extremely important that the library's calculations are accurate, as its primary purpose is providing precise astronomical measurements.

I've already published erfa-sys, which simply provides Rust bindings for the C library (and optionally compiles it from source). In an effort to ensure that my re-write is behaving, I'm testing my erfa crate against my erfa-sys crate, and I've spotted some subtle differences, e.g. (light edits for brevity):

double eraFaom03(double t) {
   double a;

   a = fmod(          450160.398036 +
             t * ( - 6962890.5431 +
             t * (         7.4722 +
             t * (         0.007702 +
             t * (       - 0.00005939 ) ) ) ), ERFA_TURNAS ) * ERFA_DAS2R;

   return a;

/* Finished. */

}

pub fn om03(t: f64) -> f64 {
    ((          450160.398036 +
       t * ( - 6962890.5431 +
       t * (         7.4722 +
       t * (         0.007702 +
       t * (       - 0.00005939 ) ) ) )) % ERFA_TURNAS ) * ERFA_DAS2R
}

#[test]
fn test_eraFaom03() {
    for t in [0.1, 1.2, 12.34] {
        let result = eraFaom03(t);
        let expected = unsafe { erfa_sys::eraFaom03(t) };
        // approx 0.5.0
        approx::assert_abs_diff_eq!(result, expected);
    }
}
---- aliases::tests::test_eraFaom03 stdout ----
[erfa/src/aliases/] t = 0.1
[erfa/src/aliases/] t = 1.2
thread 'aliases::tests::test_eraFaom03' panicked at 'assert_abs_diff_eq!(result, expected)

    left  = -0.6268518749398361
    right = -0.6268518749398405

', erfa/src/aliases/
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Obviously in at least one case there is no difference, but in another there are a few different mantissa bits. I've just re-compiled the C library, and I'm using up-to-date Arch with an x86-64 CPU. I've tried looking at the assembly, but (once again) my eyes glaze over with uncertainty and confusion.

Seeing as the code that comprises these two functions is relatively simple, I guess I'm surprised that differences are possible. Would anyone be able to clarify what is going on here, and whether I should be worried? I can whack an epsilon onto my tests, but I'd like to understand what's going on. Thanks!
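For reference, the kind of tolerance check I mean looks something like this (the 1e-12 here is a placeholder, not a considered choice, and no extra crates are needed):

```rust
// Sketch: comparing the two implementations with an explicit tolerance
// instead of exact equality. The epsilon is an arbitrary illustration;
// pick one based on the precision actually required.
fn assert_close(result: f64, expected: f64, epsilon: f64) {
    let diff = (result - expected).abs();
    assert!(
        diff <= epsilon,
        "left = {result}, right = {expected}, |diff| = {diff:e} > {epsilon:e}"
    );
}

fn main() {
    // The two values from the failing test differ by roughly 4e-15,
    // which passes comfortably under a 1e-12 tolerance.
    assert_close(-0.6268518749398361, -0.6268518749398405, 1e-12);
    println!("ok");
}
```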

You're likely compiling the C code for x86 rather than x86-64. x86 doesn't support exact IEEE-754 arithmetic by default unless you use SSE2 instructions for floating point. Pass the flags `-mfpmath=sse -msse2` to gcc, or compile for x86-64.

Thanks for the reply, unfortunately this made no difference. The file gets compiled with:

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I.. -I.. -march=native -O3 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -mfpmath=sse -msse2 -MT faom03.lo -MD -MP -MF .deps/faom03.Tpo -c faom03.c -o faom03.o >/dev/null 2>&1

(You can see I've added your flags too)

I'm using an AMD Ryzen 3900X. Also, I'm happy to be wrong, but I don't see how this would be compiling for a 32-bit platform.

You can use Godbolt to see what instructions are generated.


Try outputting the intermediate values. You should be able to write to stdout from both sides (don't forget to flush that output after each write).

Even the most precise constant in the code is given to only 12 significant figures. The error in your example is around 4 × 10^-15. If the constants are less precise than the numerical error to begin with, I don't think it's reasonable to worry about errors of this size.

If you want exact computations, you won't use floating point anyway, but a bignum/arbitrary-precision library.
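To put the difference in perspective, you can count how many representable doubles sit between the two results. A sketch (this only works as written for finite values of the same sign):

```rust
// Sketch: measuring how far apart two f64 values are in units of least
// precision (ULPs) by comparing their bit patterns. For same-sign finite
// values, adjacent doubles have adjacent bit patterns, so the integer
// difference of the bit patterns is the ULP distance.
fn ulp_distance(a: f64, b: f64) -> u64 {
    let (ia, ib) = (a.to_bits() as i64, b.to_bits() as i64);
    (ia - ib).unsigned_abs()
}

fn main() {
    let left = -0.6268518749398361_f64;
    let right = -0.6268518749398405_f64;
    // The two results from the failing test are only a few tens of ULPs apart.
    println!("{} ULPs apart", ulp_distance(left, right));
}
```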


I don't know about floating point arithmetic, but I recommend using QuickCheck to test that two implementations produce the same result. Rather than iterating over a few manually-specified values of t, I would have QuickCheck search for values of t that make the assertion fail.
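If pulling in the crate is a problem, the core idea is small enough to hand-roll. A rough std-only sketch (the two functions here are simplified stand-ins, not the real ERFA code, and the xorshift generator is just for illustration):

```rust
// Sketch of the idea behind QuickCheck, without the dependency: feed a
// stream of pseudo-random inputs to both implementations and report the
// first input where they disagree. A real quickcheck setup would also
// shrink the failing input to a minimal counterexample.

// Stand-in for the C binding (hypothetical, simplified).
fn reference(t: f64) -> f64 {
    (450160.398036 + t * -6962890.5431) % 1296000.0
}

// Stand-in for the Rust port (hypothetical, simplified).
fn candidate(t: f64) -> f64 {
    (450160.398036 + t * -6962890.5431) % 1296000.0
}

fn main() {
    let mut state: u64 = 0x9E3779B97F4A7C15;
    for _ in 0..10_000 {
        // xorshift64: a tiny PRNG, good enough for a smoke test
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        // map to a t in roughly [-100, 100) Julian centuries
        let t = (state as f64 / u64::MAX as f64) * 200.0 - 100.0;
        let (a, b) = (candidate(t), reference(t));
        assert!(a == b, "mismatch at t = {t}: {a} vs {b}");
    }
    println!("no mismatches found");
}
```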


Thanks for reminding me of this -- after using Godbolt, I could see that the instructions for the Rust and C implementations were virtually the same (when both use -O3). So, after recompiling the C library without -march=native and running the tests without RUSTFLAGS="-C target-cpu=native", they pass. I suppose my original issue arises from one or both compilers using CPU extensions and perhaps doing float operations in a different order. I had naively assumed that floating-point results aren't affected by CPU extensions. These tests should use epsilons!
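One concrete mechanism that may be at play: with -march=native, a C compiler may contract a multiply and an add into a single fused multiply-add (FMA), which rounds once instead of twice. Rust only fuses when asked explicitly via mul_add, which makes the effect easy to demonstrate in a minimal sketch:

```rust
fn main() {
    let (a, b, c) = (0.1_f64, 10.0_f64, -1.0_f64);

    // Two roundings: a*b rounds to exactly 1.0 (the tiny excess from
    // 0.1 not being representable is lost), then 1.0 + (-1.0) = 0.0.
    let separate = a * b + c;

    // One rounding: the exact product of a and b (slightly above 1.0)
    // survives into the addition, leaving a tiny nonzero residue.
    let fused = a.mul_add(b, c);

    println!("separate = {separate:e}, fused = {fused:e}");
    assert_ne!(separate, fused);
}
```

So two instruction sequences that are "the same arithmetic" on paper can legitimately differ in the last few bits once one of them uses FMA.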


This is a really good point! I'll consider this next time I'm being pedantic about precision. For context, the science experiment I'm working on is attempting to detect a distant, weak cosmological signal buried under noise, so I'm trying to be extremely conscious of floating-point errors.


Thanks for the suggestion. This was actually my first thought for testing the libraries, but I found that for more complicated functions, quickcheck hangs. I have no idea what's going on with it, but I might revisit the issue later.

This is a good suggestion, but I don't think it will work in the end -- as I replied just a minute ago, I think the issue is that CPU extensions aren't used consistently between the C and Rust code, and injecting print statements means that the machine code will be different, as it can't be as tightly optimised. Fortunately these functions are simple enough that inspection with Godbolt can highlight differences.

That's certainly a good thing! In this case, do consider using an AP library, if one is available, or try to translate the C code to Rust using one of the lower-level AP crates, such as rug (warning: GPL ahead!)

Normally I would agree with you, but for a low level floating point library it's often important to have bit-for-bit identical results. This is especially important when you'll be working with chaotic systems or high levels of precision because differences can compound.

Sure, floating point operations introduce errors, but they're still deterministic and an identical series of operations in C should give the same result as the Rust equivalent.


I get the first answer from both the C code and the Rust code, and I can't reproduce the second answer. I suspect it's the C compiler that is somehow giving you the wrong answer.

Not sure what you mean by "CPU extensions" but it seems like it's breaking IEEE-754 semantics which are deterministic. This shouldn't be happening on x86-64 so something fishy is going on.

Optimizations should not affect the results in IEEE-754 compliant compilation mode.


Given the mention of a 32-bit platform, it sounds to me like one version is using SSE2 and thus IEEE binary64 floating point throughout, while the other is using x87 and thus x87 Extended Precision 80-bit floating point (also an IEEE-compliant floating point representation, FWIW) in registers, truncating to binary64 when stored.

I'd expect slightly different results in this case, because while the computations are deterministic, one of them is using a different format for in-register intermediates. One of the reasons that there's this myth that floating point is non-deterministic is the common behaviour of the x87 [1], where the only time you truncate to the chosen format (binary32, binary64) is on store.

This results in register allocation for x87 being sufficient to change final results - if your memory format is the IEEE binary64 or binary32 format, not the x87 extended format, each time the compiler spills to memory, you lose 11 bits from the fraction part. In the extreme, you can also get a case where the value computed using only IEEE binary64 is an infinity, a NaN or underflow (due to exponent overflow in an intermediate step), but the x87's extended precision allows it to compute an answer that fits in a binary64 (with an intermediate whose exponent requires the extra 4 bits the x87 supplies).

[1] The default setting for the FPU control word sets the Precision Control field to Double Extended. In theory, you can set this to Double for the entire process, but that affects all use of the x87 in a given process. There's no instruction to truncate a value to binary64 or binary32 inside a register if the PC field is Double Extended; to do this requires a store then a load in all cases.
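The effect of wider in-register intermediates can be imitated in portable code by using f64 for the intermediates of an f32 computation. This is a loose analogy only (the x87 uses 80-bit, not 64-bit, intermediates), but the double-rounding mechanism is the same:

```rust
fn main() {
    let x32 = 2f32.powi(-25);
    let x64 = 2f64.powi(-25);

    // Narrow path: every intermediate is rounded to f32. Each addition
    // of 2^-25 is only a quarter of an ulp of 1.0 and rounds away.
    let narrow = ((1.0f32 + x32) + x32) + x32;

    // Wide path: intermediates kept in f64, rounded to f32 only at the
    // end -- analogous to the x87 keeping extended-precision values in
    // registers and truncating on store. 1 + 3*2^-25 is exact in f64
    // and is three-quarters of an f32 ulp above 1.0, so it rounds up.
    let wide = (((1.0f64 + x64) + x64) + x64) as f32;

    println!("narrow = {narrow}, wide = {wide}");
    assert_ne!(narrow, wide);
}
```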


It's not the representation that isn't IEEE compliant with -mfpmath=387, it's the silent conversions between different precisions that aren't IEEE-754 compliant. The IEEE-754 standard specifies how arithmetic operations work, and what gcc is doing with -mfpmath=387 violates that.


For legacy x86 processors without SSE2 support, and for m68080 processors, GCC is only able to fully comply with IEEE 754 semantics for the IEEE double extended (long double) type.

When I said "deterministic", I meant "determined by the sequence of operations in the source code". Which is not the case for gcc compiling with -mfpmath=387. If it depends on register allocation choices in the compiler, then it's not "deterministic" in that sense.

However the OP has said that they are compiling to x86-64 with -mfpmath=sse (which is the default setting on x86-64 anyway), so none of that matters. Still, I am getting the second (wrong) result when I compile with -mfpmath=387, so I still suspect that somehow x87 is involved. Maybe libm is compiled with different settings on that system...?


I get what you mean now - the x87 is capable of being fully IEEE 754 compliant with a bit of care in codegen, but for performance reasons most compilers don't take that care; instead they permit themselves to convert between different precisions at compiler-chosen points in time.

When I said "the myth that floating point is non-deterministic", I was specifically referring to the idea many developers appear to have that the behaviour of GCC on x86 with -mfpmath=387 is the behaviour specified by IEEE 754, and not a non-compliant performance-oriented oddity.

I'd agree with your diagnosis that libm is compiled with -mfpmath=387, incidentally, as that's what most 32-bit Linux distributions choose (since their target audience includes people using the Pentium II or III chips that predate SSE2).


I'm unsure how to determine how libm was compiled. Is it part of glibc on Arch Linux? The PKGBUILD doesn't make anything obvious. In any case, I haven't built my own glibc or libm, so my results come from whatever up-to-date Arch Linux provides.

For the purposes of this thread, I'm satisfied knowing that floats are even more complicated than I thought and I probably shouldn't be too pedantic about slightly different results, but feel free to continue discussing this.

Oh yeah, if you need cross-OS bit precise floating point determinism, you can't be using libm and need to ship your own copies.

Does this apply to Rust as well as C? (I think they use the same C libm by default?) If so, would using Rust's libm suffice for determinism in Rust?