Comparing floats for equality

jplwill · January 22, 2021, 7:37pm

I understand why Rust doesn't let you use == with floating point numbers. Comparing the results of two floating point computations for equality is foolish.

BUT!

I've got some code that's working with data coming in over a network channel, and I want to know whether the f64 I got this time is the same or different than the f64 I got last time. Did I get the same bit pattern, or not? But by the time I get it, it's already been presented to me as an f64.

I'm currently checking (val1 - val2).abs() < f64::EPSILON, which really puts it on entirely the wrong footing.

How do I easily check whether the bits are the same?

cole-miller · January 22, 2021, 7:40pm

val1.to_ne_bytes() == val2.to_ne_bytes()

or

val1.to_bits() == val2.to_bits()
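For anyone reading along, here is a minimal sketch of how the bitwise comparison differs from `==`. The helper name `same_bits` is made up for illustration; `f64::to_bits` is the only library call involved.

```rust
/// Returns true when two f64 values have exactly the same bit pattern.
/// (`same_bits` is a hypothetical helper name, not a std function.)
fn same_bits(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

fn main() {
    // Identical values have identical bits.
    assert!(same_bits(1.5, 1.5));

    // 0.0 and -0.0 compare equal with ==, but their sign bits differ.
    assert!(0.0_f64 == -0.0_f64);
    assert!(!same_bits(0.0, -0.0));

    // NaN is never == to itself, but one NaN value has one bit pattern.
    let nan = f64::NAN;
    assert!(nan != nan);
    assert!(same_bits(nan, nan));
}
```

So the bitwise check answers "did the exact same bits arrive?", which is not always the same question as "are these numerically equal?".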

jplwill · January 22, 2021, 7:42pm

Excellent! Thank you!

ZiCog · January 22, 2021, 7:53pm

So you are suspecting corruption of your data due to network noise/errors or whatever.

So riddle me this:

If the transmitting end intentionally sends different values of that f64, but the second one happens to get corrupted to be the same as the first one, how will you know a network error occurred?

Really, I think your detection of network errors should be handled by other means. Checksums, error correcting codes, etc. That f64 should just be a bunch of bits as far as the network is concerned.

Or am I assuming your intention wrongly?

cole-miller · January 22, 2021, 8:08pm

That's exactly what @jplwill asked about: how to compare two f64 bit-for-bit. And that's what val1.to_bits() == val2.to_bits() does.

~~By definition, if the bits sent and received are the same, there is no data corruption.~~ (ignore this, I was misreading)

quinedot · January 22, 2021, 8:10pm

It does allow it.

ZiCog · January 22, 2021, 8:17pm

In a communication, over whatever medium, the receiving end cannot know if the bits it gets are the same as the transmitting end sent.

Edit: Unless the receiver and transmitter are the same device shouting down a loop back. But then, why would we be doing that?

One can become more confident that the correct thing was received by adding checksums, error correcting codes, sequence numbers, ever more complex protocols with retries and so on.

In short, on receiving a bunch of bits as an f64, one cannot tell if that is what was sent or not, or if it is supposed to be the same or different value as sent last time.

2e71828 · January 22, 2021, 8:49pm

Not necessarily. For all we know, the check could be to determine whether or not to re-run some calculation that uses this f64 as input.

jplwill · January 22, 2021, 8:53pm

Yeah; I'm not trying to check for errors at all. I'm monitoring a system, from which records of data come in every few seconds; by the time I get them, they've already been broken out into individual values. What I want to do is check whether I've received new data or whether it's just the same old data, i.e., which fields have changed.
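A rough sketch of that kind of per-field change detection, assuming a record of plain f64 fields. The `Record` struct, its field names, and `changed_fields` are all invented for illustration; the real records presumably look different.

```rust
/// Hypothetical monitoring record; the field names are illustrative only.
#[derive(Clone, Copy, Debug)]
struct Record {
    temperature: f64,
    pressure: f64,
    flow_rate: f64,
}

/// Lists the names of the fields whose bit patterns differ between two records.
fn changed_fields(prev: &Record, next: &Record) -> Vec<&'static str> {
    let pairs = [
        ("temperature", prev.temperature, next.temperature),
        ("pressure", prev.pressure, next.pressure),
        ("flow_rate", prev.flow_rate, next.flow_rate),
    ];
    let mut changed = Vec::new();
    for (name, old, new) in pairs.iter() {
        // Bitwise comparison: any difference in the 64 bits counts as a change.
        if old.to_bits() != new.to_bits() {
            changed.push(*name);
        }
    }
    changed
}

fn main() {
    let prev = Record { temperature: 21.5, pressure: 101.3, flow_rate: 0.0 };
    let next = Record { temperature: 21.5, pressure: 101.4, flow_rate: 0.0 };
    println!("{:?}", changed_fields(&prev, &next)); // prints ["pressure"]
}
```

Note that with a bitwise comparison a field going from 0.0 to -0.0 is reported as changed, which may or may not be what you want.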

jplwill · January 22, 2021, 8:56pm

D'oh. I jumped from PartialEq isn't implemented to "can't use == with floats". My bad.

mbrubeck · January 22, 2021, 8:58pm

If your data might include NaN values, then you may need special code to compare them. Otherwise, you can use ==.
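One possible shape for that special code, as a sketch only: treat two NaNs as "unchanged" and fall back to `==` for everything else. The name `same_value` is hypothetical, and whether NaN should count as equal to NaN depends on what the data means.

```rust
/// Numeric equality that also treats NaN as equal to NaN.
/// (`same_value` is a hypothetical helper name, not a std function.)
fn same_value(a: f64, b: f64) -> bool {
    (a.is_nan() && b.is_nan()) || a == b
}

fn main() {
    // Plain == reports a NaN field as different from itself on every update.
    assert!(f64::NAN != f64::NAN);
    // The helper reports it as unchanged.
    assert!(same_value(f64::NAN, f64::NAN));
    // Ordinary values behave exactly as with ==, including 0.0 == -0.0.
    assert!(same_value(0.0, -0.0));
    assert!(!same_value(1.0, 2.0));
}
```

Unlike a `to_bits()` comparison, this ignores the NaN payload and the sign of zero.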

quinedot · January 22, 2021, 9:50pm

That would be correct, but PartialEq is implemented. (Eq is not.)

ZiCog · January 23, 2021, 1:18am

I see. Sounds reasonable.

A potential problem with that plan is that it can happen that the same floating point calculation on the same input values could have different results. For example if the calculation were performed by some parallel running threads that perform the operations in a different order, depending on the timing of the threads.

This is likely not an issue for you but personally I would feel happier adding some meta-data to your messages to indicate that it's a new calculation and/or the input data the calculation is based on has changed. A simple message sequence number would do.
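A minimal sketch of the sequence-number idea, assuming the sender can add a counter to each message. `Message`, `Monitor`, and `is_new` are hypothetical names, not part of any existing protocol.

```rust
/// Hypothetical message layout: the sender bumps `seq` for every new calculation.
struct Message {
    seq: u64,
    value: f64,
}

/// Remembers the last sequence number seen.
struct Monitor {
    last_seq: Option<u64>,
}

impl Monitor {
    /// Returns true if this message carries a sequence number different
    /// from the previous one, regardless of whether `value` changed.
    fn is_new(&mut self, msg: &Message) -> bool {
        let fresh = self.last_seq != Some(msg.seq);
        self.last_seq = Some(msg.seq);
        fresh
    }
}

fn main() {
    let mut monitor = Monitor { last_seq: None };
    let first = Message { seq: 1, value: 2.0 };
    let repeat = Message { seq: 1, value: 2.0 };
    let next = Message { seq: 2, value: 2.0 };
    assert!(monitor.is_new(&first));
    assert!(!monitor.is_new(&repeat)); // same seq: old data resent
    assert!(monitor.is_new(&next)); // new seq, even though the value is equal
    println!("last value seen: {}", next.value);
}
```

With this, "new data" no longer depends on comparing the floats at all.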

Michael-F-Bryan · January 23, 2021, 9:31am

This sounds like one of the classical thought experiments around networking; no matter how many times you re-transmit the message you can't be sure that there were no networking errors just by comparing the payloads (e.g. because every message could have been corrupted in the same way).

Do you know if it has a name?

As you've pointed out, the correct way to handle transmission issues is to use checksums/signatures to detect errors and re-transmit or error-correcting codes for automatically resolving transmission errors.

2e71828 · January 23, 2021, 10:29am

It seems related to the Byzantine generals problem, which shows that confirmation messages aren’t good enough to establish a consensus: In any such protocol, the “last” message is a single point of failure.

ZiCog · January 23, 2021, 10:41am

I think this comes under the notion of a "transaction": Atomicity (database systems) - Wikipedia. As far as I understand, the best one can do is ensure that either the transaction happens correctly or nothing happens at all.

To my mind the Byzantine generals problem is tackling an even bigger problem. Namely, how to arrange compute nodes and connections such that the system as a whole is guaranteed to function correctly in the face of one or more errors, where "errors" can include actors that are deliberately trying to confuse the system to make it fail.

If I recall correctly the result is that one can ensure a system with N faulty nodes/connections can be made to work only if it has 3N + 1 fully connected nodes.

Fun fact: Even Fly By Wire systems, as in the Boeing 777, do not meet the Byzantine General's criteria for fault tolerance.

droundy · January 23, 2021, 3:34pm

That sounds to me like a bug. Best to avoid buggy code rather than design around it.

cole-miller · January 23, 2021, 3:38pm

Sorry, I was misreading what you wrote there.

ZiCog · January 23, 2021, 6:10pm

As far as I know it is not a bug. It is a fact of life when working with floating point numbers in threads. Floating point arithmetic is not associative. Changing the order of execution of operations can yield different results. The non-deterministic timing of thread scheduling can cause operations to be reordered and hence yield slightly different results.
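A small single-threaded sketch of the effect of evaluation order: the same three numbers are added with two different groupings. The function names are made up for illustration, and the values are chosen so the difference is extreme; in real workloads it usually shows up only in the last few bits.

```rust
/// Adds three values left to right: (a + b) + c.
fn sum_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Adds the same three values with the other grouping: a + (b + c).
fn sum_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    let (a, b, c) = (1e100_f64, -1e100_f64, 1.0_f64);
    // a + b is exactly 0.0, so adding c gives 1.0.
    assert_eq!(sum_left(a, b, c), 1.0);
    // b + c rounds back to -1e100 (the 1.0 is lost), so the total is 0.0.
    assert_eq!(sum_right(a, b, c), 0.0);
}
```

A parallel reduction that combines partial sums in whatever order the threads finish is effectively choosing between groupings like these at run time.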

See for example here: https://blogs.mathworks.com/loren/2009/12/04/comparing-single-threaded-vs-multithreaded-floating-point-calculations/

And see many other discussions on the net re: floats, threads, and results.

See also: "What Every Computer Scientist Should Know About Floating-Point Arithmetic": https://docs.oracle.com/cd/E19957-01/800-7895/800-7895.pdf

Actually there was a long discussion about this here only days ago.

H2CO3 · January 23, 2021, 6:32pm

It is not; a friend of mine is doing his PhD in Computer Science in the topic of "reproducible computations". I can assure you it's not as simple as "if your code is correct, you'll get the same result every time".
