Bitwise equality for f64

Is there a wrapper where Wrap(f64) provides impls of Hash, Eq, and PartialEq by just interpreting the f64 as 64 bits?

I understand float equality comparison is bad. I understand IEEE mandates NaN != NaN. My use case is:

pub enum LispVal {
  I64(i64),
  F64(f64),
  ...
}

and I want to be able to define Eq, PartialEq, Hash on it (using simple dumb bitwise equality).

Do you really want bitwise equality comparing even the NaN payload bits, or will NaN == NaN do? If it will, you can use:

But you might end up needing to define your own non-derived PartialEq or custom equality functions anyway. Lisp implementations often have several.


I just wrote up:

use std::hash::{Hash, Hasher};

#[derive(Debug, Copy, Clone)]
pub struct F64(pub f64);

impl PartialEq<Self> for F64 {
    fn eq(&self, rhs: &Self) -> bool {
        self.0.to_le_bytes() == rhs.0.to_le_bytes()
    }
}

impl Eq for F64 {}

impl Hash for F64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(&self.0.to_le_bytes());
    }
}

Is there anything obviously bad / unsafe / UB / DONT-USE-THIS / wrong with my impl? I think it's okay because, for PartialEq, Eq, and Hash, I'm literally treating the f64 as its 8 little-endian bytes.

Perhaps you're looking for https://doc.rust-lang.org/std/primitive.f64.html#method.total_cmp?

If not, might as well use https://doc.rust-lang.org/std/primitive.f64.html#method.to_bits as simpler than needing to pick between [nlb]e.
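For comparison, here is a minimal sketch of what a total_cmp-based wrapper might look like. The name OrdF64 and the helper sorted_total are made up for illustration. As far as I can tell, total_cmp returns Ordering::Equal only when the two bit patterns are identical, so it gives the same notion of equality as to_bits and an Ord impl on top of it.

```rust
use std::cmp::Ordering;

// Hypothetical wrapper: equality and ordering both come from f64::total_cmp.
#[derive(Debug, Copy, Clone)]
pub struct OrdF64(pub f64);

impl PartialEq for OrdF64 {
    fn eq(&self, rhs: &Self) -> bool {
        self.0.total_cmp(&rhs.0) == Ordering::Equal
    }
}

impl Eq for OrdF64 {}

impl PartialOrd for OrdF64 {
    fn partial_cmp(&self, rhs: &Self) -> Option<Ordering> {
        Some(self.cmp(rhs))
    }
}

impl Ord for OrdF64 {
    fn cmp(&self, rhs: &Self) -> Ordering {
        self.0.total_cmp(&rhs.0)
    }
}

// Sort raw f64 values with the IEEE 754 total order; NaNs do not panic here.
pub fn sorted_total(mut values: Vec<f64>) -> Vec<f64> {
    values.sort_by(|a, b| a.total_cmp(b));
    values
}
```

If only Eq and Hash are needed, to_bits is probably the simpler route; total_cmp earns its keep once the values also have to go into a BTreeMap or be sorted.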

  1. It is not clear to me how total_cmp is useful.

  2. to_bits() is definitely nicer than to_le_bytes(); I think Hasher::write_u64 is faster than Hasher::write(&[u8]), which I think has to call len and then run a for loop.
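A minimal sketch of the wrapper rewritten along those lines, using to_bits and write_u64. The two-variant LispVal and the distinct_count helper are assumptions added for illustration only; the real enum has more variants.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Copy, Clone)]
pub struct F64(pub f64);

impl PartialEq for F64 {
    fn eq(&self, rhs: &Self) -> bool {
        // Compare the raw 64-bit patterns, NaN payloads and zero sign included.
        self.0.to_bits() == rhs.0.to_bits()
    }
}

impl Eq for F64 {}

impl Hash for F64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the bits that eq compares, so Hash and Eq stay consistent.
        state.write_u64(self.0.to_bits());
    }
}

// Illustrative cut-down enum; with the wrapper in place the traits can be derived.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum LispVal {
    I64(i64),
    F64(F64),
}

// Hypothetical helper: count distinct values by collecting into a HashSet.
pub fn distinct_count(vals: &[LispVal]) -> usize {
    vals.iter().cloned().collect::<HashSet<_>>().len()
}
```

With this definition NaN equals itself (same bits), while 0.0 and -0.0 are different values.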

Thanks!

That's impossible[1] if you don't use unsafe.[2]


  1. or the fault of a dependency, not you ↩︎

  2. Or a few other lingering things like the no_mangle attribute. ↩︎


What I had in mind was:

a & b have different bit reps
a & b have different hashes (by impl of Hash)
a == b (by impl of Eq)

I guess because Hash & Eq are safe traits, even if I "violate" the expectation, UB is not allowed.

In this hypothetical situation, what is allowed to happen? It's not allowed to UB ... but what bad things are allowed to happen ?

Right, logical problems, but no UB.

You could get panics, deadlocks, insert the same thing multiple times, get random stuff back out... in colloquial terms "anything that's not UB". Less colloquially, communicating the possibilities without tying the library's hands is difficult.

Or from here,

The behavior resulting from either logic error is not specified, but will be encapsulated to the HashMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.
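To make the kind of logic error being discussed concrete, here is a small sketch of a wrapper whose Eq and Hash disagree. The names Loose and hash_of are hypothetical. Equality uses the ordinary float ==, while hashing uses the raw bits, so 0.0 and -0.0 compare equal but feed different data to the hasher. The code is entirely safe; it is only the contract between Hash and Eq that is broken.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deliberately inconsistent wrapper, for illustration only. Do not use.
#[derive(Debug, Copy, Clone)]
pub struct Loose(pub f64);

impl PartialEq for Loose {
    fn eq(&self, rhs: &Self) -> bool {
        // IEEE comparison: 0.0 == -0.0 is true, NaN == NaN is false.
        self.0 == rhs.0
    }
}

// Claiming Eq is itself a lie here, because NaN is not equal to itself.
impl Eq for Loose {}

impl Hash for Loose {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Bitwise hashing: 0.0 and -0.0 write different u64 values.
        state.write_u64(self.0.to_bits());
    }
}

// Hypothetical helper: hash one value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

What a HashMap or HashSet keyed on Loose then does with 0.0, -0.0, or NaN is not specified, which is exactly the situation the quoted documentation describes.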


Okay, so it can terminate itself, loop forever, leak memory, or return the wrong answer -- but it won't rm -rf /, DDoS Google, or fire off nukes?

Since you already have to deal with converting between (wrapping) f64 and F64, it's tempting to just have type F64 = u64; and have the conversion operation be .to_bits(), but which is more clear in practice is probably subjective.

I am having a similar problem in one of my not-yet-started projects and haven't found a solution yet. But some things to consider:

  1. There are around 2^53 NaN bit patterns (a nonzero 52-bit mantissa, times two signs) which compare as bitwise-unequal, but are hard to distinguish otherwise.
  2. Positive and negative zero compare as equal, but can be distinguished ((1.0 / x).atan() gives different results for x = 0.0 and x = -0.0); this is solved by bitwise comparison.
  3. It can, if your program happens to contain (maybe indirectly through a library or user config) something like:
    if hash_set.contains(&F64(7.0)) {
      run_command_as_root("rm -rf /");   // Can happen even if hash_set does not actually contain 7.0 when Hash and Eq implementations are wrong
    }
    
    I think this whole situation around wrong implementations of Eq and Hash should be called "safe UB" rather than "just a logical error, nothing to worry about".
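Points 1 and 2 above can be checked with a short sketch. The constant and function names here are made up for illustration, and the NaN payload values are arbitrary; the sketch also assumes a platform where from_bits and to_bits round-trip quiet NaN payloads unchanged, which is the usual case.

```rust
// Number of f64 bit patterns that are NaN: exponent all ones, mantissa nonzero,
// for either sign. That is 2 * (2^52 - 1) = 2^53 - 2.
pub const NAN_PATTERNS: u64 = 2 * ((1u64 << 52) - 1);

// Build a quiet NaN carrying the given payload in its low mantissa bits.
// Hypothetical helper; the payload is masked to the 51 bits below the quiet bit.
pub fn quiet_nan_with_payload(payload: u64) -> f64 {
    let quiet_nan_base: u64 = 0x7ff8_0000_0000_0000;
    f64::from_bits(quiet_nan_base | (payload & ((1u64 << 51) - 1)))
}

// Distinguish 0.0 from -0.0 without looking at the bits: 1.0 / x is +inf or -inf,
// and atan maps those to +pi/2 or -pi/2.
pub fn recip_atan(x: f64) -> f64 {
    (1.0 / x).atan()
}
```

Under bitwise equality the two zeros are different values and every NaN payload is its own value, so whether that is acceptable depends on how NaNs can enter the LispVal in the first place.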

I think you have the idea.

"Unspecified" is a common phrase for it. I think it's important to keep UB, where the compiler itself is released from its as-if chains, in its own category. Creating a way to constrain possible sources of UB is one of Rust's crowning achievements, after all.


I agree in general, but

something similar can be said of library code (including non-local consequences) when a logical invariant fails. The library can do anything (unless its behavior is at least partially specified).

I just sometimes have the feeling that in the Rust world logical errors are treated much more lightly than they deserve while their range of consequences is quite similar to that of UB (only the mechanism is different).

But other than that: I just checked the docs for HashMap and found (and noticed you quoted it too) this (emphasis added by me):

The behavior resulting from either logic error [...] will be encapsulated to the HashMap that observed the logic error [...].

... which I think is the second most important thing to mention (after the not-being-UB part). And it is a much more restrictive specification than "anything that's not UB".

... but this is too off-topic now, I'm out.

