Total order for floats?

I am now thinking that maybe I need to loop through all the data to verify it before I start, throwing out any NaNs. The trouble is that I still cannot use cmp() for genericity, because the values will still be floats. Neither can I use partial_cmp(), because that would just waste time, unnecessarily checking for None all the time. Nor total_cmp(), because all that bit checking would now be superfluous as well. What a mess!

I guess if I go into all that data verification trouble, I may as well wrap all the good ones in that ordered_float, as H2CO3 suggested? But I still have to detect when my generic T is specifically f64. How do I do that? It is at times like this that I regret I ever started on Rust.

Type conversion (from float to ordered_float::OrderedFloat, for example) should be done by whoever is responsible for handling the erroneous data. You should also express the API's restrictions through your choice of types.
If your library (or application) must accept and handle data produced by careless calculations (i.e., it has an explicit, intentional specification for NaN handling), then receiving f64 and converting it inside the library might be sensible.

But if I were you, I would choose to accept iterators or slices of ordered_float::NotNan (rather than primitive float types), because the library should not be responsible for a "careless calculation" done by the library's user. That should be noticed and fixed on the user's side.
If the data comes from a data file, for example, then rejecting the invalid data is the responsibility of the deserializer (file loader), not of the data-processing library.

By accepting only NotNan floats, you get these benefits:

  • Users can notice the possibility of invalid data earlier (rather than via a runtime error in the library).
    • This can prevent incidents such as "oh no, I carelessly fed in invalid data by accident, and the two-hour calculation generated useless garbage!"
  • Users can omit or optimize the NaN check if they know the data contains no NaNs.
    • Usually the users know more about the data than the library does.
  • If the inputs are adequately typed, runtime errors cannot happen for pure calculation (without I/O or the like).
    • The types guarantee that all the input is valid.
  • Users can learn the restrictions on the input data from the types, without thoroughly reading the reference documentation.
    • If users see a function receiving f64, they will worry about the treatment of NaNs. However, if they see a function receiving NotNan, they immediately understand that NaN does not exist in the world of that function.
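To make the last point concrete, here is a minimal sketch (the function name and signature are hypothetical, not anything from the thread): an `Ord` bound pushes NaN handling to the caller, because plain `f64` is only `PartialOrd` and simply cannot be passed in.

```rust
// Hypothetical sketch: a median that requires Ord. An unchecked f64 is
// only PartialOrd, so callers must first convert to an ordered wrapper
// (e.g. ordered_float::NotNan) before they can call this at all.
fn median<T: Ord + Copy>(data: &mut [T]) -> Option<T> {
    if data.is_empty() {
        return None;
    }
    data.sort(); // never panics: Ord promises a total order
    Some(data[data.len() / 2])
}

fn main() {
    let mut xs = [3_i64, 1, 2];
    assert_eq!(median(&mut xs), Some(2));
    // median(&mut [1.0_f64, 2.0]) would be a compile error: f64 is not Ord.
}
```

The NaN question is thus settled at compile time rather than inside the calculation.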

Validation should be done by the component responsible for the data, and components should express their specifications and guarantees through types.
Done that way, things stay simple and easy.


The problem with this is that arithmetic, and other useful floating-point functions, aren't actually closed over the real numbers. Every time you calculate e.g. a division, tangent, or root there's a risk of leaving the domain of real numbers; adding a check and branch every time one of these operations happens will lead to slower and harder-to-understand calculations.

This misrepresents my position. I was making the argument for NaNs to propagate through to the result of the analysis, instead of the analysis pretending that it has a good result in the presence of NaNs.

With infectious non-signaling NaNs, the programmer can choose when it is most efficient to check for bad data; in some situations that will be early and in others it will be late.
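A small illustration of that point (the pipeline here is just an assumed example): NaN propagates through the arithmetic, so one check at the end suffices.

```rust
// Sketch: NaN is infectious, so a pipeline can run unchecked and the
// caller can test once at the end, instead of branching per operation.
fn pipeline(x: f64) -> f64 {
    let y = x.sqrt(); // NaN if x < 0
    let z = 1.0 / y;  // any NaN propagates through
    z * 2.0
}

fn main() {
    assert!(pipeline(-4.0).is_nan()); // bad input surfaces once, at the end
    assert!((pipeline(4.0) - 1.0).abs() < 1e-12);
}
```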


You can use noisy_float for that.

I experimented with a crate that enables an even stronger requirement (hardware floating point exceptions) and lets you use normal f32 and f64. But it has some safety bugs in the signal handler...

Anyway, yeah. NaN is mostly a footgun, and many use cases would prefer to explode at the site of the calculation error rather than charge forward blindly, corrupting all of your floating-point state.


In R, median returns NA if any value is NA; the user has to opt into removing NA. (One value of NaN is reserved to represent missing data.) This is what I'd want in a statistics library.
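A hedged Rust sketch of that R behavior (the name `median_na` and the `remove_nan` flag are hypothetical, mirroring R's `na.rm`):

```rust
// Hypothetical sketch mirroring R: return None ("NA") if any input is
// NaN, unless the caller opts into removal (like R's na.rm = TRUE).
fn median_na(data: &[f64], remove_nan: bool) -> Option<f64> {
    let mut v: Vec<f64> = if remove_nan {
        data.iter().copied().filter(|x| !x.is_nan()).collect()
    } else if data.iter().any(|x| x.is_nan()) {
        return None; // propagate missingness, as R does by default
    } else {
        data.to_vec()
    };
    if v.is_empty() {
        return None;
    }
    v.sort_by(|a, b| a.partial_cmp(b).unwrap()); // no NaNs remain, so no panic
    let n = v.len();
    Some(if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    })
}

fn main() {
    assert_eq!(median_na(&[1.0, f64::NAN, 3.0], false), None);
    assert_eq!(median_na(&[1.0, f64::NAN, 3.0], true), Some(2.0));
}
```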


Then receiving f64 or &[f64] or I: Iterator<Item=f64> or something like that is completely sensible.
(Though you could still consider receiving Option<NotNan<f64>>s if that would be easier for users, or providing two versions, one that accepts NaNs and one that rejects them (the former can be implemented using the latter).)

If you receive &[f64], you can first check whether there are any NaNs, and if there are none (no pun intended) then you can transmute &[f64] to &[NotNan], since NotNan is marked #[repr(transparent)].
Yes, this is an unsafe operation, but I think it is definitely worth doing, since you can do it without re-collecting into a newly allocated Vec.
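A sketch of that check-then-cast step. To stay free of external dependencies, a minimal stand-in for `ordered_float::NotNan` is declared locally; the real crate's type is likewise `#[repr(transparent)]`, which is what makes the pointer cast sound.

```rust
// Local stand-in for ordered_float::NotNan, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
#[repr(transparent)]
struct NotNan(f64);

fn as_not_nan(data: &[f64]) -> Option<&[NotNan]> {
    if data.iter().any(|x| x.is_nan()) {
        return None;
    }
    // SAFETY: NotNan is #[repr(transparent)] over f64, and we have just
    // verified that no element is NaN, so the invariant holds for every
    // element. No new allocation is made; the slice is reinterpreted.
    Some(unsafe { &*(data as *const [f64] as *const [NotNan]) })
}

fn main() {
    assert!(as_not_nan(&[1.0, 2.0]).is_some());
    assert!(as_not_nan(&[1.0, f64::NAN]).is_none());
}
```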

Similarly, if you receive Vec<f64>, I would transmute &mut [f64] into &mut [NotNan<f64>].

If you receive an iterator, then... what to do depends on the implementation details. Iterator types have many methods to modify, filter, early-break, aggregate, and more. Some of them might be useful for handling NaN in some contexts.
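One sketch of the iterator case (the function and its error type are hypothetical): fail fast on the first NaN while collecting.

```rust
// Hypothetical sketch: consume an iterator of f64, stopping at the
// first NaN and reporting its position; otherwise return all values.
fn collect_checked<I: Iterator<Item = f64>>(iter: I) -> Result<Vec<f64>, usize> {
    let mut out = Vec::new();
    for (i, x) in iter.enumerate() {
        if x.is_nan() {
            return Err(i); // index of the first NaN encountered
        }
        out.push(x);
    }
    Ok(out)
}

fn main() {
    assert_eq!(collect_checked([1.0, 2.0].into_iter()), Ok(vec![1.0, 2.0]));
    assert_eq!(collect_checked([1.0, f64::NAN].into_iter()), Err(1));
}
```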


It is a shame that none of this NotNan machinery works without depending on an external crate.

Yeah, in my case I often wish NonMaxUsize were in std, but Rust's standard library intentionally tends to stay small, especially for features unrelated to language semantics or interaction with the system (for example, regex is not in std!), so that is unlikely to happen...
I think we should get used to this culture of creating and depending on small, simple external crates.



No one here, that I can see, has observed that equality does not have a meaning for real numbers, only for rational numbers.

1.999999 recurring is indistinguishable from, but different from, 2.0.

Floats are just bitfields, so hacks are possible, but for the underlying thing being modeled, total order is infeasible.

Are you sure your model has not gone off the rails?

While this is true, floats are not reals: they have limited precision and limited range. There is a well-defined concept of the next/previous float (with the exception of NaNs and ±inf).

So while your argument is true for uncountable infinite sets such as the reals, it is not true for floats.

I would argue that floats do not really model the reals, but are their own thing. It is a common misconception to consider the floats as "approximately reals". They have too much weirdness for that.


All of this is false.

Equality between real numbers is well defined.

1.999... = 2

Calculating the median of floating point numbers using comparisons is a valid thing to do. It is also numerically stable.


In my naive mind I imagined that defining A == B <==> !(A < B or A > B) would be practical and workable in most situations.
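It is worth checking what that definition does on plain f64, where every comparison involving NaN is false: it would declare NaN "equal" to everything. A quick sketch:

```rust
// The proposed definition A == B  <=>  !(A < B || A > B), tried on f64.
// Because all comparisons with NaN are false, this definition makes
// NaN compare "equal" to every number.
fn eq_by_order(a: f64, b: f64) -> bool {
    !(a < b || a > b)
}

fn main() {
    assert!(eq_by_order(2.0, 2.0));
    assert!(eq_by_order(f64::NAN, 2.0)); // NaN "equals" 2.0 under this rule!
}
```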

I am tempted to put up with the inconvenience of a wrapper type and use the following, which avoids having to pull in an external crate:

use core::cmp::Ordering;
struct Ordf64 { v: f64 }
impl PartialEq for Ordf64 { fn eq(&self, o: &Self) -> bool { self.cmp(o) == Ordering::Equal } }
impl Eq for Ordf64 {}
impl PartialOrd for Ordf64 { fn partial_cmp(&self, o: &Self) -> Option<Ordering> { Some(self.cmp(o)) } }
impl Ord for Ordf64 {
    // completing the sketch with f64::total_cmp (the IEEE 754 total order)
    fn cmp(&self, other: &Self) -> Ordering { self.v.total_cmp(&other.v) }
}
It is a pity that this was not done in std from the start, making floats automatically Ord. Now we have to live with wrapping and unwrapping forever?

As to whether float data should always be checked for the presence of NaNs, that is a somewhat separate issue. I guess if a user deliberately leaves them in, there may be some edge cases where that might actually be useful?

It's conformance to a spec most of the world runs on, not something that was done in a vacuum.


Granted, yet 'most of the world' is not always right :slight_smile:
Here, in more detail, is the code that will make Rust do something sensible:

use core::cmp::{Ordering, Ordering::*};
use core::ops::{Deref, Neg};

pub struct Ordf64 {
    f: f64,
}

impl Ordf64 {
    pub fn new(value: f64) -> Self {
        Ordf64 { f: value }
    }
}

impl Deref for Ordf64 {
    type Target = f64;
    fn deref(&self) -> &Self::Target {
        &self.f
    }
}

impl Neg for Ordf64 {
    type Output = Self;
    fn neg(self) -> Self::Output {
        Ordf64 { f: -self.f }
    }
}

impl PartialOrd for Ordf64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Ordf64 {
    fn eq(&self, rhs: &Ordf64) -> bool {
        self.cmp(rhs) == Equal
    }
}

impl Ord for Ordf64 {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp implements the IEEE 754 total order over the bits
        self.f.total_cmp(&other.f)
    }
}

impl Eq for Ordf64 {}

fn main() {
    let inf: f64 = f64::INFINITY;
    let neginf: f64 = f64::NEG_INFINITY;
    let nan: f64 = f64::NAN;

    assert!(inf > f64::MAX);
    assert!(neginf < f64::MIN);
    assert!(!(nan > inf));
    assert!(!(-nan < neginf));

    let ordinf = Ordf64::new(inf);
    let ordneginf = Ordf64::new(neginf);
    let ordnan = Ordf64::new(nan);

    assert!(ordnan > ordinf);
    assert!(-ordnan < ordneginf);
}

If you mean some newtype à la Wrapping<_>, I could see such a thing being part of core.[1] There's times I would have used it.

But f32 and f64 aren't going to change their PartialOrd implementations to be non-compliant (and thus can't implement Ord);[2] that would be a breaking change, not a fix.

  1. Not that I have any authority in such a decision. ↩︎

  2. and similarly for PartialEq/Eq ↩︎


You are describing precisely the kind of newtype wrapper that you were sharply opposed to in your previous posts, when I pointed out that such implementations (eg. ordered_float) exist.

What gives?


But what would it mean to say A < NaN or A > NaN? This does not fix the problem.

Hmm... I can think of one...

Let's say your input data is billions of floats and your calculation takes months on a billion-dollar supercomputer. At the end you get a result of NaN. Excellent: you know your calculation went wrong somewhere or had faulty input data :slight_smile:

I think that would be a world of pain.

Imagine part of your calculation produces a NaN.
Then your calculation continues, conditionally branching on a comparison against that NaN. Say:

    if x < now_nan { ... } else { ... }

If floats were Ord, you would get "true" or "false" out of the comparison. The fact that your calculation has gone wrong is now lost. The result is something entirely random and fictitious.

You already do without Ord. You get the wacky IEEE behavior, throwing code that's not aware of it into chaos. These all evaluate to false:

fn main() {
    println!("{}", 0.0 < f32::NAN);
    println!("{}", 0.0 <= f32::NAN);
    println!("{}", 0.0 > f32::NAN);
    println!("{}", 0.0 >= f32::NAN);
}
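For completeness, std already offers a deterministic alternative to those comparisons: `f64::total_cmp` (stable since Rust 1.62) implements the IEEE 754 totalOrder predicate, so even NaNs sort to a fixed position instead of poisoning the comparison.

```rust
fn main() {
    let mut v = [f64::NAN, 1.0, f64::NEG_INFINITY, -0.0, 0.0];
    // total_cmp is total: -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN
    v.sort_by(f64::total_cmp);
    assert_eq!(v[0], f64::NEG_INFINITY);
    assert!(v[4].is_nan()); // positive NaN sorts last
}
```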

I was not sharply opposed; I would just prefer not to rely on an external crate, forcing my users into it too.
Also, all that wrapping. However, the wrapping at least now seems inevitable. What gives? This: implementing the core functionality myself.

Nonetheless, thank you for pointing out ordered_float, it has helped to inspire me to this solution.
I am only learning as I go along....


Interesting, this. rust-analyzer does not agree:

assert!(0.0 < f32::NAN);

incorrect NaN comparison, NaN is not orderable

So it does the same as JavaScript and no doubt other languages. Which is probably a good thing.