Total order for floats?

I was pleased to see total_cmp() introduced in Rust 1.62.
Unfortunately, the way it was done does not really help me. Unless I am missing something?
I have lots of generic methods on T: PartialOrd that must also work on f64.
I was hoping to be able to promote them to T: Ord, leading to more efficient and safer code.
However, f64 still fails as 'not Ord', just as before, regardless of the fact
that it is now possible to call total_cmp() on it.

Is there any way of getting round this? To my mind, you should have at the same time promoted floats to the Ord trait, to join all the other numeric types. It would make life a lot easier, all around.

That's impossible to do correctly, because NaN != NaN, which violates Ord (and Eq).

Use ordered_float.

1 Like

Yes but I have Vecs of thousands of floats and I would rather avoid having to wrap and unwrap each of them individually. It is not something that should have to be done in any sensible programming language.

It would be easier to redefine cmp() so that it treats NaNs as equal. That is roughly what total_cmp does for NaNs of the same kind, but it is only a method on the float types, so it does not help with writing code that is generic over the other numeric types as well, unfortunately.

What prevents you from doing the (un)wrapping in a loop? You certainly don't have to do it one-by-one. Or just work with the totally-ordered types from the beginning, and then you won't have to wrap/unwrap anything. And since the totally-ordered types are simple newtype wrappers with the same layout as the floats themselves, I'd expect the wrapping to be optimized away completely, but at least the allocation will almost surely be reused.
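
For illustration, here is a minimal sketch of the kind of newtype wrapper meant above. The name TotalF64 is made up for this example; the ordered_float crate provides more complete wrappers in the same spirit.

```rust
use std::cmp::Ordering;

/// Minimal newtype giving f64 a total order via `f64::total_cmp`.
/// Sketch only; `ordered_float` offers richer versions of this idea.
#[derive(Clone, Copy, Debug)]
struct TotalF64(f64);

impl PartialEq for TotalF64 {
    fn eq(&self, other: &Self) -> bool {
        self.0.total_cmp(&other.0) == Ordering::Equal
    }
}
impl Eq for TotalF64 {}

impl PartialOrd for TotalF64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for TotalF64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

fn main() {
    let data = vec![2.0_f64, f64::NAN, -1.0, 0.5];

    // Wrap the whole Vec in one pass; no element-by-element fiddling in user code.
    let mut wrapped: Vec<TotalF64> = data.into_iter().map(TotalF64).collect();
    wrapped.sort(); // allowed now, because TotalF64: Ord
    let sorted: Vec<f64> = wrapped.into_iter().map(|w| w.0).collect();

    println!("{sorted:?}"); // the NaN ends up last under the total order
}
```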

That would preclude the compiler from emitting the single, hardware-intrinsic comparison instruction, and every comparison would then need a branch. That would penalize the performance of all code, even code that does not need the total ordering.

It's unfortunate that floats are not totally ordered, but that's the reality we have to deal with. CPU manufacturers and IEEE-754 gave us quick partial ordering, but no quick total ordering.

9 Likes

You can't redefine cmp because that would break any existing code relying on the current semantics. Breaking backward compatibility is a deal breaker, so this will never happen.

And to expand on the efficiency of cmp vs total_cmp, consider their output; total_cmp uses considerably more instructions: Compiler Explorer
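
For anyone who wants to reproduce the comparison, a pair of functions along these lines (the names are mine) is enough to paste into Compiler Explorer and compare the generated assembly:

```rust
use std::cmp::Ordering;

// Partial comparison: returns None when either operand is NaN.
pub fn cmp_partial(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}

// Total comparison: always returns an Ordering, even for NaNs.
pub fn cmp_total(a: f64, b: f64) -> Ordering {
    a.total_cmp(&b)
}
```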

4 Likes

I have noticed, though, that total_cmp() runs faster than partial_cmp(); that is one reason why I want to solve this and use the Ord trait throughout. It is probably due to the checking of the Option values returned by partial_cmp, which you are not counting in your instruction counts?

How exactly did you measure this impact?

If you're sorting finite floats, using the key from total_cmp lets you use radix-sorting, which is much faster than comparison-based sorting.
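
As an illustration, a u64 sort key whose unsigned ordering matches the IEEE-754 totalOrder predicate (the ordering total_cmp implements) can be derived from the raw bits roughly like this. total_order_key is my own helper, not a std or crate API; a radix sort could consume the same keys directly.

```rust
/// Map an f64 to a u64 whose unsigned ordering matches IEEE-754 totalOrder.
/// Sketch only.
fn total_order_key(x: f64) -> u64 {
    let bits = x.to_bits();
    if bits >> 63 == 1 {
        !bits                  // negative numbers: flip all bits
    } else {
        bits | (1u64 << 63)    // positive numbers: set the sign bit
    }
}

fn main() {
    let mut v = vec![3.5_f64, -0.0, 0.0, f64::NAN, -7.25, f64::INFINITY];

    // A comparison sort by key here; a radix sort could use the same keys.
    v.sort_unstable_by_key(|&x| total_order_key(x));

    println!("{v:?}"); // [-7.25, -0.0, 0.0, 3.5, inf, NaN]
}
```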

2 Likes

Can you say more about what your actual application is? Do you have to deal with NaN?

One of my applications is finding medians of Vecs of data, see the medians crate. Previously, I relied on PartialOrd and partial_cmp(), just so that f64s could be included in the generic data end-types. However, if a user produces the data by careless calculations, such as divisions by zero, I cannot rule out that there may be some NaNs in it. Therefore, I would much rather use Ord and cmp() or total_cmp().

With two objectives in mind:

  1. to still obtain reasonable results, without having to check Options and panic, or return errors.
  2. not having to duplicate code specifically for the f64 type (and/or the f32 type) and then demand that users explicitly choose it.

Ad point 1, in reply to HJWT: I noticed a significant speedup in my benchmark tests of sort_unstable_by() when I pass it total_cmp, as opposed to partial_cmp. Another reason to prefer it.

PS. I also use custom comparison closures defined by users. When they are carelessly comparing some complex structs of their own, again, their comparison might fail on some instances. Probably better to demand that they provide a comparison that does not fail, i.e. Ord, rather than PartialOrd. Same with sorting.
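
For concreteness, the two sorting variants being compared look roughly like this. This is a simplified sketch, not the medians crate's actual code, and the None case for partial_cmp is resolved arbitrarily as Equal here.

```rust
fn main() {
    let data = vec![0.3_f64, -1.5, 2.0, 0.7, -0.2];

    // Variant 1: partial_cmp, with an explicit decision for the None case.
    let mut a = data.clone();
    a.sort_unstable_by(|x, y| x.partial_cmp(y).unwrap_or(std::cmp::Ordering::Equal));

    // Variant 2: total_cmp, which never fails and needs no Option handling.
    let mut b = data.clone();
    b.sort_unstable_by(|x, y| x.total_cmp(y));

    assert_eq!(a, b); // identical results on NaN-free data
    println!("{b:?}");
}
```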

Don't you have to treat NaNs specially for that anyway? Naïvely, I would expect one of these behaviors:

  • NaNs are ignored, so that the median of [0.0, NaN, NaN, NaN] is 0.0
  • Any NaN in the sequence produces a median of NaN, because it's impossible to tell where in the sequence it should appear.

Treating NaN as either the first or last element (or any other particular place in the order) doesn't really make sense to me for this application.
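
To make the two expected behaviors concrete, here is a sketch with made-up function names; neither is claimed to be what the medians crate actually does.

```rust
/// Median that ignores NaNs: the median of [0.0, NaN, NaN, NaN] is 0.0.
fn median_ignoring_nan(data: &[f64]) -> Option<f64> {
    let mut clean: Vec<f64> = data.iter().copied().filter(|x| !x.is_nan()).collect();
    if clean.is_empty() {
        return None;
    }
    clean.sort_unstable_by(|a, b| a.total_cmp(b));
    Some(clean[clean.len() / 2]) // simplified: no midpoint averaging for even lengths
}

/// Median that propagates NaN: any NaN in the input makes the result NaN.
fn median_propagating_nan(data: &[f64]) -> Option<f64> {
    if data.iter().any(|x| x.is_nan()) {
        return Some(f64::NAN);
    }
    median_ignoring_nan(data)
}

fn main() {
    let data = [0.0, f64::NAN, f64::NAN, f64::NAN];
    println!("{:?}", median_ignoring_nan(&data));    // Some(0.0)
    println!("{:?}", median_propagating_nan(&data)); // Some(NaN)
}
```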

5 Likes

It kind of makes sense to me, under the order defined by total_cmp(), which is pretty thorough. Though, granted, when a significant proportion of the data is NaNs, the result will not be very meaningful. However, at least it will not be an error.

I disagree. If I'm looking for the median of a calculation that involves taking square roots, and some of them were taken from negative numbers, any result other than NaN is erroneous. Worse, it's an error that looks like it isn't one which can be quite difficult to track down.

9 Likes

Well, yes, but that is generally the problem with NaNs. I guess it would have been better if float operations were defined as producing errors rather than NaNs. As it is, they are already in your data and you are stuck with them. I am just trying to make the best of a bad job.

In this case, one person's problem is another's feature. When you're doing complicated or batched calculations, error detecting code at every step can interfere with clarity. It's often much clearer to plow ahead and have a single check for whether or not the intermediate values left the representable domain after the calculation is done. And the nonuniform control flow for intermediate checks can wreak havoc on various optimizations, such as processor pipelining.
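
A small sketch of that style: let NaNs flow through the batch and check once at the end. The calculation and names here are invented purely for illustration.

```rust
/// Batched calculation with no per-step error handling: NaNs (e.g. from
/// the square root of a negative number) simply propagate through.
fn process(batch: &[f64]) -> Vec<f64> {
    batch.iter().map(|&x| (x.sqrt() - 1.0) / 2.0).collect()
}

fn main() {
    let input = [4.0, 9.0, -1.0]; // the -1.0 will produce a NaN
    let results = process(&input);

    // One check after the whole batch, instead of a branch at every step.
    if results.iter().any(|x| !x.is_finite()) {
        eprintln!("some results left the representable domain: {results:?}");
    } else {
        println!("{results:?}");
    }
}
```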

2 Likes

Imagine you are collecting data samples that must always be of the same length, say 1000. If just one of the numbers turns out to be a NaN, what do you do?

Also, partial_cmp() is doing exactly the individual checking that you are arguing against, and that is why it is slow.

I don't know; it would depend on details that your hypothetical doesn't specify, like where the samples came from, why they need to be exactly 1000 items in length, and what consequences might arise from a faulty result. I might:

  • Relax the length requirement somehow and throw out the individual NaN
  • Throw out the entire batch of 1000 that contains the NaN
  • Report the sensor as faulty and disable it until maintenance personnel investigate the root cause

2 Likes

It is quite common in statistics to demand samples of the same size. When you discover only subsequently that some items are invalid, then all your suggestions will essentially invalidate your whole process.

If your proposed analysis requires everything to be the same size and some of the items turn out to be invalid, the whole analysis has already been invalidated by the faulty data. Trying to hide that fact is just asking for trouble.

12 Likes

Remember, we are now discussing the hypothetical case where I suggested that it would have been better if NaNs were reported immediately as errors, that is, at the data collection stage. You were arguing for 'plowing ahead', and now you have switched sides. Good.