Cannot sort floats

I am in no way an expert on this matter, so please be gentle with the following! :wink:

I've created this issue earlier: Cannot sort f64 · Issue #67417 · rust-lang/rust · GitHub
Complaining that the following does not work.

fn main() {
    let fs = vec![11.0, 7.0, 5.0, 3.14, 3.0, 2.7182, 2.0, 1.4142, 9.8, 100.0, 10.0];
    fs.sort();
}

but was directed here for "discussion". Now my opinion is that this should definitely work. My reasons are:

  1. The typical solution of using sort_by does not compose well. For example, slice - Rust cannot be used with a float output.
  2. The argument that Inf and NaN cannot be compared is a straw man. We cannot take a square root of negative numbers either and yet that exists. Other examples are easy to find.
  3. Programming languages which cannot sort numbers are ridiculous. This sounds silly but actually seems accurate to me.

So, if somebody could explain to me why this behavior is very intentional (to quote a comment on the issue) I would be grateful.

1 Like

Nitpick: Inf and -Inf can be compared to other floating-point numbers. They compare greater than and less than, respectively, any other non-NaN. It's only NaNs that cannot be compared, because they are not numbers.

1 Like

@trentj I don't have the link anymore but somebody commented in Discord yesterday that IEEE754 actually defines a sorting that includes NaNs.

I mean, the reason sort refuses to sort it is that floats do not implement Ord, and it is correct to not implement that trait for floats because they are not totally ordered.

And it's not like you actually can't sort floats. You can do it like this:

fn main() {
    let mut fs = vec![11.0, 7.0, 5.0, 3.14, 3.0, 2.7182, 2.0, 1.4142, 9.8, 100.0, 10.0];
    fs.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("{:?}", fs);
}

playground

In this case the code will panic if the vector contains NaN.

2 Likes

It defines several, which is the problem.

1 Like

Well, of course you can sort them. But to quote my example from the issue:
For example, if I have a vector of Person structs that I want to sort by body-mass-index. The imho natural approach would be

people.sort_by_key(|person| person.height / person.weight.pow(2.0));

with sort_by this becomes

people.sort_by(|a, b| (a.height / a.weigh.pow(2.0)).partial_cmp(b.height / b.weight.pow(2.0))).unwrap()

or something like this.
Now you could argue that in this particular case I should just give my key function a name, of course. But if sort_by_key is considered useful I think this is a general issue. Hence my mention of sort_by_cached_key above.

You can also use the ordered_float crate:

use ordered_float::OrderedFloat;
people.sort_by_key(
    |person| OrderedFloat(person.height / person.weight.pow(2.0))
);
1 Like

I don't think that's the problem. Nobody cares how NaNs are sorted because this more or less does not make sense. The question is what should happen here. The usual options are:

  1. Silent failure. Garbage-in -> Garbage-out. We output a reasonable sorting with the NaNs somewhere defined but meaningless. I'm not a fan because I like to find my errors early.
  2. Loud failure. What the sort_by solution does. I want this but I don't want to code it manually. Vec::sort() should just return a Result or something and be able to fail.
  3. Prevent people from trying. The current situation. It does not mean that people don't do it. It just means the need to come up with a - probably bad - solution themselves.

I really don't think several options for 1. are the issue here.

1 Like

The argument that Inf and NaN cannot be compared is a straw man

No it's not :wink: But I think this is just a different viewing position.

We cannot take a square root of negative numbers either and yet that exists

[src/main.rs:2] (-1.0_f32).sqrt() = NaN

As Alice (and I) already pointed out you can use the ordered_float crate which does the unwrapping for you in the background. No need to do it yourself.

Nobody cares how NaNs are sorted because this more or less does not make sense

Well, just because you don't care doesn't mean everybody doesn't care. NaN typically means you fucked up a computation and it should not be that value.

Garbage-in -> Garbage-out

Not how Rust works.

Loud failure

No need for that because we have the requirement, that the input must have a total ordering.

Prevent people from trying

This is the way to go! :point_up_2:


You can compare this topic to the Option and Result types. Of course you could just silently ignore errors and proceed like nothing happened, but that's not what Rust wants to do.
It wants you to ensure that what you write don't explode at runtime, but rather at compile time. That is an important feature.

1 Like

Hi Marcel! Thanks for coming over!

There is a time and place for every strategy, I think. :thinking:

Garbage-in -> Garbage-out is not how Rust works

That is what happens for square root, though:

[src/main.rs:2] (-1.0_f32).sqrt() = NaN

And of course in practice it happens for lots of algorithms. The type checker cannot check most things.

Nobody cares how NaNs are sorted because this more or less does not make sense
Well, just because you don't care doesn't mean everybody doesn't care. NaN typically means you fucked up a computation and it should not be that value.

I agree, that's why I am not a fan of the first option. But still, they can be sorted and in a well defined way. So it is an option.

Loud failure: No need for that because we have the requirement, that the input must have a total ordering.

Imho, this is what all functions that return a Result are doing. This is perfectly appropriate when the type checker cannot distinguish between cases that should be allowed and ones that should not. It moves the problem solution to execution time.

You can compare this topic to the Option and Result types. Of course you could just silently ignore errors and proceed like nothing happened, but that's not what Rust wants to do.
It wants you to ensure that what you write don't explode at runtime, but rather at compile time. That is an important feature.

Yeah, so as described above I think your comparison is inaccurate. Regarding the last point: If the compiler could ensure that I would be totally happy. Unfortunately, the way floats work on modern hardware, there are NaNs and so the compiler cannot but preventing me from trying here is throwing out the baby with the bathwater.

As Alice (and I) already pointed out you can use the ordered_float crate which does the unwrap ping for you in the background. No need to do it yourself.

This is indeed a good solution. Doesn't really explain why this is not the default behavior though.

The reason we require T: Ord when sorting things, is that this means that the compiler can prove that the sorting will succeed without panics or errors. If we allow types that are only PartialOrd, we lose the ability to prove that it will always succeed, and this kind of proof is a very nice property.

I suppose you might want a sort_fallible that accepts a T: PartialOrd and returns a result?

2 Likes

At least that is something that seems like a sensible solution to me. Yes. :smiley:

Especially as long as something as fundamental as numbers are considered not ordered.

I mean, floats fundamentally are not ordered.

2 Likes

Well, floats are not. But floats are used to represent numbers and numbers are. Hence, all the confusion about the weird edge cases of floats. :slight_smile:

Your wording "numbers" is something that a human may refer to, but a compuer does not.

A (x86_64) CPU knows integers (of various bit-widths (8, 16, 32, 64, ...)) and floating values defined by IEEE754.

A human may call both of them numbers, for a CPU they are completly different, e.g:

  • 1 as u32 is 0x00000001
  • 1 as f32 is 0x3F800000

Two completly different representations for the same number right?

So, you have to distinguish between integers and floating values. They aren't the same. Not at all.

3 Likes

If you are doing a project where you need to sort a lot of floats, you may be interested in an extension trait:

trait SortExt {
    fn sort_fallible(&mut self) -> Result<(), CompareFail>;
}

impl<T: PartialOrd> SortExt for Vec<T> {
    fn sort_fallible(&mut self) -> Result<(), CompareFail> {
        let mut failed = false;
        self.sort_by(|a, b| {
            T::partial_cmp(a, b).unwrap_or_else(|| {
                failed = true;
                Ordering::Less
            })
        });
        if failed {
            Err(CompareFail)
        } else {
            Ok(())
        }
    }
}

playground

As long as this trait is in scope, it will allow you to call sort_fallible on vectors of partially comparable types.

1 Like

Well, of course but in the end a programming language is a tool for a human to tell a computer what to do. So I think considering the human perspective is not unreasonable.

1 Like

And the type checker is a tool for the computer to tell the human where they might have fucked up.

5 Likes

Thanks! That's really cool!

However, it neither answers my question why this is not the default behavior nor helps with things like sort_by_cached_key.

But in this case the type checker is telling me that I have fucked up when I have not. And that's the problem.

1 Like

The extension trait can be further extended to supply a sort_by_cached_key_fallible. Alternatively use the ordered_float package from before, which allows you to return something you can use as a return value in sort_by_cached_key.

As for why this is not default behaviour, if you have not yet been sufficiently convinced, I may not be the right person to ask.

1 Like