Why `sort()` need `T` to be `Ord`?

jbe · January 6, 2022, 11:34am

Posting here instead of the other thread I forked, because this is related to the original question: "Why sort() need T to be Ord?"

@kornel, how about something like:

trait PartialSort {
    fn partial_sort(&mut self);
}

impl<T: std::cmp::PartialOrd> PartialSort for [T] {
    fn partial_sort(&mut self) {
        todo!()
    }
}

(Playground)

This could be easily implemented in a crate, right?

kornel · January 6, 2022, 11:47am

That could work.

Rust also has a "hack" of implementing regular inherent min/max methods directly on floats, which syntactically mimic what Ord has, so number.min(number) works for both floats and integers.

How about doing the same hack for sort functions?

impl [f32] { // is that a real syntax? :)
  fn sort_unstable(&mut self) {…}
}

trentj · January 6, 2022, 12:12pm

"Partial sort" is unfortunately the name of a completely different algorithm that is not directly related to partial ordering. So the name is a no-go

jbe · January 6, 2022, 1:05pm

We could name it .sloppy_sort ^^

scottmcm · January 6, 2022, 6:58pm

But note that number.minimum(other) was also just added (in nightly), for the other "minimum" defined in IEEE 754.

I'm not really a fan, because you end up with the whole "well, where do the NANs go?" problem when defining said function. For example, do they go at the beginning or the end? And does the NAN payload matter?

I think having people do .sort_by(f32::total_cmp) would be fine (once we eventually stabilize that, or something like it).

And note that it puts NANs at both ends, depending on their sign bits.

quaternic · January 6, 2022, 9:50pm

So, what does it mean to sort something by PartialOrd? It seems like a reasonable expectation that for any two indices i < j, it holds that v[i].partial_cmp(v[j]) != Some(Greater). For the general (worst) case, consider sorting a haystack with two needles and a million pieces of hay in it:

NeedleA < NeedleB
PieceOfHay(k) not comparable to anything but itself
Everything equal to itself

The needles have to end up correctly ordered, but the only way to get any useful information (including recognizing if it's a needle or not) is to compare the two needles to each other. Therefore we have to make O(n²) comparisons!

What might happen in reality is that someone tries to sort a large array that accidentally ended up containing only NaNs. If the algorithm is to be correct and generic for T: PartialOrd, it may effectively hang the program at that point. Of course for actually sorting floats, f32::total_cmp (or anything that can e.g. consider all NaNs equal) will have no issues.

jbe · January 6, 2022, 9:54pm

~~Couldn't the algorithm just arbitrarily sort PieceOfHay before any other value when applying quicksort to everything (which is O(n * log(n)), if I'm not mistaken?~~

Ah, PieceOfHay is comparable to itself. That might make things more difficult.

quaternic · January 6, 2022, 10:12pm

The point is that in a T: PartialOrd generic function the only thing you can do with a T is to try to compare it to another T. I suppose it could try something like examining the bit patterns, and if they match you know that the two things are interchangeable. But even that won't help if they never do match. It could equivalently be the integers 0..n with two randomly chosen as the needles. The algorithm can't know which ones they are until it finds both at the same time and compares them.

Floats are saved by the property that the incomparable elements (NaNs) are not comparable to anything, but that's not required by PartialOrd.

scottmcm · January 6, 2022, 10:14pm

Note that when I hear "partial sort" I think std::partial_sort - cppreference.com, which is different from "allows partial ord".

steffahn · January 6, 2022, 10:24pm

I looked up this kind of question earlier, this page contains interesting links

the first answer cites a paper that contains algorithms that are more efficient depending on a parameter "w" which is (AFAIR) something like the largest size of a set of mutually uncomparable items.

On the other hand, if very few items are comparable at all (and you somehow know beforehand which are) then topological sort might be efficient; otherwise a O(n²) time might be correct for the best possible approach, in a fully general setting.

quaternic · January 6, 2022, 10:47pm

It most certainly is, since if all comparisons you've done so far were not comparable and there's any pair you haven't compared directly, that pair could go either way.

Thanks, I clarified my wording.

quinedot · January 6, 2022, 11:53pm

Topological sort was my first thought. A poset is a DAG. (And a DAG can be associated with a poset.)

system · April 6, 2022, 11:53pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Why do all sorted Data Structures only allows types that implement `Ord` help	6	1269	May 6, 2022
Total order for floats? help	123	3872	December 30, 2023
Performance of sorting floating point numbers?	11	3199	January 12, 2023
How to sort a Vec of floats? help	9	37713	December 16, 2020
Is `None` returned from `partial_cmp` supposed to only signify `NaN`? help	10	1414	September 14, 2022

Why `sort()` need `T` to be `Ord`?

Related Topics