Too severe precondition for slice::sort_by?

I have the strong impression that the function sort_by is not as useful as it seems, due
to the restriction on the compare parameter. The documentation of the function says:

May panic if compare does not implement a total order

I would like to have a guarantee that the following program will not panic.

use std::cmp::Ordering;

#[derive(Debug,PartialEq, Eq)]
struct Person {
    id: usize,
    age: i8,
}

impl Person {
    fn new(id: usize, age: i8) -> Person {
        Person{id, age}
    }
    fn cmp_by_age(p1: &Person, p2: &Person) -> Ordering {
        p1.age.cmp(&p2.age)
    }
}

fn main() {
    let mut persons=vec![Person::new(1,97), Person::new(2,15), Person::new(3,97)];
    persons.sort_by(Person::cmp_by_age);
    println!("persons sorted by age: {:?}", persons);
    assert!(persons[1] != persons[2]);
    assert_eq!(Person::cmp_by_age(&persons[1], &persons[2]), Ordering::Equal);
}

If I run this program, the output is as hoped for:

persons sorted by age: [Person { id: 2, age: 15 }, Person { id: 1, age: 97 }, Person { id: 3, age: 97 }]

But, could this program panic just as well? It looks like the answer is yes.
The documentation of the function states :”For more information and examples see the Ord documentation.” Now, Ord is a trait of a type, not of some function or closure. So, we need to interpret.

The documentation states that the implementation of Ord must be consistent with PartialOrd. In particular:

partial_cmp(a, b) == Some(cmp(a, b))

The documentation of PartialOrd states that, amongst others, the following condition must hold:

a == b if and only if partial_cmp(a, b) == Some(Equal)

Filling in the consistency requirement, we get:

a == b if and only if Some(cmp(a, b)) == Some(Equal)

If we remove the Some on both sides:

a == b if and only if cmp(a, b) == Equal

If we fill in cmp_by_age for cmp, this condition is not met, as we can see from the two assertions in the code.

My questions:

  • Is my conclusion correct that I cannot use sort_by confidently in cases as above?
  • If so, why doesn’t the Rust standard library sort the way C++ does? That is, require a weak ordering instead of a total ordering.

Your cmp_by_age does implement a total order, so you have no problem.

The requirement for PartialOrd to be consistent with PartialEq is irrelevant here because you are not implementing PartialOrd.

2 Likes

plus also because sort_by does not have PartialEq bound so it cannot depend on its implementation details, I think.

A partial order is one where not all elements can be sensibly compared with each other. Similarly for partial equality. This is in contrast to a total order or total equality.

For a concrete example consider floating point numbers. These form a partial order and partial equality because NaN (not a number) results (which you can as errors on for example 0/0) do not compare equal to themselves.

That's right: not only are there many different NaNs, but the same NaN is also not equal to itself. And there is no defined order in between NaNs (let alone NaNs and normal values).

Floating point numbers are weird.[1]

In practise, floats are where this will commonly come up. There are probably some other cases, but I don't think I have ever ran into this outside of floats. Mostly things either have an order or they don't have an order at all.

Yoir case is fine becuse you are only comparing by a consistent part that is a total ordering. As far as the sort is concerned, there may be some duplicates in the list (that aren't duplicates when considering the whole). But the sort doesn't see that Eq.


  1. In many many more ways than just that. For example both +0 and -0 exist, and are different values. ↩︎

I think this is where the confusion comes from. It is only the description of total order and perhaps the examples in that doc that matter. I wonder whether it should be referenced at all here.

1 Like

It does seem like the parts that apply to both Ord and to (returning) Ordering could be placed in the cmp module docs instead.

3 Likes

The definition of a total order requires that a <= b && b <= a implies a == b. So in that sense, the example is indeed not a total order. Arguably, the docs are using the term in a somewhat sloppy way.

1 Like

Total order here just means that for every pairing, the comparator must return exactly one of Less, Equal, or Greater, which is guaranteed by the signature, and that the relevant (ir)reflexivity, transitivity, and (anti)symmetry properties must hold. The "equality" here means some arbitrary equivalence relation that can be entirely unrelated to the canonical ordering imposed by Ord and Eq. That should, at the very least, be clarified.

But honestly Rust should just bite the bullet and, like C++ does, admit that what’s required is only a weak order rather than a total one. Even if no WeakOrd marker trait is added, and Ord still requires a total order, anything taking a custom comparator could be relaxed to specify that returning Ordering::Equal only means equivalent, not equal in sensu stricto, and that my_cmp(a, b) == Ordering::Equal certainly does not imply that a == b.

3 Likes

Ironically, the standard ordering of floats isn’t even a partial order – or a preorder – even the weakest (edit: non-strict) relations that math still calls orderings at the very least have to be reflexive! As such, Rust’s PartialOrd does not require a partial ordering either, in the standard math sense of the term anyway.

(Edit: @philomathic_life correctly pointed out that it is a strict partial order, that is, a partial order given by a "less-than" rather than a "less-than-or-equal" relation, requiring irreflexivity instead of reflexivity).

2 Likes

This is the answer I hoped for.
See the C++ concept and the Wikipedia definition for more background.

1 Like

Not sure I understand that claim. There are several concepts in math called orderings, and some of them are not reflexive. For example < is often called a (strict weak partial) ordering, and it's not only not reflexive but more strongly irreflexive. In that sense, floating point numbers under < do define a strict weak partial ordering.

For sorting of floats there is total_cmp, I would say.

No, total ordering is required here, because we have a cmp with enum Ordering result.

The reason for C++ to only require “strict weak ordering” is funny: what C++ actually needs is, of course, total order. Most sorting algorithms require a total order, after all.

But, then, why the heck C++ is proudly talks about “strict weak ordering”? It easy: because it can!

What is this funny sounding strict weak ordering, after all? Read the definition, if you want, but the short form is: strict weak ordering is subset of weak orderings where fake-comparison ≲ defined from < (a ≈ b when neither a < b nor b < a)… drumroll… gives us a total order.

Sorting algorithms, in C++, only ever use <, they don't have any way to compare elements for equality. That means that they, internally, create ≈ (using the aforementioned rule: a ≈ b when neither a < b nor b < a) and then use not < and = for sorting, but ≲, fake-comparison, that is: total order. Precisely what most sorting algorithms need.

Yes, and now, when we know why C++ and Rust require the exact same thing, but one of them may describe it as “strict weak ordering” and the other have to describe as “total ordering” we may start thinking about how to better change the documentation.

But no, we couldn't say that we need “weak ordering”, that would simply be a lie… there are three-way operator involved you couldn't play games that C++ plays.

Yet, unironically, ≲ created from from < is a total weak ordering which means that you can turn it into a total ordering… which is precisely what both C++ and Rust sotring algorithms require.

1 Like

Oh, I meant in the sense that non-strict preorders are reflexive, or in Rust terms, PartialOrd::partial_cmp(a, a) == Ordering::Equal which floats don’t, of course, fulfil.

Float ordering is not a SWO because everything is incomparable with NaNs, so the incomparability relation is not an equivalence relation. Proof by counterexample: let

a # b = !(a < b || b < a).

Then if < is a SWO, # must be an equivalence relation. But

1 # NaN && 2 # NaN, but !(1 # 2),

violating transitivity.

If floats were a SWO, everything would be much easier and this dance with pseudo-partial orders wouldn’t be needed!

1 Like

But that answer would be incorrect! Both C++ and Rust sorting algorithms require the exact same thing: total ordering

But C++ receives < operation, turns it into ≲ internally, then uses that total order.

While Rust received three-way comparison oprtation and that means it's your resonsibility to turn your “strict weak partial ordering“ into a total ordering, if you want to use it!

That is the thing: C++ and Rust actual requirements are the same, there are no difference, but their API is different, that's why their stated description of requirements is different!

But float ordering is as strict weak ordering if you simply ignore == and define a ≈ b when neither a < b nor b < a.

That is what C++ is using (C++ sort API only receives < operation, == is ignored there) and that's what you have to do in Rust, too.

And, after that discussion, it becomes apparent that this critical explanation needs to be added somewhere because we spent a lot of time discussing nuances of text without anyone ever trying to look on these APIs to understand why the stated requirements of C++ and Rust and different and people argue that requirements of C++ are the same (which is true) and thus text of documentation should mention “strict weak ordering” in both cases (which is not true, Rust API requires total ordering… just like C++ one does, too… internally).

See my counterexample. It doesn’t use = anywhere, only <. C++ requires a SWO from its comparators and operator< when used with std algorithms or ordered data structures.

Now that there’s the spaceship operator, float <=> float returns a partial_ordering instead of strict_weak_ordering.

But all the existing stuff in std still requires operator<… which cannot be statically enforced to be a SWO.
So sorting floats or putting them into std::map or whatever is still ill-defined, at the very least if NaNs are involved. That’s C++ for you… and exactly the reason that Rust has PartialOrd.

Damn. Here you are correct, but that just means that one couldn't use < with floats and have to use total_cmp…

That's the final answer in that sad saga… now we only need to think about how and where to add that to documentation. Because all the asnwers are logical and are, kinda, in the documentation… but they are scattered and thus misconceptions happen.

Yes, C++ puts the demands on your and doesn't verify them… but these are the exact same demands Rust have.

Rust at least is still in a better place than C++. Even though the latter now has <=> which is, as far as I can see, specified correctly and rigorously and can statically differentiate between total, weak, and partial orders (like Rust can between total and partial), and nicely helps make sure that < > <= => == != are all consistent with each other, it's still UB (rather than a compile error) to sort floats (technically, even if no NaNs are involved!) without providing a custom comparator, because all the existing stuff is still specified in terms of operator< and not operator<=>.

But I really wonder how many C++ programmers actually realize that. Unlike many other cases of UB or unspecified-ness, this one doesn't seem to be discussed much compared to how common it must be in practice.

I suspect it's because NaNs are very rare, in practice. And if you sort floats without NaNs then everything works.

Most C++ programmers don't even understand what UB is and what are the implications of triggering that!

“Vibe coding” is all the rage, these days, but believe me, most C++ developers (as well as all other developers) practiced it before AI have risen to the prominence. They just worked as AI, themselves: connected random pieces of code from StackOverflow and hoped for the best.

Why do you think Google wants so much to replace C++ with something? Vibe coding kinda-sorta works with Python, JavaScript, even Rust without unsafe… yet it doesn't work with C++… but people practice it, anyway!