Hi,
I am writing a function that needs the kth largest element in Vec<f32>. I am aware of the algorithm that gives time complexity O(nlogk) where n is the size of array, see e..g https://users.rust-lang.org/t/solved-best-way-to-find-largest-three-values-in-unsorted-slice/34754, using heap. But, the ones I saw were applicable for i32 type (essentially since Binary Heap requires Ord and float does not implement Ord trait).
I can think of 2 strategies:
Implement binary heap part myself (compatible for f32)
I dont want to go with 2 since there is a fundamental array type in my project that I dont want to depend on external crates.
Do you have any way that would not require custom implementation and external crate while having optimal worst case complexity ( for instance I dont want quick select scheme)?
Any suggestion is appreciated
You can implement the relevant part of ordered_float yourself very easily, it's just a wrapper around the float types that implements Ord, along with a bunch of conveniences for accessing the inner value.
Not that theoretical log-factors really matter in practice though, constant overhead factors can be just as bad or even worse, so whether going with an algorithm that can actually promise O(n) worst case is worth it in practice might be a trade-off and should e. g. also be benchmarked.
You can radix-sort floats using the integer key x xor ((x asr 31) lsr 1) (here x is the underlying bits of the float), asr is arithmetic right shift and lsr is logical shift right.
Quick selection method based on pivoting has an expected complexity of O(n), the worst case is O(n²)
It also describes it this way in the wikipedia page, in pivoting section.
Am I missing something?
Thanks, I will check against my data distribution how each algo behaves.
Quickselect is a concrete algorithm, selecting the kth element is a algorithmic problem. While quickselect has a bad worst case time, the selection problem in general can be solved in worst case O(n), too, using different algorithms. E. g. if you click on the "median of medians" algorithm, that's one O(n) worst case time algorithm that I'm aware of and that was covered in some uni lecture I once attended. In practice, of course mind the constant factors, too, I don't remember how good or bad that one in particular faired w. r. t. the constant factor overhead.
Radix sorting is O(n) for some suitable model of computing (i.e., an unrealistic one) and is very efficient in practice. But this really depends on what k is. For instance, if you just need the second largest element of your array you can do something really fast with vector registers, but if you need the median, you need to do something more complicated.