I have a program which has a struct storing a shared reference to [T], and I need to implement a method on it which can return the top three values. The slice is unsorted, and I need to keep it that way for other parts of the implementation. Is my solution a good idea, or can it be improved?
Here is the struct definition:
struct HighScores<'a> {
scores: &'a [u32],
}
and here is a partial implementation with the method in question:
impl<'a> HighScores<'a> {
pub fn personal_top_three(&self) -> Vec<u32> {
let mut result;
if self.scores.len() <= 3 {
result = self.scores.to_vec();
result.sort_unstable_by(|a, b| b.cmp(a));
} else {
result = vec![0; 3];
for &score in self.scores.iter() {
if score > result[0] {
result[2] = result[1];
result[1] = result[0];
result[0] = score;
} else if score > result[1] {
result[2] = result[1];
result[1] = score;
} else if score > result[2] {
result[2] = score;
}
}
}
result
}
}
Just have an output array of three elements. That will hold your three largest values. It will be kept in order as you find them in the input array.
Iterate through your input array once and compare it's elements to the lowest value in your output array.
If you find a value higher than your current lowest it's time to put that value into your output array in the right place, perhaps moving existing output values down (and out) as required.
That seems fine. You could also move stuff around less by keeping your top 3 unsorted, and swapping out the lowest value each time.
Something like
/// Return the index of the smallest value
fn get_index_of_lowest(values: &[u32]) -> usize {
let mut lowest = u32::MAX_VALUE;
let mut index = 0;
for i in 0..values.len() {
if values[i] < lowest {
lowest = values[i];
index = i;
}
}
index
}
fn top_three(scores: &[u32]) -> {
let mut result == vec![u32::MIN_VALUE; 3];
for score in scores {
let index = get_lowest_index(result);
if score > result[index] {
result[index] = score;
}
}
result.sort_unstable();
result
}
It still accesses all three elements on every loop, but only has to write to one of the locations. (My implementation skips the len <= 3 case, for clarity, but you should keep it). I haven't benchmarked this, so I'm not actually claiming it's faster, but it's another option that might be worth a try.
The standard library has a BinaryHeap which may be useful:
pub fn personal_top_three(&self) -> Vec<u32> {
use std::collections::BinaryHeap;
let mut heap = self.scores.iter().copied().collect::<BinaryHeap<u32>>();
let mut top_three = Vec::new();
for _ in 0..3 {
if let Some(v) = heap.pop() {
top_three.push(v);
}
}
top_three
}
Another alternative, depending on how much code you want to write, how much memory usage is a concern, and how the benchmarking turns out, is just stuff everything into a BTreeSet, which is ordered by default, and take the largest three.
fn select_k(it: impl Iterator<Item=u32>, k: usize) -> Vec<u32> {
let mut h = BinaryHeap::new();
for item in iter {
h.push(Reverse(item));
if h.len() > k {
h.pop();
}
}
h.into_iter().map(|rev| rev.0)
}
Note, that this code limits heap size to k+1. It allows to achieve linear performance (solutions proposed by cliff and tuffy are O(N log N))
I'll see your binary heap, and raise you non-branching code:
fn select_k(it: impl Iterator<Item=u32>, k: usize) -> Vec<u32> {
let mut h = BinaryHeap::new();
let mut first = iter.take(k);
for item in first {
h.push(Reverse(item));
}
for item in iter {
h.push(Reverse(item));
h.pop();
}
h.into_iter().map(|rev| rev.0).collect()
}
All good suggestions, but remember that BinaryHeap's iter() "returns an iterator visiting all values in the underlying vector, in arbitrary order". So best be careful about that last accumulation step.
Not sure if my .take() version does what I thought. take seems to consume its contained iterator. I thought there was a way to iterate over the taken values, and then keep iterating on the inner iterator, but I can't make that work now.
Thanks for all the suggestions! I might not have made it clear in the OP, but the code should return a Vec shorter than 3 if the scores slice contains less than 3 elements. Also, a heap (data structure) seems a bit overkill to me. I believe it is possible to get the largest three values by only iterating once over the slice, without any allocation of more data structures on the heap (memory region). However, I do like the idea of only sorting at the very end.
Be aware that push and pop from BinaryHeap are not implemented in a non-branching way.
Another solution would be to do this:
pub fn top_three(slice: &[u32]) -> Vec<u32> {
let mut result = Vec::with_capacity(4);
for v in slice.iter().copied() {
result.push(v);
let mut i = result.len() - 1;
while i > 0 {
if result[i] <= result[i-1] { break; }
// swap
let t = result[i];
result[i] = result[i-1];
result[i-1] = t;
i -= 1;
}
// adjust this to change number of items
if result.len() > 3 {
result.pop();
}
}
result
}
But I do want to iter, enumerate and skip... The point is, I was trying to get rid of the indexing on the vec, since that could be a source of unnecessary bounds checking. I wanted to skip the first element, since I already initialized smallest with it. Maybe it is too functional, but I don't believe so. I think it expresses my intent better than the alternative, with iteration over the indices and indexing into the array.
@alice thanks for the comment. I think you're right -- it's probably also the best expression of my intent. Although I'd be happy with both versions.
If you're curious why some posts of yours are getting "flagged by the community", it might go something like this:
It feels to me like you're framing personal preference as objectively superior and dismissing others' preferences as obscure and useless. This campaign against styles of programming that you personally do not prefer doesn't need to frame things in such absolute terms. (I also question the need for a campaign at all, but that's a different topic!)
Rust is awesome and compelling in many ways and attracts people from all sorts of programming backgrounds and skill levels. Having a little empathy for that can go a long way, especially on an online forum.
There is a linear-time solution for general k (find top-k elements unsorted) using one of the fast "selection algorithms" (nth_element in C++, not std in Rust). The Wikipedia page contains some more information.
However, in practice, and for such small k (i.e. = 3), probably the fastest single-threaded way is using some SIMD instructions (I'm not sure, just an educated guess). Anyway, I guess OP doesn't require "the fastest way" (and there is already an accepted solution). This comment was just for your information - the fact that there are in fact algorithms asymptotically faster than using BinaryHeap.