This is now significantly more expensive. Consider T = String: now you're cloning k heap-allocated strings to avoid sorting the n - k strings. Since you're taking self by value, ideally you don't even want a T: Clone bound. You probably want something along the lines of
fn smallest_k(self, k: usize) -> Vec<T>
where
    T: Ord,
{
    assert!(k > 0);
    let mut iter = self.into_iter();
    // Max-heap of the first k elements; the root is the largest kept so far.
    let mut heap: BinaryHeap<T> = iter.by_ref().take(k).collect();
    for item in iter {
        let mut root = heap.peek_mut().unwrap();
        if item < *root {
            // PeekMut restores the heap invariant when `root` is dropped.
            *root = item;
        }
    }
    heap.into()
}
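For illustration, the same idea works as a free function over any iterator (the standalone name and the into_sorted_vec finishing step are my additions here, the latter just to get a deterministic output order for the demo):

```rust
use std::collections::BinaryHeap;

// Sketch of the heap approach as a free function. `into_sorted_vec`
// is used instead of `heap.into()` so the result comes back ascending.
fn smallest_k<T: Ord>(items: impl IntoIterator<Item = T>, k: usize) -> Vec<T> {
    assert!(k > 0);
    let mut iter = items.into_iter();
    // Max-heap of the first k items; its root is the largest survivor.
    let mut heap: BinaryHeap<T> = iter.by_ref().take(k).collect();
    for item in iter {
        let mut root = heap.peek_mut().unwrap();
        if item < *root {
            // Writing through PeekMut sifts the new value into place
            // when `root` is dropped.
            *root = item;
        }
    }
    heap.into_sorted_vec()
}

fn main() {
    let words = ["pear", "apple", "quince", "fig", "plum"].map(String::from);
    println!("{:?}", smallest_k(words, 2)); // ["apple", "fig"]
}
```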
Yes, this will move each T out of self in turn. Unless you write a specific partial heapify, though, this is as good as you're going to get.
Rust's design is fairly predicated on the idea that moving values around is cheap. Avoiding both those moves and the binary-heap allocation is an interesting exercise, but extremely unsafe and prone to leaking on unwind.
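To make the "moves are cheap" point concrete: moving a String copies only its (pointer, length, capacity) header, never the heap buffer. A quick check (the three-word size holds on current rustc targets, though it isn't a language guarantee):

```rust
use std::mem::size_of;

fn main() {
    // A String is a (pointer, length, capacity) triple, so moving one
    // copies three machine words regardless of how long the string is.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    let s = String::from("a fairly long string that lives on the heap");
    let addr = s.as_ptr();
    let moved = s; // a move: ownership transfers, the buffer stays put
    assert_eq!(moved.as_ptr(), addr); // same heap allocation as before
}
```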
I got nerd sniped
Very roughly, the solution could look like
// Requires the scopeguard crate (guard, guard_on_unwind, ScopeGuard) and
// the nightly-only split_at_spare_mut / slice_assume_init_drop APIs.
fn smallest_k<T: Ord>(mut v: Vec<T>, k: usize) -> Vec<T> {
    assert!(k <= v.len());
    if k == 0 {
        v.clear();
        return v;
    }
    let n = v.len();
    v[..k].sort_unstable();
    unsafe {
        // v[k..n] is now "spare capacity" that we are still responsible
        // for dropping.
        v.set_len(k);
        for i in 0..n - k {
            // If anything unwinds, drop the not-yet-consumed tail elements.
            let mut v = guard_on_unwind(&mut v, |v| unsafe {
                v.spare_capacity_mut()[i + 1..n - k].slice_assume_init_drop();
            });
            let (head, spare) = v.split_at_spare_mut();
            let next = guard(&mut spare[i], |next| unsafe {
                next.assume_init_drop();
            });
            if head[k - 1] < *next.assume_init_ref() {
                continue;
            }
            let p = head.partition_point(|x| x < next.assume_init_ref());
            let next = ScopeGuard::into_inner(next).assume_init_read();
            v.insert(p, next);
            v.truncate(k); // equiv. v.pop(), but in place, since that matters
        }
    }
    v
}
This is COMPLETELY UNTESTED and gives NO PROMISE OF BEING SOUND. It just exists to illustrate the theoretical way to do this without an extra allocation and the significant amount of unsafe required in order to manage the concurrent drain and modification of the vector.
Bonus: I hit post and immediately spotted a place where an unwind can happen and leak the n-k elements. I wrote this on my phone and have zero confidence in its correctness.
And I'd not be surprised if this optimizes/performs worse than the simple and safe version above.
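For what it's worth, if you already have a Vec<T>, the standard library also offers a safe way to keep the k smallest without any extra allocation: select_nth_unstable partitions the vector around the (k - 1)th element in O(n) average time. A sketch (the function name is mine; note the result comes back in arbitrary order):

```rust
fn smallest_k_in_place<T: Ord>(mut v: Vec<T>, k: usize) -> Vec<T> {
    assert!(k <= v.len());
    if k == 0 {
        v.clear();
        return v;
    }
    // After this call, v[..k] holds the k smallest elements
    // (in arbitrary order) and v[k - 1] is the kth smallest.
    v.select_nth_unstable(k - 1);
    // Drop the rest in place; no second allocation is ever made.
    v.truncate(k);
    v
}

fn main() {
    let mut result = smallest_k_in_place(vec![9, 3, 7, 1, 5], 3);
    result.sort_unstable();
    assert_eq!(result, vec![1, 3, 5]);
}
```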
If you prefer the cloning version, then return the references directly instead, e.g.
fn smallest_k(&self, k: usize) -> Vec<&T>
where
    T: Ord,
{
    assert!(k > 0);
    let mut iter = self.iter();
    let mut heap: BinaryHeap<&T> = iter.by_ref().take(k).collect();
    for item in iter {
        let mut root = heap.peek_mut().unwrap();
        if item < *root {
            *root = item;
        }
    }
    heap.into()
}
and the caller can do the .map(Clone::clone).
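A free-function sketch of that borrowing variant (smallest_k_ref is a made-up name), showing the caller paying for only k clones instead of n:

```rust
use std::collections::BinaryHeap;

// Borrowing variant: the heap stores &T, so nothing is cloned here.
fn smallest_k_ref<T: Ord>(items: &[T], k: usize) -> Vec<&T> {
    assert!(k > 0);
    let mut iter = items.iter();
    let mut heap: BinaryHeap<&T> = iter.by_ref().take(k).collect();
    for item in iter {
        let mut root = heap.peek_mut().unwrap();
        if item < *root {
            *root = item;
        }
    }
    heap.into_sorted_vec()
}

fn main() {
    let words: Vec<String> =
        ["pear", "apple", "quince", "fig"].map(String::from).into();
    let refs = smallest_k_ref(&words, 2); // borrows only
    // The caller decides to clone, paying for k clones instead of n.
    let owned: Vec<String> = refs.into_iter().map(Clone::clone).collect();
    assert_eq!(owned, vec!["apple".to_string(), "fig".to_string()]);
}
```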