I'm looking for an efficient means of selecting the best N items from a stream of data where 'N' and 'best' are user-provided. I can find all sorts of things that get me close but not quite there.
For example binary-heap-plus caters for a user-provided definition of 'best' but no good way of dealing with 'N', while min-max-heap provides the means of dealing with 'N' but no convenient means of specifying 'best'.
Can you suggest something ready-made, or will I have to roll my own?
Wrapping in a newtype and implementing a trait[1] is a lot less convenient than simply providing a key or comparison function. Which is probably a large part of the motivation for the existence of binary-heap-plus.
which in the case of Ord is 4 traits: Ord, PartialOrd, PartialEq and Eq, though this could probably be mitigated with judicious ordering within the newtype and #[derive(..)]↩︎
Alternately, you can use binary-heap-plus set up to return the worst item it contains, and then never let it grow more than N items. If it would, use peek to determine if the next item should replace it in the top-N group.
At the end, popping everything off the heap will give you all of the top N items, ordered from worst to best.
You can just create a new min heap and populate it with the first N elements, then use peek_mut to selectively replace the worse element with the new one if that's better. Then repeat this for all elements.
What do you mean? To me it seems it provides almost the same capabilities as binary-heap-plus. The only additional one is that you can get both the minimum and the maximum, but you only need one of them (the "worse" according to your "best" ordering predicate)
min-max-heap provides a means of limiting the number of items stored in the queue, by allowing inspection of the size and removal of the worst element ... while maintaining a priority queue of the best items seen so far.
Limiting the size of binary-heap-plus would require having the queue backwards, so extracting the best element requires reversing the items in the queue. This is fine if you only ever want to use the queued items once all the items have been fed into the queue. If you want to use the queued items before all items have fed into the queue, then min-max-heap seems more appropriate than binary-heap-plus.
Given that my original problem question only mentioned selecting the best N items, both approaches satisfy the stated requirements.