Surprising behavior of BinaryHeap::into_iter()

tldr

TODO

  • Add .into_iter_sorted() method to BinaryHeap

original post

    use std::collections::*;

    let heap = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]); // max heap from a vec
    assert_eq!(
        heap.into_iter().collect::<Vec<_>>(),
        vec![9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    );

produces the following error:

   Compiling playground v0.0.1 (file:///playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.97 secs
     Running `target/debug/playground`
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[9, 8, 5, 7, 3, 4, 0, 6, 1, 2]`,
 right: `[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]`', src/main.rs:7:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

For me this is surprising but the lib reference says it is correct.
[Edit]
I use BinaryHeap inorder to get the values in max-first order.
It is ridiculous to return values in arbitrary order.
[/Edit]

It's just a thin wrapper of the underlying vector's iterator.
How can I get the values of max heap in the max-first order using iterator?

[Edit]
My opinion is that Iterator for BinaryHeap should abstract the following use case (sorry, C++):

    while(!q.empty()) {
        std::cout << q.top() << " ";
        q.pop();
    }
    std::cout << '\n';

[/Edit]

Why is this surprising to you?

To convert it into a sorted vector use into_sorted_vec.

1 Like
extern crate itertools;

fn main() {
    use std::collections::BinaryHeap;

    let heap = BinaryHeap::from(vec![1, 8, 5, 6, 3, 4, 0, 9, 7, 2]); // max heap from a vec
    assert_eq!(
        itertools::unfold(heap, |heap| heap.pop()).collect::<Vec<_>>(),
        vec![9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    );
}

Edit: Or, if you'd prefer, you can use unfold(heap, BinaryHeap::pop).

2 Likes

It was to me too, because "heap order" is pretty useless.

Whereas the sorted iterator is great, especially with combinators like take that can take full advantage of its laziness.

It was surprising to me, because:

  • BinaryHeap is used when you want to get the value in the specified (max-first) order.
  • Underlying vector is kind of implementation details, not an interest to the library user.

As a matter of fact, in the current implementation, heap.into_iter() is almost identical to heap.into_vec().into_iter(). The only benefit, for me, is to save the number of typed characters....

1 Like

into_sorted_vec is min-first order. I could call reverse() or something but inelegant.
For simple app, I could use that technique and get the work done, though.

Thanks, It's elegant.

To get a sorted list, I think it would be better to just do a normal vector sort. It seems to me that filling and draining a binary heap is going to cause a lot more memory movements and comparisons than a plain sort. (Although I don't have a reference for that.)

The purpose of BinaryHeap is to get elements in a sorted manner. The current binary heap iterator implementation is useless if you must use external vector sorting.

1 Like

pub fn iter(&self) -> Iter

Returns an iterator visiting all values in the underlying vector, in arbitrary order.

The lib doc says ‘arbitrary order’.
Fortunately, the concept of ‘arbitrary order’ includes ‘max-first order’.
So I suppose fixing this behavior is backward compatible :blush:

5 Likes

Do you know how a binary heap works? Internally it is only partially ordered. The only way to get things out in order is to drain it. So unless the iterator drains the heap (i.e. empties it, which is a destructive operation), an iterator cannot give you the items in order.

This doesn't make the heap useless because what a heap is good at is when remove and add operations are mixed or interleaved. If you're just doing 20xAdd then 20xRemove, then the heap isn't the best data structure for the job.

2 Likes

Okay, to add to this. Maybe they could make the drain method return things in order, because that is destructive, which would make the OP happy. Perhaps there's a good reason not to do that ... I'm not sure. However, still pushing many items and then popping many items in a BinaryHeap will be less efficient than just putting them in a vector and sorting them, because each of those pushes causes readjustment of the binary tree. However it could be as efficient as a heapsort if the data is loaded all at once, and only one readjustment pass is made. It appears that BinaryHeap::from(vec) does that. So if it is being used primarily for sorting a batch of items then that's what's needed.

2 Likes

If you definitely need to sort the whole thing, yes.

But the nice thing about iterators is that they can be lazy. If you just want the top 10 things that match a (likely) predicate, say, something like .into_sorted_iter().filter(...).take(10) can be substantially faster than sorting the whole set.

4 Likes

Isn't OP talking about into_iter? That is a destructive operation just as much as drain is, right?

Personal opinion -- there are two things that concern me about the proposal here:

  1. If into_iter is changed, it will give a different order from iter, which really can't do an in-order traversal.
  2. I don't want my existing BinaryHeap::into_iter calls to be slowed down because someone else decided to care about preserving an ordering that means nothing to me. Especially when there are perfectly clear and fast ways to do what is wanted. Does BinaryHeap offer another fast way to construct an partial-ordered owned iterator?

I think a healthy compromise would be to add an into_heap_order_iter or something instead of rewriting the current one.

1 Like

Sorry, I supposed iter() and into_iter() is the almost the same thing. But in reality the former takes &self and the latter takes self.

My opinion is the both should take self and hence destructive.

It's a implementation detail of BinaryHeap. And the use case for BinaryHeap is a (blackbox) priority queue. It should hide the detail, in my opinion.

I don't think it's a well-thought API....

Yes it's stable public APIs!

I'd suggest you call heap.into_vec().into_iter() instead.

    /// Consumes the `BinaryHeap` and returns the underlying vector
    /// in arbitrary order.
    ///
    #[stable(feature = "binary_heap_extras_15", since = "1.5.0")]
    pub fn into_vec(self) -> Vec<T> {
        self.into()
    }
1 Like

You're right, I missed that. Thanks for the correction.

1 Like

You are right, main usage of BinaryHeap is push() and pop(). I actually only used in that way:

Interesting point! You are right.

Maybe:

  • Add orthogonal, new one:
    -- fn heap_iter(self) // destructive; moved
    -- fn heap_into_iter(self) // destructive; moved
    -- fn heap_drain(&mut self) // destructive; you can use the original heap later.
  • (Optionally) Deprecate old ones.

[Edit] But I hesitate to do so because binary_heap.rs is already pretty big. More than 1200 lines!