Question about BTreeSet implementation

I'm wondering about the implementation of BTreeSet::from_iter(), which collects the values into a Vec and then sorts them with a stable sort and then iterates over that to construct the BTreeSet.

It seems terribly wasteful in cases where the iterator is already sorted, as happens e.g. in Sub, where the ordered difference between two BTreeSets is collected into another BTreeSet.

It's hard to imagine the standard library leaving on the floor such an obvious optimization as to call BTreeMap::bulk_build_from_sorted_iter when it knows it has a sorted iterator. Presumably there is a trick that makes `collect| fast. What is it?

It's not so strange to me that nobody noticed this case, but you're welcome to send a pull request if you'd like to improve it! I think the BitAnd intersection can do the same thing.

Well, but checking for a sorted iterator would require traversing it. There are two problems with that:

  1. It's linear, so it might approximately double (or at least significantly increase) the running time of collect if the iterator is long.
  2. You would have to collect it into a data structure anyway because from_iter() needs to consume the elements, so if the elements didn't go into some sort of allocation, there would be no way to actually build the set after checking for sortedness.

So I don't think the general FromIterator impl can be reasonably expected to perform a check for sorted input and not allocate.

2 Likes

Note that this function is a fairly recent PR to begin with, it already is the trick that makes collect fast… because:

  • No need to look up the tree node for each inserted key.
  • Requires fewer allocations because it creates a compact tree: fewer tree nodes with most of them filled up to capacity. The resulting map is most likely faster for lookups but actually slower if you then need to insert more keys into it, but that's a different topic.

On the other hand, the use of bulk_build_from_sorted_iter in from_iter slows us down if the iterator yields only a few elements. Up to 11 elements, there's only one tree node to be allocated, and the price of allocating an intermediate vector needs to be offset by sorting smartly to beat the linear searches in tree nodes. Certainly if the iterator yields just one or two elements, you pay for that extra allocation (compared to the old naive implementation). That may be what the last remark in the PR reports.

But anyways, if the caller knows the keys are ascending, like all the BTree iterators are, using bulk_build_from_sorted_iter is low hanging fruit.

You could copy or base a PR on this:

If you don't, I'll make a PR myself, but then it's probably going to lie untouched for months or years. I don't know why, must be on some blacklist or something.

Did you end up submitting this pull request?

It's here.

Great, and already merged! Thanks!

interesting information