I'm wondering about the implementation of
BTreeSet::from_iter(), which collects the values into a Vec and then sorts them with a stable sort and then iterates over that to construct the BTreeSet.
It seems terribly wasteful in cases where the iterator is already sorted, as happens e.g. in Sub, where the ordered difference between two BTreeSets is collected into another BTreeSet.
It's hard to imagine the standard library leaving on the floor such an obvious optimization as to call
BTreeMap::bulk_build_from_sorted_iter when it knows it has a sorted iterator. Presumably there is a trick that makes `collect| fast. What is it?
It's not so strange to me that nobody noticed this case, but you're welcome to send a pull request if you'd like to improve it! I think the
BitAnd intersection can do the same thing.
Well, but checking for a sorted iterator would require traversing it. There are two problems with that:
- It's linear, so it might approximately double (or at least significantly increase) the running time of
collect if the iterator is long.
- You would have to collect it into a data structure anyway because
from_iter() needs to consume the elements, so if the elements didn't go into some sort of allocation, there would be no way to actually build the set after checking for sortedness.
So I don't think the general
FromIterator impl can be reasonably expected to perform a check for sorted input and not allocate.
Note that this function is a fairly recent PR to begin with, it already is the trick that makes
collect fast… because:
- No need to look up the tree node for each inserted key.
- Requires fewer allocations because it creates a compact tree: fewer tree nodes with most of them filled up to capacity. The resulting map is most likely faster for lookups but actually slower if you then need to insert more keys into it, but that's a different topic.
On the other hand, the use of
from_iter slows us down if the iterator yields only a few elements. Up to 11 elements, there's only one tree node to be allocated, and the price of allocating an intermediate vector needs to be offset by sorting smartly to beat the linear searches in tree nodes. Certainly if the iterator yields just one or two elements, you pay for that extra allocation (compared to the old naive implementation). That may be what the last remark in the PR reports.
But anyways, if the caller knows the keys are ascending, like all the BTree iterators are, using
bulk_build_from_sorted_iter is low hanging fruit.
You could copy or base a PR on this:
If you don't, I'll make a PR myself, but then it's probably going to lie untouched for months or years. I don't know why, must be on some blacklist or something.
Did you end up submitting this pull request?
Great, and already merged! Thanks!