Removing range of elements from BTreeMap

Hello,

Surely I've missed something while reading the docs, but I cannot find a way to efficiently remove a range of elements from a BTreeMap.

In C++, I would get an iterator to the first element to remove and an iterator to the first element not to remove, and then call std::map::erase, whose complexity is logarithmic in the number of elements in the map plus linear in the number of elements to remove.

How would I proceed with BTreeMaps with a similar complexity? I could split_off the BTreeMap at the beginning of my range, and a second time at the end of my range, and then append the original BTreeMap to the one that was split off the second time, but this is both more complicated than map::erase(range_of_keys) and also, if I read correctly, linear in the number of remaining elements rather than in the number of deleted ones?
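To make that concrete, here is roughly what I mean (lo and hi just stand in for the bounds of my range):

```rust
use std::collections::BTreeMap;

// Sketch of the split_off-twice-then-append workaround described above;
// `lo` and `hi` are placeholder bounds for the half-open range to remove.
fn remove_range(map: &mut BTreeMap<i32, String>, lo: i32, hi: i32) {
    let mut doomed = map.split_off(&lo);  // map = ..lo, doomed = lo..
    let mut keep = doomed.split_off(&hi); // doomed = lo..hi, keep = hi..
    map.append(&mut keep);                // glue the surviving upper part back on
    // `doomed`, holding the removed range, is dropped here.
}
```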

Any help would be greatly appreciated. Thanks :smiley:

2 Likes

I don't think there's a better way to do this in the standard BTreeMap. Adding a "splice" or "drain" method would be very useful. I believe the main reason they don't exist is that nobody with the time and expertise has stepped up to add them yet.

2 Likes

Perhaps even worse than you might hope: it's linear in the number of elements in both remaining parts. append rebuilds the whole tree and doesn't reuse the first part (issue #34666).

If you happen to know that the doomed elements are near the start of the sequence, you could use drain_filter (on nightly) and mem::forget it when you reach the first element not to remove, but I'm not sure that use of mem::forget is guaranteed to keep having defined behaviour. And even then, it's only linear in the number of elements deleted, and in practice a bit worse.
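Roughly like this, I imagine (remove_prefix is a made-up name, nightly only, and it leans on behaviour the docs leave unspecified):

```rust
#![feature(btree_drain_filter)] // nightly-only feature gate at the time of writing
use std::collections::BTreeMap;
use std::mem;

// Remove every key below `end`, then leak the iterator so its destructor
// never walks (and re-filters) the untouched rest of the map.
fn remove_prefix(map: &mut BTreeMap<i32, String>, end: i32) {
    // Counting the doomed prefix is itself linear in the number of
    // elements removed, so the total cost stays in that ballpark.
    let doomed = map.range(..end).count();
    let mut drain = map.drain_filter(|_, _| true); // would drain everything...
    drain.by_ref().take(doomed).for_each(drop);    // ...but only pull out the prefix
    mem::forget(drain); // skip the destructor that would visit the rest of the map
}
```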

1 Like

I see. Thank you for the answer. This is a bit unfortunate for my use case though :smile:

I see that the std one has drain_filter as an unstable method (and retain as a recently added, still unstable method on nightly), but it still has to iterate over the whole map even if only a subset of the keys must be filtered out.
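For reference, the straightforward nightly version I have in mind is something like this, which still visits every entry:

```rust
#![feature(btree_drain_filter)] // nightly-only at the time of writing
use std::collections::BTreeMap;
use std::ops::Range;

// Removes the keys in `range`, but the predicate still runs once per
// entry in the whole map, not once per entry in the range.
fn remove_range_by_filter(map: &mut BTreeMap<i32, String>, range: Range<i32>) {
    map.drain_filter(|k, _| range.contains(k)).for_each(drop);
}
```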

Do you know of a non-standard BTreeMap that would offer efficient removal of a range?

Otherwise, I could take a look at the btree module and see what I can do, but honestly I'd rather not dive into modifying tree data structures right now :sweat_smile:

The main obstacle for a drain with a range is something C++ doesn't have to deal with: the possibility to mem::forget anything, including a draining iterator. The map must be kept in a valid state while elements are iterated, even though the drain method takes an exclusive reference to the map (no visitors allowed on the construction site). We can't simply let the iterator return a bunch of key/value pairs and then delete entire tree nodes and subtrees at the end, because if the draining iterator is forgotten in the meantime, the key/value pairs that were already handed out are still in the map.

Writing this made me realize there is a way: drain could create a temporary new root, transfer the tree nodes and subtrees containing deleted elements to the temporary tree, fix up the real tree as if the whole drain were already done, then hang the temporary tree in the drain iterator for it to iterate over the detached key/value pairs, and finally (in the iterator's drop handler) delete the detached tree. There, a mem::forget will merely leak keys, values and some tree nodes, but that's the normal price you pay for using mem::forget.

However, gift-wrap that thing and you end up with a split_off that takes a range instead of a single bound. If anyone still wants a drain, just .into_iter() the result, much like a .drain() without a range.
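In other words, something shaped roughly like this (entirely hypothetical, nothing of the sort exists in std):

```rust
use std::collections::BTreeMap;
use std::ops::RangeBounds;

// Hypothetical shape of a range-taking split_off; a range-based drain
// would then simply be `map.split_off_range(a..b).into_iter()`.
trait SplitOffRange<K: Ord, V> {
    fn split_off_range<R: RangeBounds<K>>(&mut self, range: R) -> BTreeMap<K, V>;
}
```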

3 Likes

I would definitely discourage this use.

One could consider making a new .and_keep_the_rest_of_them() method (taking self) on the drain iterator to enable this more obviously...

I haven't tried to write an example, but it still seems like a clumsy way to accomplish the goal: the predicate needs to convey the fact that it saw an end key to the outside world consuming the iterator. I'd much rather let the filter return a Drain/Keep/Stop enum instead of a boolean. Regardless of that, for symmetry's sake, you'd also want an efficient way to specify a starting point.
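The enum idea would be something along these lines (all names made up, nothing like this exists in std):

```rust
// Hypothetical richer verdict for the filter closure.
enum FilterDecision {
    Drain, // remove this entry and yield it
    Keep,  // leave this entry in the map and keep scanning
    Stop,  // leave this entry in the map and stop visiting further entries
}

// A drain_filter accepting it might then be declared roughly as:
// fn drain_filter<F>(&mut self, pred: F) -> DrainFilter<'_, K, V, F>
// where
//     F: FnMut(&K, &mut V) -> FilterDecision;
```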

So maybe we'd rather want to be able to write range_mut(x..y).drain_filter(predicate). But in any case, due to the existence of mem::forget, I don't think drain_filter is ever going to be more scalable than linear in the number of elements drained.

1 Like

That sounds nice too. Could also add DrainRemainder.

Presumably that change would be wanted on all the drain_filters, so might be worth bringing up in

This is solved with a technique amusingly called PPYP (pre-poop your pants).

For that reason I suggest staying away from mem::forget on draining iterators. They don't have to guarantee that the map remains valid, only that it's not unsafe.

1 Like

For instance in Vec, by leaving behind a buffer and zero length. But I bet that's not the only case where Vec keeps an unused buffer around. In BTreeMap, I did not find a way to do this without introducing some hitherto unknown state.

PS I suppose you can always temporarily replace the entire state of the container with a default state, transfer that state to your drain iterator, and shove it all back again in the end. I guess I was looking for a slightly more subtle way.
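For what it's worth, the crude version would look something like this (names made up; it keeps the map valid at all times, but it reinserts everything above the range on drop, so it doesn't buy the complexity we're after):

```rust
use std::collections::btree_map::IntoIter;
use std::collections::BTreeMap;
use std::mem;

// Take the whole map's contents up front; whatever the iterator doesn't
// yield is shoved back in by its Drop impl. Leaking the iterator merely
// leaks elements; the map itself always stays valid (namely, empty).
struct DrainBelow<'a> {
    map: &'a mut BTreeMap<i32, String>,
    taken: IntoIter<i32, String>,
    end: i32,
    done: bool,
}

impl Iterator for DrainBelow<'_> {
    type Item = (i32, String);
    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match self.taken.next() {
            Some((k, v)) if k < self.end => Some((k, v)),
            Some((k, v)) => {
                // First surviving entry: put it back and stop draining.
                self.map.insert(k, v);
                self.done = true;
                None
            }
            None => {
                self.done = true;
                None
            }
        }
    }
}

impl Drop for DrainBelow<'_> {
    fn drop(&mut self) {
        // Shove everything that wasn't yielded back into the original map.
        self.map.extend(&mut self.taken);
    }
}

fn drain_below(map: &mut BTreeMap<i32, String>, end: i32) -> DrainBelow<'_> {
    let taken = mem::take(map).into_iter();
    DrainBelow { map, taken, end, done: false }
}
```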

What "it" do you mean here? The only thing I see is the remaining map. If the map is not valid, how can you guarantee that using it is safe? Only by putting a burden on all the other & future code using maps, says I. Perhaps that burden pays off elsewhere, in having better invariants or code checking them, but I don't believe you can just PPYP your way out of anything.

The BTreeMap. Leaving it empty and leaking memory if drain's destructor fails to run is safe (by Rust's definition), even if it's not a desirable behavior from user perspective.

Going back to the original question, and related topics, I see these related methods:

  • drain(range), equivalent to Vec::drain(range) if you consider a Vec<T> to be a BTreeMap<usize, T> with consecutive keys.
  • drain(), equivalent to HashMap::drain(), a shortcut to mem::take(&mut self).into_iter() (spelled out in the sketch after this list). This is equivalent to Vec::drain(..) but not quite equivalent to BTreeMap::drain(..).
  • split_off(range): a drain(range) that returns a new map instead of merely iterating it. This is not just a generalization of drain(range), because a map has more constraints to comply with than an iterator. It could be accomplished by a Drain::into_map method, but that's also not ideal: the iterator Drain descends to the first leaf node to start iteration, and into_map would need to back up to the root. The existing split_off(key) is equivalent to split_off(key..). A desired variant is split_off((Excluded(key), Unbounded)).
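Spelled out, the shortcut from the second item would be no more than this (the function name is just for illustration):

```rust
use std::collections::BTreeMap;
use std::mem;

// A parameter-free drain() as a plain shortcut; nothing clever going on.
fn drain_all<K, V>(map: &mut BTreeMap<K, V>) -> impl Iterator<Item = (K, V)> {
    mem::take(map).into_iter()
}
```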

The problem now is getting the names right. Both of the names above are overloaded and thus impossible. Since split_off(key) is stable, I don't see an alternative to split_off_range(range). Or should it be drain_range(range), which also seems fairer alongside drain_filter? And/or should we forget about a separate drain()?

Actually, no: HashMap::drain also “keeps the allocated memory for reuse”, something that’s not possible for BTreeMap, so a parameter-free drain operation doesn’t make too much sense, exactly because you could just use mem::take + into_iter.

Oh right, that makes sense. In theory, BTreeMap::drain could also recycle an allocated root node instead of deallocating it in the end, but it would take some effort to save a single, constant-size allocation.

There's also BinaryHeap::drain(), which really is simply a wrapper for Vec::drain(..). But it's not a strong argument in favour of a separate BTreeMap::drain() either, because there is no BinaryHeap::drain_x(x) and I can't imagine any argument to pass.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.