Removing range of elements from BTreeMap

Hello,

Surely I've missed something while reading the docs, but I cannot find a way to efficiently remove a range of elements from a BTreeMap.

In C++, I would get an iterator to the first element to remove and an iterator to the first element not to remove, and then call std::map::erase, whose complexity is logarithmic in the number of elements in the map plus linear in the number of elements to remove.

How would I proceed with BTreeMaps with a similar complexity? I could split_off the BTreeMap at the beginning of my range, and a second time at the end of my range, and then append the original BTreeMap to the one that was split off the second time, but this is both more complicated than map::erase(range_of_keys) and also, if I read correctly, linear in the number of remaining elements rather than in the number of deleted ones?
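To make that concrete, here is roughly what I mean (lo and hi just stand in for the bounds of my range):

```rust
use std::collections::BTreeMap;

// Sketch of the split_off-twice-then-append workaround described above;
// `lo` and `hi` are placeholder bounds for the half-open range to remove.
fn remove_range(map: &mut BTreeMap<i32, String>, lo: i32, hi: i32) {
    let mut doomed = map.split_off(&lo);  // map = ..lo, doomed = lo..
    let mut keep = doomed.split_off(&hi); // doomed = lo..hi, keep = hi..
    map.append(&mut keep);                // glue the surviving upper part back on
    // `doomed`, holding the removed range, is dropped here.
}
```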

Any help would be greatly appreciated. Thanks :smiley:

2 Likes

I don't think there's a better way to do this in the standard BTreeMap. Adding a "splice" or "drain" method would be very useful. I believe the main reason they don't exist is that nobody with the time and expertise has stepped up to add them yet.

2 Likes

Perhaps even worse than you might hope: it's linear in the number of elements in both remaining parts. append rebuilds the whole tree and doesn't reuse the first part (issue #34666).

If you happen to know that the doomed elements are near the start of the sequence, you could use drain_filter (on nightly) and mem::forget it when you reach the first element not to remove, but I'm not sure that use of mem::forget is guaranteed to keep having defined behaviour. And even then, it's only linear in the number of elements deleted, and in practice a bit worse.
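Roughly like this, I imagine (remove_prefix is a made-up name, nightly only, and it leans on behaviour the docs leave unspecified):

```rust
#![feature(btree_drain_filter)] // nightly-only feature gate at the time of writing
use std::collections::BTreeMap;
use std::mem;

// Remove every key below `end`, then leak the iterator so its destructor
// never walks (and re-filters) the untouched rest of the map.
fn remove_prefix(map: &mut BTreeMap<i32, String>, end: i32) {
    // Counting the doomed prefix is itself linear in the number of
    // elements removed, so the total cost stays in that ballpark.
    let doomed = map.range(..end).count();
    let mut drain = map.drain_filter(|_, _| true); // would drain everything...
    drain.by_ref().take(doomed).for_each(drop);    // ...but only pull out the prefix
    mem::forget(drain); // skip the destructor that would visit the rest of the map
}
```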

1 Like

I see. Thank you for the answer. This is a bit unfortunate for my use case though :smile:

I see that the std one has drain_filter as an unstable method (and retain as a recently added, still unstable method on nightly), but it still has to iterate over the whole map even if only a subset of the keys must be filtered out.
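For reference, the straightforward nightly version I have in mind is something like this, which still visits every entry:

```rust
#![feature(btree_drain_filter)] // nightly-only at the time of writing
use std::collections::BTreeMap;
use std::ops::Range;

// Removes the keys in `range`, but the predicate still runs once per
// entry in the whole map, not once per entry in the range.
fn remove_range_by_filter(map: &mut BTreeMap<i32, String>, range: Range<i32>) {
    map.drain_filter(|k, _| range.contains(k)).for_each(drop);
}
```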

Do you know of a non-standard BTreeMap that would offer efficient removal of a range?

Otherwise, I could take a look at the btree module and see what I can do, but honestly I'd rather not dive into modifying tree data structures right now :sweat_smile:

The main obstacle for a drain with a range is something C++ doesn't have to deal with: the possibility to mem::forget anything, including a draining iterator. The map must be kept in a valid state while elements are iterated, even though the drain method takes an exclusive reference to the map (no visitors allowed on the construction site). We can't simply let the iterator return a bunch of key/value pairs and then delete entire tree nodes and subtrees at the end, because if the draining iterator is forgotten in the meantime, the key/value pairs that were already handed out are still in the map.

Writing this made me realize there is a way: drain could create a temporary new root, transfer the tree nodes and subtrees containing deleted elements to the temporary tree, fix up the real tree as if the whole drain were already done, then hang the temporary tree in the drain iterator for it to iterate over the detached key/value pairs, and finally (in the iterator's drop handler) delete the detached tree. There, a mem::forget will merely leak keys, values and some tree nodes, but that's the normal price you pay for using mem::forget.

However, gift-wrap that thing and you end up with a split_off that takes a range instead of a single bound. If anyone still wants a drain, just .into_iter() the result, much like a .drain() without a range.
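In other words, something shaped roughly like this (entirely hypothetical, nothing of the sort exists in std):

```rust
use std::collections::BTreeMap;
use std::ops::RangeBounds;

// Hypothetical shape of a range-taking split_off; a range-based drain
// would then simply be `map.split_off_range(a..b).into_iter()`.
trait SplitOffRange<K: Ord, V> {
    fn split_off_range<R: RangeBounds<K>>(&mut self, range: R) -> BTreeMap<K, V>;
}
```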

3 Likes

I would definitely discourage this use.

One could consider making a new .and_keep_the_rest_of_them() method (taking self) on the drain iterator to enable this more obviously...

I haven't tried to write an example, but it still seems like a clumsy way to accomplish the goal: the predicate needs to convey the fact that it saw an end key to the outside world consuming the iterator. I'd much rather let the filter return a Drain/Keep/Stop enum instead of a boolean. Regardless of that, for symmetry's sake, you'd also want an efficient way to specify a starting point.
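The enum idea would be something along these lines (all names made up, nothing like this exists in std):

```rust
// Hypothetical richer verdict for the filter closure.
enum FilterDecision {
    Drain, // remove this entry and yield it
    Keep,  // leave this entry in the map and keep scanning
    Stop,  // leave this entry in the map and stop visiting further entries
}

// A drain_filter accepting it might then be declared roughly as:
// fn drain_filter<F>(&mut self, pred: F) -> DrainFilter<'_, K, V, F>
// where
//     F: FnMut(&K, &mut V) -> FilterDecision;
```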

So maybe we'd rather want to be able to write range_mut(x..y).drain_filter(predicate). But in any case, due to the existence of mem::forget, I don't think drain_filter is ever going to be more scalable than linear in the number of elements drained.

1 Like

That sounds nice too. Could also add DrainRemainder.

Presumably that change would be wanted on all the drain_filters, so might be worth bringing up in

This is solved with a technique amusingly called PPYP (pre-poop your pants).

For that reason I suggest staying away from mem::forget on draining iterators. They don't have to guarantee that the map remains valid, only that it's not unsafe.

1 Like

For instance in Vec, by leaving behind a buffer and zero length. But I bet that's not the only case where Vec keeps an unused buffer around. In BTreeMap, I did not find a way to do this without introducing some hitherto unknown state.

PS I suppose you can always temporarily replace the entire state of the container with a default state, transfer that state to your drain iterator, and shove it all back again in the end. I guess I was looking for a slightly more subtle way.
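For what it's worth, the crude version would look something like this (names made up; it keeps the map valid at all times, but it reinserts everything above the range on drop, so it doesn't buy the complexity we're after):

```rust
use std::collections::btree_map::IntoIter;
use std::collections::BTreeMap;
use std::mem;

// Take the whole map's contents up front; whatever the iterator doesn't
// yield is shoved back in by its Drop impl. Leaking the iterator merely
// leaks elements; the map itself always stays valid (namely, empty).
struct DrainBelow<'a> {
    map: &'a mut BTreeMap<i32, String>,
    taken: IntoIter<i32, String>,
    end: i32,
    done: bool,
}

impl Iterator for DrainBelow<'_> {
    type Item = (i32, String);
    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match self.taken.next() {
            Some((k, v)) if k < self.end => Some((k, v)),
            Some((k, v)) => {
                // First surviving entry: put it back and stop draining.
                self.map.insert(k, v);
                self.done = true;
                None
            }
            None => {
                self.done = true;
                None
            }
        }
    }
}

impl Drop for DrainBelow<'_> {
    fn drop(&mut self) {
        // Shove everything that wasn't yielded back into the original map.
        self.map.extend(&mut self.taken);
    }
}

fn drain_below(map: &mut BTreeMap<i32, String>, end: i32) -> DrainBelow<'_> {
    let taken = mem::take(map).into_iter();
    DrainBelow { map, taken, end, done: false }
}
```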

What "it" do you mean here? The only thing I see is the remaining map. If the map is not valid, how can you guarantee that using it is safe? Only by putting a burden on all the other & future code using maps, says I. Perhaps that burden pays off elsewhere, in having better invariants or code checking them, but I don't believe you can just PPYP your way out of anything.

The BTreeMap. Leaving it empty and leaking memory if drain's destructor fails to run is safe (by Rust's definition), even if it's not a desirable behavior from user perspective.

Going back to the original question, and related topics, I see these related methods:

  • drain(range), equivalent to Vec::drain(range) if you consider a Vec<T> to be a BTreeMap<usize, T> with consecutive keys.
  • drain(), equivalent to HashMap::drain(), a shortcut to mem::take(&mut self).into_iter() (spelled out in the sketch after this list). This is equivalent to Vec::drain(..) but not quite equivalent to BTreeMap::drain(..).
  • split_off(range): a drain(range) that returns a new map instead of merely iterating it. This is not just a generalization of drain(range), because a map has more constraints to comply with than an iterator. It could be accomplished by a Drain::into_map method, but that's also not ideal: the iterator Drain descends to the first leaf node to start iteration, and into_map would need to back up to the root. The existing split_off(key) is equivalent to split_off(key..). A desired variant is split_off((Excluded(key), Unbounded)).
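Spelled out, the shortcut from the second item would be no more than this (the function name is just for illustration):

```rust
use std::collections::BTreeMap;
use std::mem;

// A parameter-free drain() as a plain shortcut; nothing clever going on.
fn drain_all<K, V>(map: &mut BTreeMap<K, V>) -> impl Iterator<Item = (K, V)> {
    mem::take(map).into_iter()
}
```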

The problem now is getting the names right. Both of the names above are overloaded and thus impossible. Since split_off(key) is stable, I don't see an alternative to split_off_range(range). Or should it be drain_range(range), which also seems fairer alongside drain_filter? And/or should we forget about a separate drain()?

Actually, no: HashMap::drain also “keeps the allocated memory for reuse”, something that’s not possible for BTreeMap, so a parameter-free drain operation doesn’t make too much sense, exactly because you could just use mem::take + into_iter.

Oh right, that makes sense. In theory, BTreeMap::drain could also recycle an allocated root node instead of deallocating it in the end, but it would take some effort to save a single, constant-size allocation.

There's also BinaryHeap::drain(), which really is simply a wrapper for Vec::drain(..). But it's not a strong argument in favour of a separate BTreeMap::drain() either, because there is no BinaryHeap::drain_x(x) and I can't imagine any argument to pass.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.