How to delete some element when traversing a BTreeMap?

I have a BTreeMap<String,Option<()>> named map, which has some Some(()) and None, I want to pick out all the None and delete them.
I instinctively use method retain to do, but found retain was under experienment so that I can't use it in stable Rust. Eventually I can do nothing but using unsafe code to solve it. Code like this:

unsafe{
    let ptr:*mut _=&mut map;
    for (k,v) in (&mut *ptr).iter(){
        if v.is_none(){
             (&mut *ptr).remove(&*k);
             //danger: using value after dropped
             //println!("{}",k);
        }
    }
}

But I want to know is there a way to do it without using unsafe code?

If you can wait until tomorrow, Rust 1.53 will have stable BTreeMap::retain.

10 Likes

Really? If only one day of course I can wait.

This code is unsound. The problem what you're trying to sidestep is called iterator invalidation. It basically means you can't modify the collection while iterating it, in most languages. In Rust compile error, in Java throw ConcurrentModificationException, and in C++ UB. But since you've silent the compile error by unsafe block it's UB here.

Collections can move their elements when it's modified. Vec<T> reallocate, HashMap<K, V> performs robin hood probing(with current implementation), and BTreeMap<K, V> merge and split nodes. But iterators holds some pointers to those internal data structure, so they need to be invalidated on modification.

8 Likes

I got it, but I have a question, I have tested this code on my machine and it worked correctly, did you mean when map contains too much elements it may occur UB? If so, what is the correct solution?

It may trigger UB if the .remove() triggers the actual node merge. Of course the correct solution would be to use .retain() tomorrow. Without it there's not much things to do, usually it's to store clones of keys to be removed and remove them after the iteration is done.

To be clear, it is immediately UB when you create any &mut T that aliases other references. You might not see any visible effects of that right away, or maybe only under certain circumstances, but it's always a bug, full stop.

6 Likes

It did not work correctly, it appeared to work correctly. UB never counts as "working correctly", and you can not rely on it. "It works on my machine" is a spectacularly bad metric of code soundness and correctness.

2 Likes

Yes you're quite right, but I think you might misunderstand me. what I mean indeed is that it just appeared to work correctly, thus I asked Hyeonu to know whether the error wouldn't occur easily if the capacity of map is small. Any logical mistake is a hidden danger, isn't it?

1 Like

Sticking to a single thread, I think you cannot cause the average operating system to complain about the undefined behaviour unless the map has multiple nodes, which means it must have at least 12 key-value pairs at some point. But once you gathered 12 of them, they'll quite easily condemn your process to the death penalty, simply by simulating that every key needs to be deleted.

1 Like

There really should only be two cases: 100% safe, or not safe. 99.9% correct is not correct, because you never know when that 0.1% is going to... say... turn off the safeties in a nuclear power plant.

Now, in real life, especially in many non-mission-critical commercial usage, "good enough" is good enough... when in doubt, reboot! In these cases, you can tolerate being wrong or even crashing.

"whether the error wouldn't occur easily" is a statement based on the fact that you can tolerate the error, even if it doesn't occur "easily". In some applications, you definitely cannot tolerate any error, which may simply cause lives. For example, a memory error that is hard to trigger, but just so happens once and then caused a train wreck.

Usually people tolerate being wrong because being 100% correct takes a lot of effort and kills many brain cells. Rust is a serious attempt to be 100% correct while at least being manageable and not too limiting.

Alright, maybe my expression ability is bad. I don't mean I can tolerate any chance to make an error, I just want to know whether the reason why I didn't occur the error is the capacity was too small. Besides, I am obsessive-compulsive so I can't tolerate any bit of possible error and that's why I raised this topic. :rofl:

1 Like

Thanks for the example. The playground indeed occurred an error but the error is not a panic but a weird error appeared to occurred in a script?

Indeed. The panic would be much better, since it is controlled and defined. Here, the program just crashed.

Curious how you arrived at number 12 for this explanation. I would like to leanr about that pleas :slight_smile:

1 Like

It's one more than the CAPACITY of a tree node.

As long as a map has a single node, it's basically a fixed size preallocated array and you cannot address any invalidated memory (you might be able to violate assertions though).

Add one more element, and the tree splits into 3 nodes, a root and two children. When you then remove one or two(*) keys, the tree must collapse back into a single node. Therefore, an iterator might have a reference to a deleted node. Obviously, without unsafe code, there's no way to delete anything while the iterator is around.

(*) In this case, both inserting and removing keys in ascending order, you need to remove two elements. Initially, the root has 1 element, the first child 6 elements and the second child 5 elements. After removing the first key, the tree still has 3 nodes: a root with 1 element, and two children with 5 elements each. After removing the second key, the tree collapses - the elements in the second child are appended to the first child, the first child becomes the root, and the other two nodes are deallocated.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.