Merging elements in BTreeMap

Hello,

I'd like to merge consecutive elements that satisfy a predicate in a BTreeMap.

In C++, I can basically go this way:

auto current = map.begin();  // assumes the map is non-empty
for (auto next = std::next(current); next != map.end(); /* no increment */) {
    if (predicate(*current, *next)) {
        current->absorb(*next);
        next = map.erase(next);
    } else {
        current = next;
        ++next;
    }
}

Of course, transposing the algorithm directly won't fly with Rust's borrow-checking rules (also, I wager that Rust's pointer-invalidation rules are more restrictive than C++'s, so even without the borrow checker I'd probably be stuck?).

I thought of maintaining a deletion list, populated by a first iteration and then used to delete the merged elements from the map, but this poses multiple problems:

  1. To be able to iterate over multiple elements at the same time (for comparisons), I'm restricted to shared references. This means I cannot call current.absorb(), as this requires an exclusive reference to current, unless I introduce interior mutability for current, which I'd rather not do.
  2. An element of the map can absorb several of its consecutive elements, which is expressed naturally in the C++ code by deleting next repeatedly while keeping current (as long as predicate remains true). This means I need to remember the "current" element I'm merging into while iterating over the candidates for a merge.

I'm kinda stumped right now. Any idea how I could do this reasonably efficiently using BTreeMap?

A quick search in the C++ stdlib doesn't show anything for absorb. Is that a method you wrote, and if so, what does it do?

Yes, absorb and predicate are methods I wrote.

absorb logically takes data from an element and merges it into its receiver (think merging intervals).

Its exact semantics aren't really relevant to my question, apart from the fact that it needs to mutate its receiver (fn absorb(&mut self, other: &Element)) and that I want to use it when the predicate (fn predicate(element: &Element, other: &Element) -> bool) is true.
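
For concreteness, a hypothetical Element in the "merging intervals" spirit might look like this; only the two signatures above matter for the question:

/// Hypothetical stand-in for the real type: a half-open interval.
#[derive(Clone, Copy, Debug)]
struct Element {
    start: u64,
    end: u64,
}

impl Element {
    /// Merge the data of `other` into the receiver (needs `&mut self`).
    fn absorb(&mut self, other: &Element) {
        self.end = self.end.max(other.end);
    }
}

/// True when `other` touches or overlaps `element`, i.e. the two should be merged.
fn predicate(element: &Element, other: &Element) -> bool {
    other.start <= element.end
}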


Well, I did a thing:

use std::collections::BTreeMap;

fn merge(map: &mut BTreeMap<Key, Value>) {
    struct Merge {
        dest: Key,
        src: Key,
    }

    let mut merge_list = Vec::new();

    let mut current = map.iter().peekable();
    let mut next = current.clone().skip(1);

    loop {
        match (current.peek(), next.next()) {
            (Some((&current_key, current_element)), Some((&next_key, next_element))) => {
                if predicate(current_element, next_element) {
                    merge_list.push(Merge {
                        dest: current_key,
                        src: next_key,
                    });
                } else {
                    current.next();
                }
            }
            _ => break,
        }
    }

    for Merge { dest, src } in merge_list {
        // `Value` is `Copy`, so copy the source value out first; holding a shared
        // reference into the map here would conflict with the `get_mut` below.
        let src_element = *map.get(&src).expect("Not found");
        map.get_mut(&dest).expect("Not found").absorb(&src_element);
        map.remove(&src);
    }
}

Note that both Key and Value are Copy which makes things easier here.

I wonder how it compares to the C++ version, efficiency-wise (I will probably have to hoist the Vec to the caller of merge to save on allocation, for starters...); if any of you can find a more efficient or simpler solution I'd be delighted.

Here's a simpler solution (Playground):

use std::collections::BTreeMap;

fn merge_consecutive<K, V, P, M>(map: BTreeMap<K, V>, mut predicate: P, mut absorb: M) -> BTreeMap<K, V>
    where
        K: Ord,
        P: FnMut(&(K, V), &(K, V)) -> bool,
        M: FnMut((&K, &mut V), (&K, &mut V)),
{
    if map.len() < 2 {
        return map;
    }

    let mut iter = map.into_iter();
    let mut curr = vec![iter.next().unwrap()];
    let mut next = iter.next().unwrap();

    loop {
        let curr_kv = curr.last_mut().unwrap();
        if predicate(curr_kv, &next) {
            absorb((&curr_kv.0, &mut curr_kv.1), (&next.0, &mut next.1));
        } else {
            curr.push(next);
        }
        match iter.next() {
            Some(kv) => next = kv,
            None => break
        }
    }
    
    curr.into_iter().collect()
}

I am not sure if it would work, but possibly for efficiency you could look at retain to remove the absorbed elements.
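
For reference, BTreeMap::retain (stable since Rust 1.53) visits entries in key order and keeps only those for which the closure returns true, so after a marking pass the cleanup could be as small as this sketch (the empty-value marker is just an assumption for illustration):

use std::collections::BTreeMap;

fn main() {
    let mut map = BTreeMap::from([(1, "keep"), (2, ""), (3, "keep")]);
    // Hypothetical marker: an empty value means "already absorbed".
    map.retain(|_key, value| !value.is_empty());
    assert_eq!(map.len(), 2);
}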

It's because std::map is a BST, which doesn't invalidate iterators on non-concurrent modifications. In 2013 Google released a B-tree-based ordered map which is faster and uses less memory than std::map. But its documentation mentions that, unlike the standard STL containers, modifying a C++ B-tree container invalidates all outstanding iterators on that container.

And Abseil, the modern C++ standard library for Google projects, still provides only B-tree-based implementations for its ordered collections.

Thanks! It is simpler! I must note that, performance-wise, we need to reallocate the entire map with this method. Still, thank you for the simpler alternative implementation; it would be interesting to benchmark it against mine, and it can be useful in tests anyway.

I guess I could perform the merge operations in the second step by using retain rather than explicitly getting the elements of the map over and over again? I would have to remember the src Element in a Merge command, but fortunately in my case that works because Element is Copy.
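
A rough sketch of what that retain-based second step could look like, with hypothetical Key/Element stand-ins and under the assumption that the first pass records merges in ascending key order:

use std::collections::BTreeMap;

type Key = u64;

#[derive(Clone, Copy)]
struct Element(u64);

impl Element {
    fn absorb(&mut self, other: &Element) {
        self.0 += other.0; // toy merge
    }
}

/// One planned merge, recorded during the first (read-only) pass.
struct Merge {
    dest: Key,
    src: Key,
    src_element: Element, // copied out because `Element: Copy`
}

/// Second pass: apply every merge and drop the absorbed keys in one `retain` sweep.
/// Assumes `merge_list` is in key order, as produced by an in-order first pass.
fn apply_merges(map: &mut BTreeMap<Key, Element>, merge_list: &[Merge]) {
    let mut absorb_idx = 0;
    let mut drops = merge_list.iter().map(|m| m.src).peekable();
    map.retain(|key, value| {
        // Absorb all merges targeting this key; they are contiguous in the list.
        while absorb_idx < merge_list.len() && merge_list[absorb_idx].dest == *key {
            value.absorb(&merge_list[absorb_idx].src_element);
            absorb_idx += 1;
        }
        // Drop the entry if it is the source of the next pending merge.
        if drops.peek() == Some(key) {
            drops.next();
            false
        } else {
            true
        }
    });
}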

As to whether this improves performance, I guess it depends on the number of merge operations.

The current complexity is O(n) for the first step and O(k·log n) for the second, with n the number of elements in the map and k the number of merge operations.

If I'm using retain I must iterate over all the nodes, but I guess the removal in retain will be amortized constant, so O(n) for step 2 too? I really wish the complexity of operations was described in the documentation for collections :stuck_out_tongue:... Although I guess the constant factors here matter a lot, so I would need to benchmark it anyway. Your way also has the advantage of removing the .expect() calls.

Yes, there is a trade-off here: std::map cannot be made more efficient due to the iterator/reference invalidation guarantees that are part of its API. By and large it looks like the world has moved to B-trees; I'm just wondering which container is better for the operation in my original post (but maybe the iteration times are so drastically different that whatever advantage a BST provides on this use case is dwarfed the second we iterate on it).

Additionally, I'm not sure how we could provide a Rust API that allows this "efficient" deletion of nodes even with a hypothetical Rust BST. The "iterator-invalidation" pattern of the C++ code strikes me as fundamentally unsafe, at least unless we're using "runtime-checked iterators" (which I guess would defeat the point, performance-wise, and would still move the error to runtime rather than compile time).

Finally, I wonder if I could implement this more efficiently on top of Abseil's B-tree, based on erase_if + find in a single pass.

I tried really hard (for about an hour) and I couldn't come up with a solution that works purely in-place. At this point I'm inclined to think that it's impossible, since BTreeMap doesn't store every node in a separate allocation, so deleting from the tree causes iterator invalidation, unlike C++'s std::map, which is usually implemented as a red-black tree.

Alternatives included cloning the keys and using the Entry API for manipulating individual key-value pairs, but that would have, again, involved allocation upon Clone-ing each individual key. So I figured it's better to do a one-pass by-value traversal and let the map handle reallocation itself, because that potentially saves most of the allocations (for the very same reason – not every node needs a separate allocation).


@zrk Update: I have made a quick Criterion benchmark. With maps of two different key types (std::String vs. a custom-made small, inline string without heap-allocation), the results are the following:

  • merge (std::String) time: [15.190 ms 15.251 ms 15.313 ms]
  • merge_consecutive (std::String) time: [16.821 ms 16.934 ms 17.049 ms]
  • merge (SmallAsciiString) time: [10.141 ms 10.201 ms 10.263 ms]
  • merge_consecutive (SmallAsciiString) time: [16.558 ms 16.665 ms 16.774 ms]

I.e., your version is somewhat faster as long as keys are cheap to clone, and the advantage quickly fades away as soon as there is at least one heap allocation involved in the cloning of a key. One thing to note is that I tested the two algorithms with randomly-generated data, so how much merging took place wasn't controlled (hopefully there was at least some, because the keyspace is pretty small – keys are merged based on their length, which is always between 4 and 8 bytes).

Also note that your version had a bug which made the .get_mut().expect() fail and crash the benchmark. current.next() isn't what you want, because that goes to the key after the last non-equivalent (non-duplicate) key, and not to the key following the last deduplicated key. I had to fix the code in the following way:

use std::collections::BTreeMap;
use either::Either; // `Either` comes from the `either` crate

pub fn merge<K, V, P, M>(map: &mut BTreeMap<K, V>, mut predicate: P, mut absorb: M)
    where
        K: Clone + Ord,
        P: FnMut((&K, &V), (&K, &V)) -> bool,
        M: FnMut((&K, &mut V), (&K, V)),
{
    let mut merge_list: Vec<(K, K)> = Vec::new();
    let mut current = Either::Left(map.iter()).peekable();
    let mut next = current.clone().skip(1);

    loop {
        match (current.peek(), next.next()) {
            (Some(&curr_kv), Some(next_kv)) => {
                if predicate(curr_kv, next_kv) {
                    merge_list.push(((*curr_kv.0).clone(), (*next_kv.0).clone()));
                } else {
                    current = Either::Right(map.range((*next_kv.0).clone()..)).peekable();
                }
            }
            _ => break,
        }
    }

    for (dest, src) in merge_list {
        let src_element = map.remove(&src).expect("Not found (src)");
        let dst_element = map.get_mut(&dest).expect("Not found (get_mut)");
        absorb((&dest, dst_element), (&src, src_element));
    }
}

This makes the code even more complex, so even though it might be faster for small keys, I wouldn't necessarily recommend it, because it's getting pretty convoluted at this point.

For future reference, the full benchmark can be found in this repo.


Thank you immensely for your time and dedication to my issue!

I think your benchmark repository is useful as a starting point for code patterns using BTreeMap. For completeness, I must specify that in real code, the vector in fn merge is reused between calls to reduce allocations. The same can be done for the vector in fn merge_consecutive, but unfortunately we cannot reuse node allocations in the BTreeMap (yet!).
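
To make the reuse concrete, here is a toy sketch (hypothetical predicate/absorb, unrelated to the benchmarked code) where the caller owns the scratch Vec, so repeated calls recycle its allocation:

use std::collections::BTreeMap;

// Reusable-buffer variant of the two-pass approach: the caller keeps `scratch`
// alive across calls so only its capacity, not a fresh allocation, is used.
fn merge_with_scratch(map: &mut BTreeMap<u64, u64>, scratch: &mut Vec<(u64, u64)>) {
    scratch.clear(); // retains capacity from previous calls

    // First pass: record (dest, src) pairs.
    let mut prev: Option<u64> = None;
    for (&k, _) in map.iter() {
        if let Some(p) = prev {
            if k == p + 1 {
                // toy predicate: consecutive integer keys get merged into `p`
                scratch.push((p, k));
                continue;
            }
        }
        prev = Some(k);
    }

    // Second pass: apply the planned merges.
    for (dest, src) in scratch.drain(..) {
        if let Some(v) = map.remove(&src) {
            *map.get_mut(&dest).unwrap() += v; // toy absorb: sum the values
        }
    }
}

fn main() {
    let mut scratch = Vec::new();
    let mut map = BTreeMap::from([(1u64, 10u64), (2, 20), (5, 50)]);
    merge_with_scratch(&mut map, &mut scratch); // scratch's allocation is reused on later calls
    assert_eq!(map.get(&1), Some(&30));
    assert_eq!(map.len(), 2);
}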

I have been thinking about the problem throughout the day, and your solution made me realize that in my case, just turning the map into a vector is a perfectly fine solution: in the algorithm I'm using, we no longer perform insertions in the map after the merge_consecutive operation, so performance will be somewhat similar (or even better, thanks to cache locality) if I just turn the map into a vector while performing the merge_consecutive operation, and then look up my elements using binary search on the Vec!

use std::collections::BTreeMap;

pub fn merge_to_vec<K, V, P, M>(map: BTreeMap<K, V>, mut predicate: P, mut absorb: M) -> Vec<(K, V)>
where
    K: Clone + Ord,
    P: FnMut((&K, &V), (&K, &V)) -> bool,
    M: FnMut((&K, &mut V), (&K, &mut V)),
{
    // For simplicity, the vec here is local, but in prod' it will probably be passed as an argument to be reused multiple times.
    // We can still reserve at least as many elements as there are in the map.
    let mut merged = Vec::with_capacity(map.len());

    let mut it = map.into_iter();
    if let Some(e) = it.next() {
        merged.push(e)
    } else {
        return merged;
    }

    let mut last = merged.last_mut().expect("Just inserted");

    for mut e in it {
        if predicate((&last.0, &last.1), (&e.0, &e.1)) {
            absorb((&last.0, &mut last.1), (&e.0, &mut e.1));
        } else {
            merged.push(e);
            last = merged.last_mut().expect("Just inserted");
        }
    }
    merged
}
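
For the binary-search lookup on the resulting Vec, something along these lines should do; it relies on the Vec being sorted by key, which holds because BTreeMap::into_iter yields entries in ascending key order and the merge only ever drops entries:

/// Look up a key in the sorted Vec produced by `merge_to_vec`.
fn lookup<'a, K: Ord, V>(merged: &'a [(K, V)], key: &K) -> Option<&'a V> {
    merged
        .binary_search_by(|(k, _)| k.cmp(key))
        .ok()
        .map(|idx| &merged[idx].1)
}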

This solution looks rather optimal, and I don't think I'd ever have been able to reach it without your help and your solution, so thanks again!

While you cannot mutate the B-tree's connectivity during mutable iteration, you can mutate its values in place. So as long as you have some way to mark a value as having been absorbed, you can remove all marked values with map.retain() after the fact. Yes, this is two passes (one for all the absorbs and one for retain), but the increased cache coherency of a B-tree and doing all the removes in bulk might make up for it in speed compared to a BST. Where a marking strategy does work, this should perform better than allocating intermediate data structures and copying, though I haven't run any benchmarks.

Edit: I realize that you already alluded to this strategy in response to @geebee22's suggestion to use retain. I suppose this answer just spells out the details.

The core loop resembles your C++ version:

use std::collections::BTreeMap;

/// For this strategy:
/// 1. `absorb` consumes the second value in-place and marks it for removal.
/// 2. `marked_for_deletion` identifies those (key, value)s that were
///    previously marked for removal by `absorb`.
fn merge<K, V, P, M, D>(
    map: &mut BTreeMap<K, V>,
    mut predicate: P,
    mut absorb: M,
    mut marked_for_deletion: D)
    where
        K: Ord,
        P: FnMut((&K, &mut V), (&K, &mut V)) -> bool,
        M: FnMut((&K, &mut V), (&K, &mut V)),
        D: FnMut(&K, &mut V) -> bool,
{
    let mut iter = map.iter_mut();
    let (mut pk, mut pv) = match iter.next() {
        Some(pair) => pair,
        None => return,
    };
    
    // I kept (key, value) as separate variables instead of as a single
    // tuple to avoid having to manually write mutable reborrows like
    // `(p.0, &mut p.1)` all over the place since `&mut` isn't `Copy`.
    for (k, v) in iter {
        if predicate((pk, pv), (k, v)) {
            absorb((pk, pv), (k, v));
        } else {
            // Use the nifty `(pk, pv) = (k, v)` once rust 1.59 is out!
            pk = k;
            pv = v;
        }
    }
    
    map.retain(|k, v| !marked_for_deletion(k, v));
}

Playground with an example where an empty String value implies deletion.
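
A hypothetical usage in that spirit (not the linked example), pairing the merge function above with i32 keys and String values, where an emptied value marks an entry for removal:

use std::collections::BTreeMap;

fn main() {
    let mut map = BTreeMap::from([
        (1, "a".to_string()),
        (2, "b".to_string()),
        (3, "c".to_string()),
        (10, "d".to_string()),
    ]);

    merge(
        &mut map,
        // Toy predicate: merge entries whose keys are within 2 of the current target.
        |(pk, _), (k, _)| *k - *pk <= 2,
        // Absorb by concatenation; `mem::take` leaves the source value empty.
        |(_, pv), (_, v)| pv.push_str(&std::mem::take(v)),
        // An empty value means "was absorbed, remove it".
        |_, v| v.is_empty(),
    );

    assert_eq!(map.len(), 2);
    assert_eq!(map[&1], "abc");
    assert_eq!(map[&10], "d");
}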


I just discovered that my benchmark is flawed because it doesn't re-initialize the data between calls to the two functions. I'm not sure how much that affects the results; anyway, I'm glad you found a substitute algorithm.
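
For what it's worth, Criterion's iter_batched regenerates the input before every iteration via a setup closure, which avoids exactly that kind of flaw; a rough sketch, with a hypothetical build_input helper standing in for the real data generation:

use std::collections::BTreeMap;
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

// Hypothetical input generator; the real benchmark builds random maps.
fn build_input() -> BTreeMap<u64, u64> {
    (0..10_000u64).map(|k| (k, k)).collect()
}

fn bench_merge(c: &mut Criterion) {
    c.bench_function("merge", |b| {
        b.iter_batched(
            build_input, // fresh, re-initialized data for every iteration
            |mut map| {
                // Call the function under test here, e.g. merge(&mut map, ...).
                map.remove(&0); // placeholder work so the sketch compiles on its own
                map
            },
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_merge);
criterion_main!(benches);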
