Taking control of Vec resize logic

Following up on: Implementing Python-style dict with hashbrown...

I have a sparsely populated Vec<Option<(K,V)>> where None is a tombstone. When the vector gets resized a new block of memory may get allocated and the old contents are copied over. Is there a way for me to take control of the resize logic to skip the tombstones during copying?

Related to this question is the source code for RawVec::resize_internal which handles the resizing. I took a long look at the source code but I could not find the point where the old contents gets copied into the newly allocated memory. Did I miss something?

As always thanks for your help.

The copying is handled by the call to realloc, which may either resize the allocation in-place, or create a new allocation and copy the contents into it.

I don't think there's a generic way that you can hook into Vec’s resize logic, but you could create your own push-like function with custom resizing logic, and call that instead of the standard method (and do the same for any other methods you need to call that involve resizing).

1 Like

Thanks for the clarification on the copying. Then I'll have to wait until Vec exposes the allocator.

Wouldn't just checking the capacity work? As far as I know, the vector is guaranteed not to reallocate if capacity >= newsize, so I'd imagine something like this:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() < self.capacity() {
            self.push(Some(item));
        } else {
            let tmp: Self = self.drain(..)
                .filter(Option::is_some)
                .chain(Some(Some(item)))
                .collect();
            self.clear();
            self.extend(tmp);
        }
    }
}
1 Like

Could remove the tombstones in-place without a temporary allocation:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() == self.capacity() {
            self.retain(Option::is_some);
        }
        self.push(Some(item));
    }
}
2 Likes

Indeed. I don't know how I missed that.

Before pushing a new (K, V) pair onto the vector I check if there are more tombstones than pairs and whether the vector would grow with with the next push. If both are true then I pack the entries which removes the tombstones and prevents the vector from growing.

pub fn insert(&mut self, k: K, v: V) -> Option<V> {
    // ...
    
    let more_tombstones_than_pairs = self.tombstone_len() >  self.len();
    if self.vec_is_full() && more_tombstones_than_pairs {
        self.pack_entries();
    }

    self.entries.push(Some((k, v)));
    let index = self.entries.len() - 1;

    // Insert index into hash table ...
}

But sometimes I want to grow the vector while it still contains tombstones. And for that I would need to write a function similar to this:

trait VecExt<T> {
    fn my_push(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<T> {
    fn my_push(&mut self, item: T) {
        if self.len() == self.capacity() {
            let new_cap = self.capacity() * 2;
            // There is no get_alloc on vec.
            let alloc: std::alloc::AllocRef = self.get_alloc();
            // Use alloc to grow_in_place or fallback to alloc and copy.
            // Tombstones will get skipped during copying.
            let new_ptr = ...;
            self.ptr = new_ptr;
            self.cap = new_cap;
        }
        // Finish pushing ...
    }
}

Unfortunately writing this function right now is impossible without relying on the unstable internals of Vec.

Sorry, perhaps there's something I don't quite understand. Vec already has this functionality built into it. More specifically, do you need to grow by a specific size you have in mind?

  1. If not, you can just do self.push(item); and vec will grow its capacity once it's reached.
  2. If yes (e.g. you'd like to exactly double the capacity), then you can do so via vec.reserve():
    if self.len() == self.capacity() {
      // Reserves an _additional_ `self.capacity()` amount.
      self.reserve(self.capacity());
    }
    

Edit: Oh, I see that you'd like reserve() but without copying the tombstones. Then, yes, you'd do something similar to what @H2CO3 suggested:

if self.len() < self.capacity() {
    let tmp = Vec::with_capacity(self.capacity() * 2);
    tmp.extend(self.iter().filter(Option::is_some));
    mem::swap(self, &mut tmp);
}

@marcianx

By default Vec uses realloc to grow its internal buffer and I want to keep using it. The code you wrote is pretty much what I will use instead of the current copy implementation in realloc. But maybe I'll skip the tombstone filtering if the tombstones only make up a very small percentage of all entries. I'll have to write some benchmarks for that.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.