Taking control of Vec resize logic

Following up on: Implementing Python-style dict with hashbrown...

I have a sparsely populated Vec<Option<(K,V)>> where None is a tombstone. When the vector gets resized a new block of memory may get allocated and the old contents are copied over. Is there a way for me to take control of the resize logic to skip the tombstones during copying?

Related to this question is the source code for RawVec::resize_internal which handles the resizing. I took a long look at the source code but I could not find the point where the old contents gets copied into the newly allocated memory. Did I miss something?

As always thanks for your help.

The copying is handled by the call to realloc, which may either resize the allocation in-place, or create a new allocation and copy the contents into it.

I don't think there's a generic way that you can hook into Vec’s resize logic, but you could create your own push-like function with custom resizing logic, and call that instead of the standard method (and do the same for any other methods you need to call that involve resizing).

1 Like

Thanks for the clarification on the copying. Then I'll have to wait until Vec exposes the allocator.

Wouldn't just checking the capacity work? As far as I know, the vector is guaranteed not to reallocate if capacity >= newsize, so I'd imagine something like this:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() < self.capacity() {
            self.push(Some(item));
        } else {
            let tmp: Self = self.drain(..)
                .filter(Option::is_some)
                .chain(Some(Some(item)))
                .collect();
            self.clear();
            self.extend(tmp);
        }
    }
}
1 Like

Could remove the tombstones in-place without a temporary allocation:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() == self.capacity() {
            self.retain(Option::is_some);
        }
        self.push(Some(item));
    }
}
2 Likes

Indeed. I don't know how I missed that.

Before pushing a new (K, V) pair onto the vector I check if there are more tombstones than pairs and whether the vector would grow with with the next push. If both are true then I pack the entries which removes the tombstones and prevents the vector from growing.

pub fn insert(&mut self, k: K, v: V) -> Option<V> {
    // ...
    
    let more_tombstones_than_pairs = self.tombstone_len() >  self.len();
    if self.vec_is_full() && more_tombstones_than_pairs {
        self.pack_entries();
    }

    self.entries.push(Some((k, v)));
    let index = self.entries.len() - 1;

    // Insert index into hash table ...
}

But sometimes I want to grow the vector while it still contains tombstones. And for that I would need to write a function similar to this:

trait VecExt<T> {
    fn my_push(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<T> {
    fn my_push(&mut self, item: T) {
        if self.len() == self.capacity() {
            let new_cap = self.capacity() * 2;
            // There is no get_alloc on vec.
            let alloc: std::alloc::AllocRef = self.get_alloc();
            // Use alloc to grow_in_place or fallback to alloc and copy.
            // Tombstones will get skipped during copying.
            let new_ptr = ...;
            self.ptr = new_ptr;
            self.cap = new_cap;
        }
        // Finish pushing ...
    }
}

Unfortunately writing this function right now is impossible without relying on the unstable internals of Vec.

Sorry, perhaps there's something I don't quite understand. Vec already has this functionality built into it. More specifically, do you need to grow by a specific size you have in mind?

  1. If not, you can just do self.push(item); and vec will grow its capacity once it's reached.
  2. If yes (e.g. you'd like to exactly double the capacity), then you can do so via vec.reserve():
    if self.len() == self.capacity() {
      // Reserves an _additional_ `self.capacity()` amount.
      self.reserve(self.capacity());
    }
    

Edit: Oh, I see that you'd like reserve() but without copying the tombstones. Then, yes, you'd do something similar to what @H2CO3 suggested:

if self.len() < self.capacity() {
    let tmp = Vec::with_capacity(self.capacity() * 2);
    tmp.extend(self.iter().filter(Option::is_some));
    mem::swap(self, &mut tmp);
}

@marcianx

By default Vec uses realloc to grow its internal buffer and I want to keep using it. The code you wrote is pretty much what I will use instead of the current copy implementation in realloc. But maybe I'll skip the tombstone filtering if the tombstones only make up a very small percentage of all entries. I'll have to write some benchmarks for that.