Taking control of Vec resize logic

gohbgl · February 23, 2020, 2:29am

Following up on: Implementing Python-style dict with hashbrown...

I have a sparsely populated Vec<Option<(K,V)>> where None is a tombstone. When the vector gets resized a new block of memory may get allocated and the old contents are copied over. Is there a way for me to take control of the resize logic to skip the tombstones during copying?

Related to this question is the source code for RawVec::resize_internal which handles the resizing. I took a long look at the source code but I could not find the point where the old contents gets copied into the newly allocated memory. Did I miss something?

As always thanks for your help.

mbrubeck · February 23, 2020, 3:16am

The copying is handled by the call to realloc, which may either resize the allocation in-place, or create a new allocation and copy the contents into it.

I don't think there's a generic way that you can hook into Vec’s resize logic, but you could create your own push-like function with custom resizing logic, and call that instead of the standard method (and do the same for any other methods you need to call that involve resizing).

gohbgl · February 23, 2020, 3:37am

Thanks for the clarification on the copying. Then I'll have to wait until Vec exposes the allocator.

H2CO3 · February 23, 2020, 7:57am

Wouldn't just checking the capacity work? As far as I know, the vector is guaranteed not to reallocate if capacity >= newsize, so I'd imagine something like this:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() < self.capacity() {
            self.push(Some(item));
        } else {
            let tmp: Self = self.drain(..)
                .filter(Option::is_some)
                .chain(Some(Some(item)))
                .collect();
            self.clear();
            self.extend(tmp);
        }
    }
}

marcianx · February 23, 2020, 1:36pm

Could remove the tombstones in-place without a temporary allocation:

trait VecExt<T> {
    fn push_no_tombstone(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<Option<T>> {
    fn push_no_tombstone(&mut self, item: T) {
        if self.len() == self.capacity() {
            self.retain(Option::is_some);
        }
        self.push(Some(item));
    }
}

H2CO3 · February 23, 2020, 2:13pm

Indeed. I don't know how I missed that.

gohbgl · February 23, 2020, 8:58pm

Before pushing a new (K, V) pair onto the vector I check if there are more tombstones than pairs and whether the vector would grow with with the next push. If both are true then I pack the entries which removes the tombstones and prevents the vector from growing.

pub fn insert(&mut self, k: K, v: V) -> Option<V> {
    // ...
    
    let more_tombstones_than_pairs = self.tombstone_len() >  self.len();
    if self.vec_is_full() && more_tombstones_than_pairs {
        self.pack_entries();
    }

    self.entries.push(Some((k, v)));
    let index = self.entries.len() - 1;

    // Insert index into hash table ...
}

But sometimes I want to grow the vector while it still contains tombstones. And for that I would need to write a function similar to this:

trait VecExt<T> {
    fn my_push(&mut self, item: T);
}

impl<T> VecExt<T> for Vec<T> {
    fn my_push(&mut self, item: T) {
        if self.len() == self.capacity() {
            let new_cap = self.capacity() * 2;
            // There is no get_alloc on vec.
            let alloc: std::alloc::AllocRef = self.get_alloc();
            // Use alloc to grow_in_place or fallback to alloc and copy.
            // Tombstones will get skipped during copying.
            let new_ptr = ...;
            self.ptr = new_ptr;
            self.cap = new_cap;
        }
        // Finish pushing ...
    }
}

Unfortunately writing this function right now is impossible without relying on the unstable internals of Vec.

marcianx · February 24, 2020, 2:09am

~~Sorry, perhaps there's something I don't quite understand. Vec already has this functionality built into it. More specifically, do you need to grow by a specific size you have in mind?~~

~~If not, you can just do self.push(item); and vec will grow its capacity once it's reached.~~

If yes (e.g. you'd like to exactly double the capacity), then you can do so via vec.reserve():
if self.len() == self.capacity() { // Reserves an _additional_ `self.capacity()` amount. self.reserve(self.capacity()); }

Edit: Oh, I see that you'd like reserve() but without copying the tombstones. Then, yes, you'd do something similar to what @H2CO3 suggested:

if self.len() < self.capacity() {
    let tmp = Vec::with_capacity(self.capacity() * 2);
    tmp.extend(self.iter().filter(Option::is_some));
    mem::swap(self, &mut tmp);
}

gohbgl · February 24, 2020, 2:29pm

@marcianx

By default Vec uses realloc to grow its internal buffer and I want to keep using it. The code you wrote is pretty much what I will use instead of the current copy implementation in realloc. But maybe I'll skip the tombstone filtering if the tombstones only make up a very small percentage of all entries. I'll have to write some benchmarks for that.

system · May 24, 2020, 2:39pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
&mut Vec<T> that can only modify elements, but not push/pop elements	4	375	July 22, 2020
How to set a Vec to a subslice of itself? help	6	548	October 28, 2019
Fast initialization of Vec<Option<Vec<T>>>	24	1070	September 22, 2021
Rustlings, move_semantics2.rs, mutability, ownership code review	3	1783	March 27, 2022
Vec of fixed capacity ( StackVec ) code review	6	133	April 12, 2024

Taking control of Vec resize logic

Related Topics