Lessons from how Vec is implemented

I'm about to start a project that will process large amounts of data. I'm thinking about the design of my data structure. I suspect it's going to work a great deal with the polars DataFrame.

Before jumping into the implementation, I feel as though I have not sufficiently understood the potential "lessons learned" from how std::vec implemented Vec. Perhaps there is no "take-home" message because it is truly unique.

I'm curious though, to anyone that has done a "deep dive" into the design of Vec, what did the review have you consider in your own data structure designs?

1 Like

I believe that the important things to keep in mind are as follows:

  1. Properly handling uninitialized memory.
  2. Zero-sized types require special care - the allocator does not support them.
  3. The Vec type should be covariant, so it must use a *const T rather than *mut T.
5 Likes

There’s this section of the rustonomicon entitled Implementing Vec.

3 Likes

One thing about Vec is that it's so old that I think it would be implemented differently if written today. The most important lesson, I think, is to separate your responsibilities well.

You want to narrow its responsibility as much as you can, leaving the other parts to things that are better at it. That's not always obvious what it means. But for Vec, what it's fundamentally doing is keeping track of which items are initialized. Everything else -- notably the memory allocation and deallocation -- you can push off to a different type. In the current library (and nomicon, IIRC) that's the RawVec type. But now we have MaybeUninit, so today I'd have written Vec as a Box<[MaybeUninit<T>]> for the buffer, where the Box code handles the allocations and deallocations, and a usize for the initialized length. Then, for example, with_capacity is Box::new_uninit_slice with len: 0 -- the Vec itself doesn't have to do anything interesting. And in Vec::drop you'd need to drop len items, but no more -- the Box's drop would handle deallocation.

See replace RawVec with Boxed slice by conradludgate · Pull Request #94421 · rust-lang/rust · GitHub for an attempt to actually use this approach in the real Vec implementation.

19 Likes

"should be covariant". are you talking about Vec as a type constructor?

Yes.

Does Vec use *mut T? or are you saying, an important part of the design is to avoid using *mut T which would make Vec contravariant?

... where the different type is Box. What does this design change provide as a benefit? Are there limits to what the allocator can track now vs if it were hosted in the Box type? (Other than the inherit design benefit of improved encapsulation)

The general change of making the deallocation the responsibility of a field's type?

It's that Rust will drop the fields for you, even if you have your own Drop::drop.

And thus you don't have to remember to deallocate things in the right places. For example, when you're implementing extend, you don't have a bunch of Copy pointers, you have a real non-Copy type. So if you allocate the new one and something panics, that new one will get deallocated properly. And if you want to keep it, you mem::swap it into your field, and now the old one will get deallocated without you doing anything.

So it helps because it lets you take advantage of built-in Rust features (like move checking) rather than having to think about everything yourself.

2 Likes

What patterns remain useful in how Vec was implemented? For instance, how would you articulate the benefit of having Vec and RawVec (or Box<[MaybeUninit<T>]>)?

1 Like

Big win. Got it.

In this updated version of Vec, what would be the remaining motivation to "write your own Drop::drop"?

Because the new implementation of Vec still needs to drop the initialized elements. The Box-of-MaybeUninit will deallocate the memory, but it won't drop any of the Ts because it doesn't know which ones were initialized.

So in this version, you'd have something like this:

impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        unsafe {
            let slice = self.buffer.get_unchecked_mut(..self.len);
            ptr::drop_in_place(MaybeUninit::slice_assume_init_mut(slice));
        }
    }
}

Because that's the core of what Vec specifically needs to care about: ..len are initialized, so drop those.

2 Likes

This is helpful.

...Vec tracks what parts of the allocated memory are safe to access (initialized parts of the allocated memory). Consistent with this responsibility,

Vec does not allocate nor deallocate memory.

It seems like the new design does a better job of aligning the way Rust executes the dropping and deallocation tasks (lifetime Vec is shorter than Box<[MaybeUninit<T>]>), in-line with dropping of initialized items must happen before we deallocate the whole memory block (where both initialized and unitialized memory exists).

Do you mean deallocate the memory no matter whether a custom drop is implemented for item T? (when we say "memory leak", isn't it always about failing to deallocate memory?)

T is not responsible for deallocation of the memory it is placed in - it doesn't even know where it is. Vec (or, in the updated approach, Box) is.

Of course — if you dynamically allocate memory, you should deallocate it after use, because it is the allocation itself that leaks if it doesn't go away once not needed any more. This is completely independent of whether T has a destructor. If you allocate a gigabyte of trivial integers, and leave them hanging, that's still a leak, isn't it?

Sure, but you still need to std::ptr::drop_in_place() any initialized elements in the Vec otherwise their destructors will be leaked.

1 Like

Not sure if this is related but I tried to implement vec in two bytes, inspired by beef. GitHub - pickfire/ve: Faster, more compact implementation of std: :Vec

But at the end (without any extra effort on optimizations), it turns out to be slower than std vec.

If you want to build a custom vec, you probably can just use RawVec and build it on top of it.

Note, I did not read rustonomicon, so the implementation may be incorrect.