Guarantee that Vec::default() does not allocate?

Documentation of impl<T> Default for Vec<T> reads like this:

Creates an empty Vec<T>.

Being empty means the Vec's length is zero, but it doesn't say anything about its capacity, right?

With the current implementation, there is no allocation, as can be seen in the source, which simply calls Vec::new. But strictly speaking, this is not guaranteed, right?

Maybe the API specification could/should be extended in that matter? I'm asking because I use std::mem::take on a &mut Vec<_> and I wouldn't want an allocation to happen.

Second question: I rely on Vec::default().capacity() == 0. I assume this is also not guaranteed, even if it seems to be true in the current implementation? (Playground) Adding such a guarantee to std seems to be less trivial though, I guess, because future implementations might handle Vecs with small lengths differently (i.e. without allocating), or am I wrong?

I found code like this:

    /// Like `new`, but parameterized over the choice of allocator for
    /// the returned `RawVec`.
    pub const fn new_in(alloc: A) -> Self {
        // `cap: 0` means "unallocated". zero-sized types are ignored.
        Self { ptr: Unique::dangling(), cap: 0, alloc }
    }

new_in is called by new, so I think 0 capacity is guaranteed.

If you're only interested in this guarantee for performance reasons (i.e. your use case wouldn't misbehave if an allocation were involved), I believe it's fine to rely on it without further explicit guarantees, because a change here would be considered a performance regression in the standard library.

1 Like

Vec::new being const fn might already constitute a guarantee that there's no run-time allocation involved... well, at least if you assign it to a const value first (since compile-time dynamic allocations might eventually be a thing).

3 Likes

Yeah, in the current implementation the capacity is always zero. But I wonder if it's guaranteed that this stays the same in future versions of std.

I think in my case, I would create a memory leak. This is my use-case:

#[derive(Debug)]
pub struct BufWriteGuard<T> {
    buffer: Vec<T>,
    recycler: Option<mpsc::UnboundedSender<Vec<T>>>,
}

impl<T> BufWriteGuard<T> {
    fn new(buffer: Vec<T>, recycler: mpsc::UnboundedSender<Vec<T>>) -> Self {
        BufWriteGuard {
            buffer,
            recycler: Some(recycler),
        }
    }
    pub fn finalize(mut self) -> BufReadGuard<T> {
        BufReadGuard::new(take(&mut self.buffer), self.recycler.take().unwrap())
    }
}

impl<T> Drop for BufWriteGuard<T> {
    fn drop(&mut self) {
        if self.buffer.capacity() > 0 {
            let _ = self.recycler.as_ref().unwrap().send(take(&mut self.buffer));
        }
    }
}

I rely on the drop handler to not "recycle" a default Vec.

Edit: The memory leak occurs because I would "recycle" both the Vec and another default Vec, which makes the mpsc channel grow indefinitely here:

pub struct BufPool<T> {
    recycler: mpsc::UnboundedSender<Vec<T>>,
    dispenser: mpsc::UnboundedReceiver<Vec<T>>,
}

impl<T> BufPool<T> {
    pub fn new() -> Self {
        let (recycler, dispenser) = mpsc::unbounded_channel::<Vec<T>>();
        Self {
            recycler,
            dispenser,
        }
    }
    pub fn get(&mut self) -> BufWriteGuard<T> {
        let buffer = match self.dispenser.try_recv() {
            Ok(mut buffer) => {
                buffer.clear();
                buffer
            }
            Err(_) => {
                println!("Create new buffer");
                Vec::new()
            }
        };
        BufWriteGuard::new(buffer, self.recycler.clone())
    }
}

I'm not sure if I can follow your reasoning. Does the standard library guarantee that no future changes will cause any program to run slower, independently of how the standard library is used?

I don't immediately see where exactly the leak would happen, perhaps you can point to it in the code example?

I extended my code sample above.


Also see full Playground.

Changes in performance are a trade-off; relying (for performance) on empty Vec creation being cheap and allocation-free is super common, I would assume. Also, I can't imagine any benefit in changing this, so such a change will never happen, as it would be a terrible trade-off.

There's no guarantee as strong as you asked for ("independently of how the standard library is used"), but as this is a common and natural use-case, not anything weird or so, it should be covered by the general principle that the standard library should contain good and performant code.

3 Likes

Would it make sense to add these (sane) assumptions to std's API specification?

I understand I can likely rely on it, but I usually feel safer if I don't make assumptions that aren't explicitly guaranteed… (especially if violating them could result in memory leaks, for example).

I do support the idea of documenting properly that Vec::new or Vec::default won't allocate. I'm just trying to justify why it's a reasonable thing to assume (for performance reasons) right now, even without the explicit guarantee. Even with a guarantee that no allocation is involved, you technically wouldn't have a guarantee that e.g. mem::forgetting a Vec::new or Vec::default wouldn't leak anything else. That's entirely hypothetical, but creating a Vec might as well modify some other global state that's supposed to be cleaned up when dropped. (Except it probably can't, because it's a const fn; my point was just that "this doesn't allocate" could still be vague/imprecise on certain points or use cases.)

Hmmm… :thinking: Now that I think about it, it's less the "allocation" that bothers me, but more that I rely on the capacity being zero. Perhaps the "clean" way would be to use an Option<Vec> instead of a Vec in my use case, but I refrained from doing that due to the overhead of unwrapping (in safe Rust).

Even in this full playground, I still don't see where you assume a leak could happen if empty Vecs did allocate. Would you mind explaining this better?


Oh.. with your latest comment in mind, are the capacity checks relevant?

If that's a problem, you do have a problem: if T is a zero-sized type, IIRC, Vec::default will have capacity usize::MAX or something like that.

Edit: Yup, I did recall correctly.

1 Like

Presumably, for zero-sized types, you want to skip any recycling logic altogether, anyway :sweat_smile:

:scream:

I slightly modified the example because there was also a panic when the capacity wasn't zero. This demonstrates the memory-leak problem now: Playground.

Not sure what you mean? I "misuse" the capacity method to "detect" default Vecs that were left by std::mem::take.

I guess I'll have to use an Option<Vec> to be clean and take from the &mut Option<Vec> instead of from the &mut Vec. But I don't like the unwrap overhead at runtime… :slightly_frowning_face:


Nonetheless, I think Vec::default should give the same guarantees as Vec::new.

You can always std::mem::replace(&mut v, Vec::with_capacity(0)) instead, which explicitly spells out the conditions you rely on.

5 Likes

Nice idea, but:

Constructs a new, empty Vec<T> with at least the specified capacity.

with_capacity can return a Vec with higher capacity, as also demonstrated by @steffahn:

fn main() {
    assert_eq!(Vec::<()>::with_capacity(0).capacity(), usize::MAX)
}

(Playground)

Even if this is an exotic ZST case, the documentation explicitly says "at least".

It also states explicitly that "If capacity is 0, the vector will not allocate." So, aside from ZSTs, a capacity request of 0 will be honored exactly.

That said, using an enum to provide your sentinel value instead of an implementation detail of Vec seems like the best approach from a maintainability perspective.

2 Likes

But couldn't a Vec keep a small number of elements on the stack instead of the heap? (I know it doesn't, but a future version could hypothetically, right?) Thus it could have a capacity > 0 even without allocating.

Yes, I think it's best to do that. I'd use an Option instead of an enum though. Something like this: Playground. (sorry, wrong link, ugh! Here is the right one: Playground)

Unless there is another way to move out of self when Self: Drop?

I guess I could do this:

    pub fn finalize(self) -> BufReadGuard<T> {
        let mut this = std::mem::ManuallyDrop::new(self);
        BufReadGuard::new(take(&mut this.buffer), this.recycler.clone())
    }

(Playground)


But wouldn't this be even nicer?

    pub fn finalize(self) -> BufReadGuard<T> {
        let this = std::mem::ManuallyDrop::new(self);
        BufReadGuard::new(this.buffer, this.recycler)
    }

(Playground)

Unfortunately:

   Compiling playground v0.0.1 (/playground)
error[E0507]: cannot move out of dereference of `ManuallyDrop<BufWriteGuard<T>>`
  --> src/main.rs:54:27
   |
54 |         BufReadGuard::new(this.buffer, this.recycler)
   |                           ^^^^^^^^^^^ move occurs because value has type `Vec<T>`, which does not implement the `Copy` trait

error[E0507]: cannot move out of dereference of `ManuallyDrop<BufWriteGuard<T>>`
  --> src/main.rs:54:40
   |
54 |         BufReadGuard::new(this.buffer, this.recycler)
   |                                        ^^^^^^^^^^^^^ move occurs because value has type `UnboundedSender<Vec<T>>`, which does not implement the `Copy` trait

For more information about this error, try `rustc --explain E0507`.
error: could not compile `playground` due to 2 previous errors

I guess that's because ManuallyDrop is an ordinary wrapper and not some sort of language-built-in feature.

While I agree it should be guaranteed, your example still doesn't leak. mem::take() replaces the value with Default::default(), and that Vec::default() will be dropped after the <BufWriteGuard as Drop>::drop() method returns, just like any other field.

3 Likes