Needlessly weak promises on vector capacity?

ExpHP · February 22, 2019, 9:42pm

Continuing the discussion from [Solved] In place (128*N)-elem Vec<f32> into N-elem Vec<Vec<f32> of 128 size>:

Can somebody explain to me why methods of Vec like shrink_to_fit are so reluctant to provide guarantees about capacity when it is clearly always possible to produce a Vec where len == cap?

sfackler · February 22, 2019, 10:05pm

The language in the shrink_to_fit docs is just saying that even if you call shrink_to_fit on a vec of 5 u32s, the underlying allocation may not be exactly 20 bytes. For example, with jemalloc, it'll round up to 32 bytes.

It's not really a super meaningful distinction for user code, IMO.

ExpHP · February 22, 2019, 10:33pm

Shrinks the capacity of the vector as much as possible.

It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.

As I read this, "it" refers to "the capacity of the vector," and thus the rest of the sentence is saying that vec.capacity() >= vec.len(). This should be rephrased so that it is clearly talking about the size of the backing allocation.

The language of reserve_exact is also misleading:

Reserves the minimum capacity for exactly additional more elements to be inserted in the given Vec<T>. After calling reserve_exact , capacity will be greater than or equal to self.len() + additional. Does nothing if the capacity is already sufficient.

Note that the allocator may give the collection more space than it requests. Therefore capacity can not be relied upon to be precisely minimal. Prefer reserve if future insertions are expected.

This one says that the cap field of the vector may not be equal to len, which is indeed a fair statement after reviewing the implementation. But the second paragraph seems to be implying that this is due to the will of the allocator; that is not the case here! (the cases where cap > requested on exit are simply the cases where cap > requested on entry!)

vitalyd · February 23, 2019, 3:32am

I believe it is up to the allocator. If you ask the allocator for space for N more elements, it may return more than that (ie the whole notion of “usable_size”), eg allocators using size classes.

It doesn’t look like RawVec queries the Alloc::usable_size anywhere (at least I couldn’t find it while on my phone), however.

Yandros · February 23, 2019, 11:58am

So, if I have understood correctly (please correct me if I'm wrong, I am just basing this post on how I have understood this thread), to summarise in practice:

shrink_to_fit, reserve_exact, etc. do manage to drop as much extra capacity as possible. By that I mean that it will be the same capacity as a fresh Vec::with_capacity. That is, user-wise, there is no way to allocate less.
the only subtlety here is thus that a .shrink_to_fit cannot be used to guarantee that calling .push right after will trigger a reallocation: since the allocator may not be able to give 1-byte-precise "chunks", there is always the possibility of the actual capacity being rounded up to some size (e.g., Vec::with_capacity(30) may actually allocate 32 bytes thus allowing 2 extra .pushes before triggering a reallocation).

All in all, those seem like overly cautious "warnings", since the "wasted" capacity only depends from the allocator implementation, and in general should be negligible. (I have to admit that for a moment I had a doubt (like @ExpHP it seems) as to whether Vec::from(v.into_boxed_slice()) occupied less memory than { v.shrink_to_fit(); v })

TomP · February 23, 2019, 3:43pm

I believe that the above is a correct analysis. Allocators always have their own strategy for how they partition storage into useful sizes. Very seldom is the allocated area exactly the same size as that requested; usually it is only expected to be within some factor 𝜀 (e.g., 1.25 or 1.5) of the requested size. Allocators with power-of-2 granularity are very common, particularly for small requested sizes.

ExpHP · February 24, 2019, 1:09am

Can we please stop using the phrase "actual capacity" to refer to the size of the allocation? The term is ambiguous and can be interpreted to mean "the actual value of the capacity field." This in turn creates unnecessary confusion for people who need to write unsafe code that messes with the capacity field.

the only subtlety here is thus that a .shrink_to_fit cannot be used to guarantee that calling .push right after will trigger a reallocation: since the allocator may not be able to give 1-byte-precise “chunks”, there is always the possibility of the actual capacity being rounded up to some size ( e.g. , Vec::with_capacity(30) may actually allocate 32 bytes thus allowing 2 extra .push es before triggering a reallocation).

I believe you are right that pushing to a vector where len == cap does not necessarily reallocate, but you're off in where, how, and when this occurs. The implementation of Vec does not know or care what the actual size of the allocated area is; it only cares about the capacity field.

Vec::<u8>::with_capacity(30) creates a vector where len == 0, cap == 30. Always. No matter what.
When push is called 31 times on such a vector, without conferring with the allocator, the implementation of Vec decides that the new cap shall be 60.
Vec blindly tells the allocator to reallocate the buffer to support a capacity of 60.
Supposing that the allocated space is already large enough to support 60 elements (because e.g. the allocator rounds up to powers of 4), <A as Alloc>::realloc is free to simply return the same pointer after doing nothing; but this is all opaque to the implementation of Vec.

Yandros · February 24, 2019, 9:55am

Thank you @ExpHP, now it's crystal clear (the expression actual capacity did show that things weren't that clear in that regard. I forgot that vec is an abstraction built on top of the allocator abstraction) ... but I am back to square one: if .shrink_to_fit() sets the capacity to the length (asking a truncating realloc to the Allocator, who may return a pointer to a chunk whose usable size is nevertheless bigger than the new requested capacity), then { v.shrink_to_fit; v } and Vec::from(v.into_boxed_slice()) should indeed be "equivalent". Or at least the documentation warnings should be removed.

Topic		Replies	Views
Reserve truly exact capacity for Vec	19	531	April 18, 2024
Why does Vec::from_raw_parts require same size and not same size*capacity help	22	1602	June 17, 2022
Is Vec::with_capacity like Vec::new with Vec::reserve or Vec::new with Vec::reserve_exact? help	12	938	November 22, 2022
`Vec::with_capacity` does not painc as the documentation says help	10	1550	May 19, 2022
Using most of a computers memory with a single Vec of unknown size help	3	375	January 12, 2023

Needlessly weak promises on vector capacity?

Related Topics