I just considered the case where the element size is 9 bytes (this would be a bit unusual, as typically there will be something with an alignment of at least 2 or 4, but it is certainly possible).
Cap 4 => 36 bytes. 64 bytes allocated, 28 bytes spare, enough for 3 extra, a total of 7.
Cap 8 = 72 bytes, 128 bytes allocated, 50 bytes spare, enough for 4 extra, a total of 12.
Vec will allocate 4 initially, and 8 when the 5th element is pushed. It will allocate a 3rd time when the 9th element is pushed.
VecA will allocate 4 initially, but gets 7.
It will increase the allocation when the 8th element is pushed.
Currently it asks for 7 * 2 = 14, which is a "mistake" as only 12 are available in 128 bytes.
So a better idea is to ask for less, say 1.5 times the current capacity: in this case 10, and it will get 12.
Then it will allocate a 3rd time when the 13th element is pushed.
I am considering amending the code to use 1.5 as the "capacity multiplier" rather than 2.
[ Edit: oh, the above is all nonsense, my arithmetic is wrong: 128 / 9 = 14 (rounded down)! There is no "mistake", and never can be: doubling the capacity will never require MORE than twice the current allocation. ]
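The corrected arithmetic is easy to check with a small sketch. It assumes a hypothetical allocator that rounds every request up to the next power of two (which matches the 64 and 128 byte figures above, but is not a claim about any particular allocator). The function name `usable_capacity` is made up for illustration.

```rust
/// Hypothetical model: the allocator rounds each request up to the next
/// power of two, and the vector uses every whole element that fits.
fn usable_capacity(requested: usize, elem_size: usize) -> usize {
    let bytes = requested * elem_size;
    bytes.next_power_of_two() / elem_size
}

fn main() {
    let elem_size = 9;
    // 4 and 8 are the plain Vec requests; 10 and 14 are 1.5x and 2x of 7.
    for requested in [4, 8, 10, 14] {
        println!(
            "requested {} -> usable {}",
            requested,
            usable_capacity(requested, elem_size)
        );
    }
}
```

Under that assumption, a request for 14 elements needs 126 bytes and fits in the 128 byte class, so asking for double is not a "mistake".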
[ Edit 2: I think the idea of using a 1.5 times multiplier is still reasonable: if the allocator has smaller size class jumps, this will use them, which defers that decision to the allocator. If it has no smaller jumps, the result is the same as using 2. Still, I think I will leave it at 2. ]
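To illustrate the point about size class jumps, here is a rough sketch that compares the two multipliers under two made-up size class tables: one with only powers of two, and one with an extra class halfway between each pair. Both tables, and the names `round_up` and `capacities`, are assumptions for illustration only. The growth rule (ask for the current capacity times the multiplier, then use whatever fits in the class actually returned) follows the description of VecA above.

```rust
/// Round `bytes` up to the smallest size class that fits
/// (hypothetical allocator model, classes must be sorted ascending).
fn round_up(bytes: usize, classes: &[usize]) -> usize {
    *classes
        .iter()
        .find(|&&c| c >= bytes)
        .expect("request larger than the largest size class")
}

/// Capacities reached while growing until at least `n` elements fit.
/// The multiplier is given as the fraction `num / den`.
fn capacities(n: usize, elem: usize, num: usize, den: usize, classes: &[usize]) -> Vec<usize> {
    // Initial request is for 4 elements, as in the example above.
    let mut cap = round_up(4 * elem, classes) / elem;
    let mut caps = vec![cap];
    while cap < n {
        let request = cap * num / den;
        cap = round_up(request * elem, classes) / elem;
        caps.push(cap);
    }
    caps
}

fn main() {
    let pow2 = [64, 128, 256, 512, 1024];
    let finer = [64, 96, 128, 192, 256, 384, 512, 768, 1024];
    println!("pow2,  x2:   {:?}", capacities(40, 9, 2, 1, &pow2));
    println!("pow2,  x1.5: {:?}", capacities(40, 9, 3, 2, &pow2));
    println!("finer, x2:   {:?}", capacities(40, 9, 2, 1, &finer));
    println!("finer, x1.5: {:?}", capacities(40, 9, 3, 2, &finer));
}
```

With only power-of-two classes both multipliers end up with the same capacities in this model, while the finer table lets the 1.5 multiplier land on the intermediate classes.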