Vec map / collect / capacity

Does

  let my_new_vector: Vec<foo> = my_vector.iter().map(..elided..).collect();

preallocate the length of my_new_vector?

If not, I suppose I could write it out manually using a for loop.
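For reference, a minimal sketch of that manual version, assuming `Vec::with_capacity` plus a plain loop is what is meant. `Foo`, `prepare`, and `build_manually` are hypothetical stand-ins for the elided types and closure, not anything from the real code:

```rust
// Hypothetical stand-ins: `Foo` and `prepare` replace the elided type and closure.
#[derive(Debug, PartialEq)]
struct Foo(u32);

fn prepare(x: &u32) -> Foo {
    Foo(x * 2)
}

fn build_manually(my_vector: &[u32]) -> Vec<Foo> {
    // Reserve room for every element up front, so the pushes below
    // do not need to grow the buffer.
    let mut my_new_vector = Vec::with_capacity(my_vector.len());
    for x in my_vector {
        my_new_vector.push(prepare(x));
    }
    my_new_vector
}

fn main() {
    let my_vector = vec![1, 2, 3];
    let out = build_manually(&my_vector);
    println!("{:?} (capacity {})", out, out.capacity());
}
```

`Vec::with_capacity(n)` guarantees a capacity of at least `n`, so this version does not depend on what `collect` decides to do.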

There is a proposal (Tracking Issue for `Iterator::collect_into`, Issue #94780 in rust-lang/rust) that
seems to address this (using the method `collect_into`), but it's not currently in stable.

I am confused since there was a bug in the last year or so roughly about
iter/map/collect reusing the vector but not cleaning up excess space, which
implies some sort of reuse (note 1). I have done some googling but have
not found chapter-and-verse that answers this (fair enough, if collect_into
becomes stable, that's the 'best answer' for what I am trying to do).

Note 1: I don't know if I want reuse or not. Situation is build a vector of
preparation elements, then build the final result in the map/collect.

I didn't check the source code, but I believe the FromIterator implementation for Vec would surely check the Iterator::size_hint(), and iter::Map respects the size_hint() of the inner iterator.

what's more, I think the actual FromIterator implementation uses specialization, so it's probably specialized for ExactSizeIterator.

so in conclusion, I believe it definitely will preallocate in this case.

It specializes for TrustedLen, not ExactSizeIterator, but this iterator implements it as well.
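One way to look at this from the outside is to print the capacity of the collected vector. This is only a sketch: `doubled` is a hypothetical helper, and the exact capacity is an implementation detail, so the only thing that can be relied on is `capacity() >= len()`.

```rust
// Hypothetical helper mirroring the `iter().map(..).collect()` shape from the question.
fn doubled(input: &[i32]) -> Vec<i32> {
    input.iter().map(|x| x * 2).collect()
}

fn main() {
    let my_vector: Vec<i32> = (0..1000).collect();
    let out = doubled(&my_vector);
    // If collect preallocated from the size hint, the capacity will typically
    // be close to the length; the exact number is not guaranteed.
    println!("len = {}, capacity = {}", out.len(), out.capacity());
}
```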


btw, I'm not aware of the bug you are talking about.

whether or not the container pre-allocates should be a performance optimization; it should not change the correctness of user code. so I don't know what your concerns are.

as for reusing an existing allocation, yes, Vec does have special treatment to reuse existing storage in special cases. notably, code like this will do the transformation in-place, no allocation/deallocation/reallocation would occur:

let mut xs: Vec<i32> = vec![1, 2, 3];
xs = xs.into_iter().map(|x| x+1).collect(); //<-- no reallocation happens here

but an existing Vec reusing its allocation and a new Vec reserving storage (pre-allocating) are different problems. in your example, there is no reuse because the old my_vector still exists. maybe you meant to use my_vec.into_iter() (as opposed to my_vec.iter())?
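a rough way to check the in-place case yourself is to compare the buffer address before and after. this is a sketch only: `increment_all` is a hypothetical helper, and since the reuse is an implementation detail, the code reports what it sees instead of asserting it.

```rust
// Hypothetical helper: returns the new Vec plus whether the buffer address was unchanged.
fn increment_all(xs: Vec<i32>) -> (Vec<i32>, bool) {
    // Remember where the original buffer lives. The pointer is only
    // compared afterwards, never dereferenced.
    let before = xs.as_ptr();
    let ys: Vec<i32> = xs.into_iter().map(|x| x + 1).collect();
    let reused = ys.as_ptr() == before;
    (ys, reused)
}

fn main() {
    let (ys, reused) = increment_all(vec![1, 2, 3]);
    println!("{:?}, same buffer: {}", ys, reused);
}
```

with my_vec.iter() instead of into_iter() the source Vec stays alive, so there is nothing to reuse and the new Vec needs its own buffer.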


Yes. map is 1:1, so the size_hint that collect sees is the same as the size_hint from the vector iterator, and the vector iterator's size_hint is always perfectly accurate.
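A small sketch of what that means in practice, with `filter` as a contrast since it is not 1:1. The helper names `mapped_hint` and `filtered_hint` are made up for illustration:

```rust
// Hypothetical helpers that expose the size_hint seen by collect.
fn mapped_hint(v: &[i32]) -> (usize, Option<usize>) {
    // map forwards the hint of the underlying slice iterator unchanged.
    v.iter().map(|x| x + 1).size_hint()
}

fn filtered_hint(v: &[i32]) -> (usize, Option<usize>) {
    // filter may drop any number of items, so its lower bound is 0.
    v.iter().filter(|x| **x > 15).size_hint()
}

fn main() {
    let v = vec![10, 20, 30, 40];
    println!("map:    {:?}", mapped_hint(&v));
    println!("filter: {:?}", filtered_hint(&v));
}
```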

Thanks for this!

This answers my question... I do have another one...

All of the answers (including yours) involve references to the source code. Is this something that is answered in the API documentation?

kind of, you need to piece all the information together to get the full picture, but some of the pieces are implied.

the most important part is documented in a section titled allocation behavior at impl FromIterator for Vec, to quote:

Note: This section covers implementation details and is therefore exempt from stability guarantees.

Vec may use any or none of the following strategies, depending on the supplied iterator:

  • preallocate based on Iterator::size_hint()
    • and panic if the number of items is outside the provided lower/upper bounds
  • use an amortized growth strategy similar to pushing one item at a time
  • perform the iteration in-place on the original allocation backing the iterator

and examples of implied knowledge includes (not exhaustive):

  • Iterator::collect::<Collection>() calls <Collection as FromIterator>::from_iter().

    this is not stated verbatim in the documentation, but it should be obvious based on the trait bound.

  • Iterator::map() preserves size_hint().

    again, this is not stated literally, but the documentation says (emphasis is mine):

    it produces a new iterator which calls this closure on each element of the original iterator.

  • Iterator::size_hint() is a hint, "buggy" implementations should not cause memory safety problems.

  • assume the standard library implements iterators correctly.

  • etc.
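to illustrate the point about buggy hints, here is a sketch with a deliberately wrong size_hint(). `Liar` and `collect_liar` are made-up names, and this is a plain user-defined iterator (it does not implement TrustedLen), so collect treats the hint as advisory only:

```rust
// Hypothetical iterator that yields 0, 1, 2 but claims exactly 10 items.
struct Liar {
    next: u32,
}

impl Iterator for Liar {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.next < 3 {
            let v = self.next;
            self.next += 1;
            Some(v)
        } else {
            None
        }
    }

    // Deliberately wrong: the real item count is 3.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (10, Some(10))
    }
}

fn collect_liar() -> Vec<u32> {
    Liar { next: 0 }.collect()
}

fn main() {
    let v = collect_liar();
    // The contents are still correct; only the allocation may be oversized.
    println!("{:?} (capacity {})", v, v.capacity());
}
```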

so conclusion? the problem as a whole may or may not be well-documented, depending on how familiar you are with the standard library.

that said, pre-allocation should be seen as an optimization. it's good if it is documented, but not every implementation optimization is required to be documented, and the correctness of your program's behavior should not rely on such optimizations.

Not necessarily directly-and-obviously.

But see the description on https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint. The main raison d'être of that iterator method is to provide the hint to collect so that it can pre-allocate the correct amount.

That said, for the specific case of vec.iter().map(…).collect(), you can tell from the ExactSizeIterator implementations. The slice iterator (which you get from Vec::iter) is ESI https://doc.rust-lang.org/std/slice/struct.Iter.html#impl-ExactSizeIterator-for-Iter<'_,+T> and Map<_> forwards ESI https://doc.rust-lang.org/std/iter/struct.Map.html#impl-ExactSizeIterator-for-Map<I,+F>.
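You can also let the compiler confirm this. In the sketch below, `exact_len` is a hypothetical helper whose only job is to demand an `ExactSizeIterator` bound; the call compiles only because both the slice iterator and `Map` implement that trait.

```rust
// Hypothetical helper: accepts only iterators that know their exact length.
fn exact_len<I: ExactSizeIterator>(it: I) -> usize {
    it.len()
}

fn main() {
    let v = vec!["a", "bb", "ccc"];
    // slice::Iter is ExactSizeIterator, and Map forwards it,
    // so this type-checks and reports the remaining item count.
    let n = exact_len(v.iter().map(|s| s.len()));
    println!("{} items", n);
}
```

Swapping `map` for `filter` here would fail to compile, since `Filter` cannot know its length up front.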