I'm wondering why collection algorithms like vec copy values from the old allocated memory space to the newly allocated one when they grow past their original capacity, instead of keeping the values in the original space and linking to the new allocation.
I understand that it's quicker to pull items from memory spaces that are next to each other, but it would seem like that extra cost of copying everything might negate the benefit if the collection is added to more frequently than it is read from.
Cache hits are king for performance on modern multicore, multilevel-cache CPUs. That's why the vector beats the linked list in performance in almost every use case, including pushing and popping at the front (no kidding: that's the fastest operation on a linked list and the slowest on a vector).
But as always, don't trust human guesses about performance! Performance is a very sensitive topic, so you should only believe benchmarks. Note that setting up proper benchmark code is hard; you should use an existing benchmark crate like ::criterion.
I wasn't thinking of a full linked list, only linking to the newly allocated block. Say you start the vec off with a capacity of 100 items, and when you reach that capacity you allocate new memory for 200 items, but instead of copying the 100 items over to the new memory you keep the first 100 where they are and then have a pointer to the start of the newly allocated area. When you iterate over the vec you would iterate over the first 100 items, and then continue at the start of the next allocated memory area. You would have to keep track of each allocated memory area and how many items were in each one. Would something like that still cause cache misses?
I'm still ignorant of how the cache works for a CPU, or at least of the mechanism by which the CPU decides what is kept in the cache and for how long. Is this something the application or the OS asks for?
No. Most modern computer systems have multi-level caches, with the smallest, most-rapidly-accessed cache integrated with the processor core, a second-level cache on the same multi-core die, and additional caches between the multi-core die(s) and the main memory. Each successive cache is larger than the prior one, with longer access delays and larger memory lines that are transferred between the adjacent caches.
With rare exceptions, which don't apply to normal Rust programs, cache residency is determined solely by access patterns: the most recently accessed item, and adjacent memory within the same cache line, are held in the cache until displaced by something else. A first-level cache, closest to the processor, might have a line size of 32 B, while the 2nd level cache might have a line size of 256 B and take considerably longer to access. Items that are adjacent in memory tend to be in the same cache line, meaning that they are already available to the processor immediately when it needs them, without a long access delay.
Returning to your original query, Rust's Vec type is a growable/shrinkable array. All arrays in Rust have a fixed stride, which means that the memory address offset (distance) from one item to the next is constant. Your suggestion is for a split multi-segment array, which would make the array stride non-constant. CPU addressing and cache structures are optimized to work with fixed-stride arrays, which is why they are so prevalent.
I think, fundamentally, the reason vectors don't work that way is that, tautologically, that's not what a vector is. Vectors are basically just dynamically sized arrays, i.e. contiguous memory. What you're describing is an unrolled linked list, which certainly is useful at times.
One thing to keep in mind is that modern CPUs love contiguous memory and predictable patterns. Vectors fit both of those requirements very well while still being conceptually simple to understand (a useful quality in a common collection type).
Since re-allocation is the slow part of vectors, it's typically highly optimized. Vectors don't resize every time you call insert; they typically allocate extra space so subsequent insertions can be handled without re-allocating, and they usually vary the amount of extra space requested as the size of the vector increases. A common pattern is to allocate 2 * current capacity on each re-allocation.
All that said, one of the biggest and simplest optimizations you can do when using vectors is to size them correctly at initialization time if you know how big they're going to be. In Rust, you can do this by calling Vec::with_capacity(size) instead of Vec::new().
I have seen a project written in Rust use an unrolled linked list. The main reason they did this was because they were using a bump allocator. Bump allocators are really fast but have the problem of not being able to deallocate memory. They didn't use a vector because it would have been really memory inefficient with a bump allocator. To get better performance from the unrolled linked list, the size of each node was 2 * size of last node. This greatly reduces the number of nodes in the unrolled linked list while also not wasting much memory. The combination of a bump allocator and an unrolled linked list might be more performant in some situations but an easier solution would be @wesleywiser's suggestion of starting with a Vec with a large capacity. Bump allocators are not particularly fun to use so I would avoid them unless you really really need the performance and your problem really suits them.
@jeramyRR some explanation of caches, linked lists, and contiguous vectors can be found in this hyperbolic article (read it for what it is: hyperbole) that I wrote many years ago. The comments on the article are especially interesting, as both valid praise and valid rebuttals can be found there.
I feel like I’m seeing a lot of people in this thread misunderstand your question.
In my mind, you’re asking “why not just allocate new memory right next to the existing vector so that you don’t have to copy the old elements?”. Basically, allocate new memory and then merge it with the old memory.
I’ve had this same exact thought/question. I remember reading somewhere that the reason is that memory allocators just don’t work that way. They (de)allocate chunks of memory at a time, and don’t expand/merge them. Allocate once at a fixed size, and then deallocate that same exact chunk.
This calls AllocRef::grow, which will extend the memory block if possible, or if not, allocate a new block and deallocate the old one.
I think this is undocumented in Vec mostly because from the perspective of a user, it's an internal performance optimization. The API of Vec doesn't change at all because of this optimization, and the overall strategy remains the same - double the size, (possibly) reallocating, every time more space is needed.
Note that the exponential growth is important. On each reallocation, we have to copy twice as much, but then it takes twice as long to fill up again. These factors cancel out and we get (on average) constant-time .push().
That's really cool! And it's a good reminder that "contiguous" heap allocations like Vec are actually a series of disjoint pages. This means that Vec growth (in some cases) already works close to how the original poster suggested: keeping the original values in place and linking to the newly allocated space.
And doing this in the allocator via memory-mapping means that all programs and libraries that call realloc benefit from it automatically, and that it interacts well with CPU caches and prefetching.