Hello guys. I am trying to understand memory usage and stack vs. heap when it comes to performance a bit better. I think I'm aware of the general rule: stack is always faster than heap.
For the particular task I am trying to solve, I have a function that returns a 6-byte buffer, and I need to gather 50 such elements and return them. There are four functions I came up with:
// assume f() is some DB call or whatever
fn example0(f: &dyn Fn() -> [u8; 6]) -> Vec<Vec<u8>> {
    let mut result: Vec<Vec<u8>> = vec![];
    for _ in 0..50 {
        let data: [u8; 6] = f();
        result.push(data.to_vec());
    }
    result
}

fn example1(f: &dyn Fn() -> [u8; 6]) -> Vec<Vec<u8>> {
    let mut result: Vec<Vec<u8>> = Vec::with_capacity(50);
    for _ in 0..50 {
        let data: [u8; 6] = f();
        result.push(data.to_vec());
    }
    result
}

fn example2(f: &dyn Fn() -> [u8; 6]) -> Vec<[u8; 6]> {
    let mut result: Vec<[u8; 6]> = Vec::with_capacity(50);
    for _ in 0..50 {
        let data: [u8; 6] = f();
        result.push(data);
    }
    result
}

fn example3(f: &dyn Fn() -> [u8; 6]) -> [[u8; 6]; 50] {
    let mut result: [[u8; 6]; 50] = [[0; 6]; 50];
    for i in 0..50 {
        let data: [u8; 6] = f();
        result[i] = data;
    }
    result
}
I am pretty sure example0 is the worst, since no size information is known, and a lot of allocations would have to be made.
example1 is probably a bit better, since the outer Vec knows it needs room for 50 elements, but the sizes of the inner elements are still not known.
example2 should be pretty decent, since it "knows" that it will hold 50 [u8; 6] elements.
Finally, example3 knows exactly how many bytes are needed, so it should be the most efficient.
Am I correct in my understanding? What are the main differences between 2 & 3, considering in both cases all sizes are known at compile time?
I think you are mostly correct in your understanding, but there are a couple more factors besides knowledge of size.
For example, in examples 0 and 1 you have a Vec<Vec<u8>>. This is relatively slow because you first load a pointer from the outer Vec and then follow it to load the bytes it points to. That extra indirection means your CPU caches won't be utilized very well.
As for example 2: yes, the sizes are known at compile time, but for Vec::with_capacity it doesn't make a difference whether the value is known at compile time or at runtime. The difference compared to a program without that line is that the Vec doesn't need to reallocate while your loop is running, because it has all the needed space from the beginning; it still has to allocate once, though.
That is also the difference from example 3: there, no allocation happens at all, so you simply do less work, which is always faster. That is also a big reason why the stack is faster: you don't need to call the allocator.
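A small sketch of what with_capacity changes in practice (the exact capacity values depend on Vec's growth strategy, so treat the comments as an assumption about typical behaviour):

fn main() {
    let mut grown: Vec<[u8; 6]> = Vec::new();
    let mut preallocated: Vec<[u8; 6]> = Vec::with_capacity(50);
    for _ in 0..50 {
        grown.push([0; 6]);
        preallocated.push([0; 6]);
    }
    // `grown` had to reallocate several times while its capacity grew
    // (roughly doubling); `preallocated` got all its space in one
    // allocation up front and never moved its buffer.
    println!("grown: len {} capacity {}", grown.len(), grown.capacity());
    println!("preallocated: len {} capacity {}", preallocated.len(), preallocated.capacity());
}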
example0 has about 55 heap allocations: 50 for the little vectors, ~5 for result as it grows dynamically.
example1 has 51 heap allocations: 50 for the little vectors, 1 for result
example2 has 1 heap allocation for result
example3 has 0 heap allocations. It is the only example where the data is on the stack.
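If you want to verify those numbers yourself, here is a rough sketch using a counting global allocator. CountingAlloc is my own name, not a standard library item, and it assumes the example functions from the question are in scope; growth reallocations are counted too, because GlobalAlloc's default realloc goes through alloc.

use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // count every allocation, then defer to the system allocator
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let _data = example2(&|| [1u8; 6]);
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    println!("example2: {} allocations", after - before);
}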
example3 can also be written in a slightly cleaner way by using ArrayVec.
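Something like this, as a sketch of the ArrayVec variant (assuming the arrayvec crate with its 0.7-style const-generic API; example3_arrayvec is my own name). The data still lives on the stack, and you avoid the dummy [[0; 6]; 50] initialization:

use arrayvec::ArrayVec;

fn example3_arrayvec(f: &dyn Fn() -> [u8; 6]) -> [[u8; 6]; 50] {
    let mut result: ArrayVec<[u8; 6], 50> = ArrayVec::new();
    for _ in 0..50 {
        result.push(f()); // would panic if pushed past the capacity of 50
    }
    // into_inner succeeds here because the ArrayVec is exactly full
    result.into_inner().unwrap()
}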
Having data on the stack is not necessarily more efficient than having it on the heap, so it's not clear that example3 is more efficient than example2; it depends on what happens with the data later. When keeping the data on the stack you may have to move it between stack frames of different functions or into some other data structures, which is a performance hit. So typically it's more convenient to allocate relatively large data like that on the heap, as in example2. Heap allocation is not that slow.
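One hedged illustration of that trade-off (example4 is my own name, not from the thread): keep the fixed-size array type, but put it behind a single Box, so returning it only moves a pointer between stack frames instead of 300 bytes:

fn example4(f: &dyn Fn() -> [u8; 6]) -> Box<[[u8; 6]; 50]> {
    // one heap allocation for the whole 300-byte block
    let mut result = Box::new([[0u8; 6]; 50]);
    for i in 0..50 {
        result[i] = f();
    }
    result
}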
I would have answered it in simple words like this:
An array is just one contiguous block of storage, on the stack when it's a local variable, and this is also true for a nested array whose elements are arrays themselves. That compact block fits well in the cache.
So your example3 should be optimal.
A vector, by contrast, introduces an indirection: a struct on the stack with 3 fields, the length, the capacity, and a pointer to the actual data on the heap. The Vec needs to allocate its buffer and free it when it is no longer used. And all the inner vectors' buffers might be scattered across RAM, so cache efficiency might be bad.
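To make that concrete, a quick sketch comparing the sizes of the types involved (the 24 bytes assume a typical 64-bit target):

fn main() {
    // the nested array is one flat block of 300 bytes
    println!("{}", std::mem::size_of::<[[u8; 6]; 50]>()); // 300
    // a Vec header is 3 machine words (pointer, capacity, length);
    // the bytes it owns live in a separate heap buffer
    println!("{}", std::mem::size_of::<Vec<u8>>()); // 24
    // a Vec<Vec<u8>> stores fifty such 24-byte headers contiguously,
    // while each inner buffer is its own heap allocation
    println!("{}", std::mem::size_of::<Vec<Vec<u8>>>()); // 24
}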
example0 has about 55 heap allocations: 50 for the little vectors, ~5 for result as it grows dynamically.
Pretty interesting, so with_capacity() is definitely useful if the size is known beforehand.
When keeping the data on the stack you may have to move it between stack frames of different functions or into some other data structures which is a performance hit.
That makes sense, since instead of passing around 300 bytes, it's just a pointer. I guess this would also be beneficial when there are a lot of concurrent requests being served?
Yes, you are right for the Box case. What is even more important, my statement "A struct on the stack with 3 fields" is also not always true, as this struct can itself reside on the heap, e.g. when we have a vector whose elements are vectors.
To be clear, you are describing a Vec. A Vec is a struct (not an array). You are right: like any other struct, a Vec can live on the stack or on the heap. Consider:
let v1 = vec![1u32, 2, 3];            // Vec struct on the stack
let v2 = Box::new(vec![4u32, 5, 6]);  // Vec struct on the heap, inside the Box
println!("{}, {}", v1[1], v2[2]);     // prints "2, 6"
Now, the Vec of course keeps its actual data content in a buffer on the heap, no matter where the Vec struct itself is.
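A small sketch of that point (the printed addresses will of course differ from run to run): wherever the Vec struct lives, its element buffer is a separate heap block:

fn main() {
    let v1 = vec![1u32, 2, 3];
    let v2 = Box::new(vec![4u32, 5, 6]);
    // v1's struct sits in main's stack frame; v2's struct sits on the heap inside the Box
    println!("v1 struct at {:p}, v1 buffer at {:p}", &v1, v1.as_ptr());
    println!("v2 struct at {:p}, v2 buffer at {:p}", &*v2, v2.as_ptr());
}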