Memory usage with Boxes

I've been working on an app and experimenting with different ways to store data. I considered using a Vec<Box<dyn Trait>>, where each Vec has a size of 4096 elements. However, since I plan to have around 140k of these Vecs, I'm trying to minimize memory usage.

To explore the memory implications, I created a simple test setup:

use std::thread;
use std::time::Duration;

fn main() {
    let len = 1024 * 1024 * 250;
    let mut vec = Vec::new();
    for _ in 0..len {
        vec.push(1u16);
    }
    println!("Waiting...");
    thread::sleep(Duration::from_secs(100));
}

This version uses about 523MB of memory, which makes totally sense. However, when I modify the code to wrap the 1u16 in a Box::new(), the memory usage skyrockets to 10.48GB! This is nearly 20 times more memory! I understand that boxing adds some overhead, but I wasn't expecting it to be this significant, especially since I'm not even dealing with custom structs that might increase size due to vtables. In both examples data is stored on heap (due to use of vec)... Do you know why it happens and how I could minimize memory usage with Boxes?

A u16 in a vec takes 2 bytes total. If you box it this does a native memory allocation (malloc) for each u16. The overhead you're seeing is the overhead of a native malloc plus the pointer size (8 bytes) for storing the box (which is just a native pointer) in the vector, for each u16.

2 Likes

Only use them in large quantities when you really need them, especially for very small objects. It takes some experience to know when you need them, but now that you're aware of their overhead you will learn as you go along.

If each Vec is to have elements of the same type, then you should create a Vec of that element type, like the Vec<u16> you started with, and then wrap that whole Vec in a Box<dyn Trait> (of a suitable trait for your application). That will be much more efficient than 4096 small Boxes.

2 Likes

This is even worse as two pointers for a Box<dyn Trait>, one for the heap allocation and one for the vtable.

3 Likes

Okay... That's pretty interesting. I've tested my program with u8, u32, and u64, and they all use nearly the same amount of memory.

So, I have many small structs implementing a trait, each storing just one u16. What’s the best way to store a lot of these structs efficiently? I'm considering storing them as u16 and creating function that maps IDs to the corresponding structs, but I'm not entirely satisfied with this approach.

Also, does heap memory take more space (for mapping where values are)? While storing just u16s I didn't see much overhead so I'm not sure about it

Vec<YourStruct> is the most compact possible storage for it. Write your program to work with different Vecs.

Each individual heap allocation takes more space because it has overhead and there needs to be a pointer to it. (They also get (possibly) scattered about the heap, so accessing them is less efficient.) It's not that you need to avoid the heap, it's that you need to minimize the number of heap allocations you use. A Vec<u16> owns just one heap allocation. A Vec<Box<u16>> of length N owns N + 1 heap allocations.

2 Likes

Each individual allocated object in the heap (for each u16) has overhead, because it has a size and it can be freed individually. A Vec on the other handle is a single heap allocation, in other words, an array of u16 with no overhead per element.

Something like that would be a good idea. I believe this is the flyweight pattern.

Rust's system allocator directly calls malloc, which is documented to always return memory chunks sufficiently aligned for all native types. Typically this means that each allocation is aligned to at least 16 bytes, and is thus at least 16 bytes long, even if you ask for a single byte. Also each allocated chunk has certain system-dependent overhead used for the allocator's internal bookkeeping. Generally you can expect it also to be at least 8 bytes long, possibly longer.

This means that any allocation can be expected to take at least 24 bytes of memory, even if you allocate a simple u16. Box itself is pointer-sized, so it's 8 extra bytes. And you're using boxed trait objects, which means that each object also includes a vtable pointer, so actually each of you Box<dyn Trait> takes 16 bytes, plus the size of allocation.

All of that sums up to at least 40 bytes of memory per trait object, which is the 20x memory overhead that you have noticed.

The conclusion here is that you shouldn't be heavily using boxes if you're creating millions of objects, particularly tiny ones (but even with big allocations that's still gigabytes of pure overhead, plus performance degradation). If you really need millions of objects, you should consider some other, more complex, memory management strategy. Ideally you should avoid allocations and dynamic dispatch entirely: allocate objects of different types in separate vectors, so that you are always working with concrete types, and don't need dynamic dispatch or boxing.

5 Likes

Here's a fun and easy way to check how much heap each allocation probably takes (code and comments written out in detail by ChatGPT), which in this playground indicates 32 bytes! (if I'm not missing any important detail/caveat). That would mean 32 bytes on the heap plus 16 bytes for the Box<dyn Any>, giving 48 bytes total, which would even be a factor of 24.

1 Like

Makes sense. If each use allocated block is aligned to 16 bytes and is 16 bytes in size, and each header is allocated inline with that block, the headers are effectively 16 bytes aligned as well, so each allocation costs at least 32 bytes. But note that at least 8 of those bytes are likely wasted space. For the allocator used on playground, the header seems to be 8 bytes long, since you can bump the allocation size to 24 bytes without changing the total memory consumption of each allocation.

3 Likes