How to create large objects directly on the heap

This might be a silly question. The docs state that "In Rust, you can allocate memory on the heap with the Box<T> type." But how does the value get there?

My code ran into a stack overflow which essentially boiled down to the following problem:

fn main() {
    let boxed = Box::new([1f64; 1 << 24]);
    println!("len: {}", boxed.len());
}

This overflows the stack because the array (a placeholder for a large struct) is created on the stack first. I guess it would then try to (physically) move that chunk of memory to the heap.

Background: I have a big tree-like structure that will be deserialized from a byte stream. Some nodes are rather big (> 100 KiB). While building that tree, those structures pile up on the stack until everything is done and moved to the heap as one big chunk.

I thought it would be enough to wrap the root node in a Box but due to the aforementioned behaviour the intermediate state exceeds the stack limit.

How can I enforce that all my structs lie in the heap directly from the start without having to introduce a box for every node?

Maybe related: https://github.com/rust-lang/rust/issues/53827

Right now, you can’t. This feature is called “placement new” and has been held up over the last few years on a number of questions.

Does that mean I cannot create a struct that is bigger than the stack allows? And it will always be created on the stack first and then copied to the heap?

As steveklabnik said, you can't, but you can use a Vec like in this comment.
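
For what it's worth, here is a minimal sketch of the Vec route. The helper name `boxed_array` is made up for illustration, and it assumes a compiler recent enough to have const generics and the `TryFrom<Box<[T]>>` impl for `Box<[T; N]>`. `vec!` fills the buffer on the heap and the conversion to a boxed array reuses that allocation, so the array should never have to exist on the stack:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: builds a heap-allocated array holding `N` copies of
/// `value` without first creating the array on the stack.
fn boxed_array<const N: usize>(value: f64) -> Box<[f64; N]> {
    // `vec!` allocates the buffer on the heap and fills it in place.
    let slice: Box<[f64]> = vec![value; N].into_boxed_slice();
    // Turning the boxed slice into a boxed array keeps the same allocation;
    // it only fails if the length is not exactly `N`.
    match Box::<[f64; N]>::try_from(slice) {
        Ok(array) => array,
        Err(_) => unreachable!("the vector was created with exactly N elements"),
    }
}
```

Something like `boxed_array::<{ 1 << 24 }>(1.0)` could then stand in for the `Box::new([1f64; 1 << 24])` line from the question. This only helps for array-like data, not for arbitrary structs.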

Semantically, yes.

Sometimes the optimizer will put things directly on the heap, but that’s not guaranteed, as you’ve seen with your code.

The array is just a placeholder for any big data structure that has a fixed size.

Thank you for your quick reply. I guess the main discussion was here: https://github.com/rust-lang/rust/issues/27779
OK, I'll have to find a workaround, since the code is generated from an XML schema, which rules out manual fine-tuning for individual structs.

Is there a way to inspect the stack's contents at runtime? I haven't found an appropriate command in rust-lldb yet. It would greatly help me with my optimizations.

Sorry, I didn't read carefully enough. Maybe you can use this and do something with it. It's not efficient and really unsafe: for example, if the data read from the stream is not valid you're in UB land, and there are probably a lot of other problems, but it lets you build big structs on the heap.

You can directly alloc memory for the struct, initializing through the pointer yourself, but you'll need your own Box-like wrapper for automatic drop and dealloc.

Or you could use Vec::with_capacity(1), get the raw pointer and initialize it, then set_len(1) and into_boxed_slice(). I think it's OK to convert that Box<[T]> with length 1 to Box<T>, for instance with Box::into_raw and from_raw, but it's worth double-checking that.

Either way, you have to be careful about partial initialization if anything panics. The easiest thing to do is set it up so any partial data will be forgotten. You can dealloc, but don't drop partially initialized data.
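
A rough sketch of the `Vec::with_capacity(1)` variant might look like the following. `Node`, `LEN` and `new_node_on_heap` are made-up names standing in for one of the generated structs, and the final slice-to-`Box<T>` cast relies on a one-element slice having the same layout as a single `T`, which is the part worth double-checking as noted above:

```rust
use std::ptr::addr_of_mut;

const LEN: usize = 1 << 20;

// Hypothetical stand-in for a large generated node type (about 8 MiB).
struct Node {
    id: u32,
    payload: [f64; LEN],
}

fn new_node_on_heap(id: u32, fill: f64) -> Box<Node> {
    // Reserves heap space for exactly one uninitialized `Node`.
    let mut v: Vec<Node> = Vec::with_capacity(1);
    let p: *mut Node = v.as_mut_ptr();
    unsafe {
        // Write each field through the raw pointer, without ever creating
        // a reference to (or a stack copy of) the uninitialized struct.
        addr_of_mut!((*p).id).write(id);
        let payload = addr_of_mut!((*p).payload) as *mut f64;
        for i in 0..LEN {
            payload.add(i).write(fill);
        }
        // Only now is the element fully initialized. Had something panicked
        // earlier, the Vec (still length 0) would free the memory without
        // dropping any partial data.
        v.set_len(1);
    }
    let slice: Box<[Node]> = v.into_boxed_slice();
    // Assumption: a slice of length 1 has the same size and alignment as a
    // single `Node`, so the allocation can be reinterpreted as `Box<Node>`.
    unsafe { Box::from_raw(Box::into_raw(slice) as *mut Node) }
}
```

The array field is filled element by element on purpose: writing a whole `[f64; LEN]` value through the pointer would build that array on the stack again.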

You could go even more raw by taking advantage of the fact that Box has recently been specified to be interchangeable with raw allocations provided by the global allocator. Of course you'd need to more carefully initialize the memory with alloc and ptr::write instead of alloc_zeroed if all zeroes is not a valid bitpattern for your actual struct, but the general idea is the same.

I also suppose that's technically only guaranteed in nightly right now but it realistically works on stable too.
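
As a sketch of that idea, assuming a type for which all-zero bytes are a valid value (the `Samples` struct and `zeroed_samples` function are invented for this example):

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

const LEN: usize = 1 << 20;

// Hypothetical large type; every field is valid when all its bytes are zero.
struct Samples {
    count: u64,
    values: [f64; LEN],
}

fn zeroed_samples() -> Box<Samples> {
    let layout = Layout::new::<Samples>();
    unsafe {
        // Ask the global allocator for zero-filled memory of the right
        // size and alignment.
        let ptr = alloc_zeroed(layout) as *mut Samples;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Box takes ownership of the allocation; this relies on the memory
        // coming from the global allocator with the layout of `Samples`.
        Box::from_raw(ptr)
    }
}
```

For a type where zeroed memory is not a valid value, the same shape should work with `alloc` followed by `ptr::write` on each field before calling `Box::from_raw`.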

I'm glad that was finally guaranteed! FWIW that commit is on the beta branch too, headed for 1.34.

Maybe you can use copyless.

Any idea how this prevents the compiler from allocating Foo::Small(4) on the stack in the first place?

fn foo() -> Box<Foo> {
    Box::new(Foo::Small(4)) // this has 1 memcopy
    //Box::alloc().init(Foo::Small(4)) // this has 0 memcopies
}

It relies on compiler optimizations, so it only works in release mode.