Best practices for large stack allocated data that will end up on heap?

I'm building a btree structure that uses ArrayVec to store leaf items. I'm getting better performance than with a standard Vec, but I'm wondering whether there's a lot of copying going on that I could avoid.

The code below is a simplified example of what I'm doing. It seems like without compiler optimizations the items: &[u8] data will get copied 4 times:

  1. Copied into the ArrayVec in build_items_array
  2. Copied when the ArrayVec is returned from build_items_array
  3. Copied when the Node is returned from build_node
  4. Copied when the Node is placed on the heap with Arc::new(node)

Is all that copying (especially 1, 2, 3) actually happening, or does it get optimized away somehow?

I expect at least case 4 must involve a copy... is there any way to avoid it?

It seems like ideally there should be just one malloc for the Node on the heap, and one copy from the original item slice into that Node's items ArrayVec on the heap... is that possible?

use std::sync::Arc;
use arrayvec::ArrayVec;

struct Tree {
    node: Arc<Node>,
}

struct Node {
    items: ArrayVec<[u8; 1024]>,
}

fn build_tree(items: &[u8]) -> Tree {
    let node = build_node(items);
    Tree {
        node: Arc::new(node),
    }
}

fn build_node(items: &[u8]) -> Node {
    let items = build_items_array(items);
    Node { items }
}

fn build_items_array(items: &[u8]) -> ArrayVec<[u8; 1024]> {
    let mut vec = ArrayVec::new();
    // The result is ignored, so a capacity error is silently dropped.
    let _ = vec.try_extend_from_slice(items);
    vec
}


cargo asm can help you discover what copying is or isn't being done.

The copyless crate makes copy elision much more likely to happen by separating the allocation of a container from the moving of a value into it. It's not geared up to work directly with Arcs, but you can implement the same basic idea yourself if you need to.
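For what it's worth, here is a minimal sketch of an "allocate first, then fill in place" variant for an Arc, using only std. This is not copyless's API. `FixedBuf`, `Node::empty`, `build_node_in_place` and the truncate-on-overflow policy are all made up for illustration, and `FixedBuf` is just a stand-in for ArrayVec so the snippet has no dependencies. The empty placeholder node still has to be moved into the allocation, but the item bytes themselves are copied once, straight into the heap.

```rust
use std::sync::Arc;

const CAP: usize = 1024;

// Hypothetical stand-in for ArrayVec<[u8; 1024]>, so no external crate is needed.
struct FixedBuf {
    len: usize,
    data: [u8; CAP],
}

struct Node {
    items: FixedBuf,
}

impl Node {
    // Hypothetical helper: a node with no items and a zeroed buffer.
    fn empty() -> Node {
        Node { items: FixedBuf { len: 0, data: [0; CAP] } }
    }
}

// Allocate the Arc first, then copy the items directly into the heap allocation.
fn build_node_in_place(items: &[u8]) -> Arc<Node> {
    let mut node = Arc::new(Node::empty());
    // A freshly created Arc has no other owners, so get_mut returns Some.
    let slot = Arc::get_mut(&mut node).expect("freshly created Arc is unique");
    // Assumption for this sketch: input longer than CAP is truncated.
    let n = items.len().min(CAP);
    slot.items.data[..n].copy_from_slice(&items[..n]);
    slot.items.len = n;
    node
}
```

Whether the placeholder move is actually elided is still up to the optimizer, so it is worth checking the generated assembly for leftover memcpy calls.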


Thanks for the pointers.

It also looks like Arc has some nightly features designed to help with this, such as new_uninit. I guess that will be a good solution one day.
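As far as I know, Arc::new_uninit and the matching assume_init have since been stabilised (Rust 1.82, I believe), so on older toolchains this still needs nightly. Below is a rough sketch of how that could give one allocation and one copy. `FixedBuf`, `build_node_uninit` and `as_slice` are hypothetical names, input longer than the capacity is truncated (my assumption), and the buffer stores MaybeUninit<u8> so the unused tail can legally stay uninitialised. The unsafe parts would need their own careful review before real use.

```rust
use std::mem::MaybeUninit;
use std::ptr;
use std::sync::Arc;

const CAP: usize = 1024;

// Hypothetical stand-in for ArrayVec<[u8; 1024]>; the tail past `len` is never read.
struct FixedBuf {
    len: usize,
    data: [MaybeUninit<u8>; CAP],
}

struct Node {
    items: FixedBuf,
}

impl Node {
    fn as_slice(&self) -> &[u8] {
        // SAFETY: the first `len` bytes were initialised in build_node_uninit.
        unsafe {
            std::slice::from_raw_parts(self.items.data.as_ptr().cast::<u8>(), self.items.len)
        }
    }
}

// One heap allocation, then one copy from the input slice into it.
fn build_node_uninit(items: &[u8]) -> Arc<Node> {
    let mut arc = Arc::<Node>::new_uninit();
    // The Arc was just created, so it is unique and get_mut returns Some.
    let node_ptr: *mut Node = Arc::get_mut(&mut arc)
        .expect("freshly created Arc is unique")
        .as_mut_ptr();
    // Assumption for this sketch: input longer than CAP is truncated.
    let n = items.len().min(CAP);
    unsafe {
        // addr_of_mut! avoids creating references to uninitialised memory.
        let data_ptr = ptr::addr_of_mut!((*node_ptr).items.data).cast::<u8>();
        ptr::copy_nonoverlapping(items.as_ptr(), data_ptr, n);
        ptr::addr_of_mut!((*node_ptr).items.len).write(n);
        // SAFETY: `len` is initialised and `data` is an array of MaybeUninit,
        // which is valid in any state, so the whole Node is now initialised.
        arc.assume_init()
    }
}
```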