BTreeMap::with_capacity

Is there a reason why BTreeMap::with_capicity does not exist?

2 Likes

Yes. It's a tree structure, not a single big allocation.

2 Likes

I did not expect that such a simple reply would answer my question :sweat_smile:,

but I am not sure if everyone who stumbles across this question will be able to follow, so for future reference I want to give a bit of a more thorough explanation:

In order to understand the difference between a Vec and a tree-like structure, one must first understand how they are represented in memory.
Memory itself is often grouped into two parts:

  • the stack, which is embedded into the executable and consists of data of which the size is already known at compile time (for example a pointer is on a 64bit operating system, 8 bytes large or an u8 is 1 byte large)
  • and the heap, where data can be allocated with an arbitrary size (for example reading a file into memory or taking user input).

A Vec is a data structure that allocates data on the heap at runtime. The size must not be known at compile-time. A Vec is declared like this (slightly simplified):

struct Vec<T> {
   // pointer to the start of the buffer
   buf: *mut T,
   // length of the buffer
   len: usize,
}

both fields (buf and len) have a size, which is known at compile-time so the struct itself is allocated on the stack and only the data will be allocated on the heap.

The heap can be seen as a huge number of blocks, where each block is a single byte.


(in the pictures, the bright blue will be free memory, red will be allocated by our program and other colors are from other programs).

A Vec would simply allocate a chunk on the heap

let mut vec = Vec::new();
// reserve 5 * mem::size_of::<T>() on the heap,
// where T is the element of a Vec<T>
vec.reserve_exact(5);

but our Vec is not the only thing that will allocate on the heap so the memory might become more fragmented:

Because of the dark blue blocks (memory from another program) our buffer can only grow 2 more additional bytes, otherwise our program would access memory from another program, which is undefined behavior, but what happens if we still try to allocate more than those 2 bytes?

let mut vec = Vec::new();
// reserve 5 * mem::size_of::<T>() on the heap,
// where T is the element of a Vec<T>
vec.reserve_exact(5);

// resize the buffer
vec.reserve_exact(5);

the allocator will first copy the buffer to a new place in the memory


and then reserve the needed bytes

Those copies might slow down the whole program, which can be prevented if enough memory is allocated beforehand and that is why the capacity/reserve-methods exist. (by the way amortization is the Vec allocating more memory, than requested to prevent reallocations)


A BTreeMap or a tree in general is structured a bit differently as it consists of many small pointers to different allocations on the heap:

struct Tree<T> {
    elements: Vec<*mut T>,
}

in its simplest form it contains only a list of pointers to the stored elements.

So instead of a single large chunk, that might need to reallocate, it consists of many small chunks.


(the red chunks are different elements from the tree, bright blue is free memory and all other colors are from different programs and yes I know that all red chunks should have the same size, but I do not want to edit this picture again).

So a tree stores pointers to different allocations, which makes insertions a lot more efficient, as it does not require to reallocate the entire buffer (which could involve copies).


To make the difference even clearer consider the following program:

let mut btree = BTreeSet::new();

for i in 0..3 {
   // each call will allocate a new chunk in the heap
   btree.insert(format!("hello_{}", i));
}

// the btree will have pointers to
// "hello_0".to_string()
// "hello_1".to_string()
// "hello_2".to_string()

let mut vec = Vec::new();

for i in 0..3 {
    // each push will resize the buffer
    vec.push(format!("hello_{}", i));
    // the allocator might need to copy the entire buffer to a different
    // position in the heap, if something else is limiting
    // the number of free blocks.
}


To summarize it:

A BTreeMap does not provide methods to reserve memory, because it does not have to reallocate the entire buffer, like a Vec needs to.

PS: I did create those pictures for the heap and everyone is free to use them for whatever they want.

10 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.