When to use arrays and when to box trees?

quadrupleslap · January 30, 2019, 6:25pm

When writing trees there are two main ways people generally write them.

// Option 1: boxing things.

struct Node<T> {
    data: T,
    left: Option<Box<Self>>,
    right: Option<Box<Self>>,
}

struct Tree<T> {
    root: T,
}

// Option 2: arrays.

struct Node<T> {
    data: T,
    left: usize, // usize::MAX for none.
    right: usize,
}

struct Tree<T> {
    nodes: Slab<Node<T>>,
    root: usize,
}

Would the performance ever be significantly different between the two implementations?

Advantages of Boxing

It's simpler.
It works with the borrow checker (you can mutably borrow both children of a node at once).
There are less operations involved when accessing a node. With an array, you have to bounds-check in the array and offset the pointer. Worse, with a slab, you'll have to make sure that it is currently occupied. With a boxed node, you just dereference it.

Advantages of Arrays

If you don't have billions of nodes (and you probably don't), you can use smaller pointers, which can save lots of space, especially for wider trees, which in turn improves cache-friendliness.
Maybe the nodes being near each other in memory will magically improve access times (it probably won't).
Iterating the nodes is really easy, if you ever want to do that for some reason.

OptimisticPeach · January 30, 2019, 6:37pm

Just a side note, wouldn't the second one have to be

struct Node<T> {
    data: T,
    left: usize,
    right: usize,
}
struct Tree<T> {
    nodes: Slab<Node<T>>, //take note here
    root: usize,
}

quadrupleslap · January 30, 2019, 6:37pm

Yeah, whoops!

mbrubeck · January 30, 2019, 6:40pm

Another advantage of using arrays + indices is that a node can easily store a "pointer" to its parent.

(The equivalent for boxes requires Arc/Rc and Weak.)

OptimisticPeach · January 30, 2019, 6:43pm

Also, I've frequently heard that a

Node
-> T
-> Left Content
-> Right Content

Is slower due to caching instead of an array (Which can be cached on the cpu):

Node
-> T
-> Left Index
-> Right Index

notriddle · January 30, 2019, 7:13pm

Are there any invariants in your tree?

For example, if you can guarantee that your tree will always be dense and equal depth, then you can represent it without storing any pointers or indices at all, and just have i's children at the fixed position 2i+1 and 2i+2.

quadrupleslap · January 30, 2019, 7:45pm

Yeah, that'd be convenient if it was a complete binary tree, but binary heaps are literally the only time I've ever seen those be useful.

vitalyd · January 31, 2019, 1:53am

What are going to be the access patterns that you want to optimize for? What types do you envision using for T? Namely, are they going to be laid out inline or do they have their own inner heap accesses?

Keep in mind that a slab (or whatever pool/allocator you use) gives you the possible flexibility of arranging storage to meet your access patterns. A Box will not since the default allocator has no idea whether this Box is related to another Box (beyond any size classes it may use internally), and so you can easily end up with very scattered heap access (i.e. random access, cache misses galore).

A slab/arena can also facilitate fast teardown where the entire space is reclaimed at once, rather than individual Box deallocs. Not sure if you care about dealloc performance.

Topic		Replies	Views
When would you want to use a boxed array?	7	9097	November 5, 2020
Looking for the best idiomatic way to work on trees as fast as possible	6	759	November 1, 2021
Using Box to build a tree help	2	247	January 16, 2024
Box<[T; N]> vs Box<[T]> help	3	317	January 18, 2023
Best way to implement a tree help	4	2643	January 12, 2023

When to use arrays and when to box trees?

Advantages of Boxing

Advantages of Arrays

Related Topics