Using an Arena without passing it around everywhere

The project I'm working on centers around a data structure that's essentially a tree (or a DAG, really): the nodes are enums and they hold "references" to other nodes, acyclically. Currently these references are Rcs, because a subtree may appear in many trees and I'm happy for them to be immutable.

I've considered instead using an Arena and bare references, but there's a huge syntax tax I'd like to avoid: various parts of the code currently manipulate these trees (i.e., pluck a few branches off of this tree, and stitch them together into a new tree), and I can't think of a way to use an Arena without polluting all the interfaces in my code base so that they can allocate new nodes, should they need to.

Really, this is a specific instance of Dependency Injection, and my normal response to that is "suck it up and pass your dependencies around", but for something as fundamental as memory management, it seems needed. Imagine if every function had to pass a "malloc" function around...

Is there a pattern that can be used here?

1 Like

The pattern in is very interesting, but it has quite limited applicability - the arena/pool needs to be a named singleton which may not at all fit the use case.

1 Like

So, the comparison with malloc (or global allocator) is apt, but it's important to understand why you don't need to pass the global allocator around - Firstly, the user can't safely destroy it and secondly, an allocation from it doesn't imply a borrow on the allocator. The first property allows the second. You could implement this yourself, but it would require unsafe.

I wrote an example of an alternative approach using thread_local and generational-arena. It does not require any unsafe code. generational-arena is a special type of arena that, essentially, defers borrow analysis to runtime. When you allocate a node, it returns a handle to a value called Index that you can later use to retrieve a temporary, concrete borrow.

Note that I used thread_local here, but if you need to be able to share the arena across threads, then lazy_static should work, too (replace RefCell with Mutex). Additionally, if you want to be able to swap in-and-out different arenas, you could instead have it be something like RefCell<Option<Arena<T>>>, which would let you provide the "implicit" arena only temporarily.

The main upside of the generational-index approach: It satisfies the requirement to not have to pass the arena around.

Unfortunately, it has some downsides: The code is a bit ugly. You lose static borrow checking AND static type checking. And you have to manually deallocate values from the arena using remove, though this may not be a problem if deallocation of nodes can be deferred to destruction of the arena without having to worry about memory footprint.

This is a bit of a fundamental problem if you want a global arena - all allocations from the arena imply a borrow on the arena. And there's no way to statically assert to the compiler that the arena will live longer than all of its allocations without using unsafe. If you're willing to write unsafe code, then it becomes a bit easier, but you have to be careful - when you erase the borrow on the arena, it is now up to you to personally validate that 1) the arena will never invalidate the borrow and 2) the arena will never be deallocated.

I'll leave it to you on which of these three approaches is best:

  1. Just pass a typed-arena style of arena (and its lifetime) around. You then have bare references, but pay the tax of passing it around.
  2. Use a global arena with lifetime-erased allocation handles like generational-arena.
  3. Get out unsafe and start doing developer-time borrow checking.

I've had good success with thread_local and a SlotMap arena.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.