Memory management: Which arena-based allocation to use?

I'm at a point where my Rust-based renderer matches its C++ counterpart pixel for pixel in many cases (example scenes). Here is my description of the problem I'm trying to solve:

https://www.rs-pbrt.org/blog/arena-based-allocation

The short version: the C++ code uses arena-based allocation (as described in the PBRT book); the current Rust implementation does not. My main problem so far was that I don't want to deal with references and lifetimes all the time. The problem may sound relatively easy, but keep in mind that arena-based allocation is used in many places. In my description I focused mainly on BxDFs, which are created and destroyed by many threads within a relatively short time. I think the main idea of the C++ code is to make these allocations and deallocations fast by reusing the allocated memory blocks: create one arena per thread (or per bucket), allocate structs/classes of different sizes a number of times without releasing each one exactly when it is no longer needed, then release everything at once via arena.Reset() and reuse the memory without re-allocating.
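
A minimal std-only sketch of the reuse pattern described above, assuming a single element type: each worker thread owns a scratch Vec, fills it during a sample, and calls clear() at the end as a stand-in for arena.Reset(). clear() drops the contents but keeps the capacity, so later samples don't re-allocate. ShadingRecord and render_tile are hypothetical names, not rs-pbrt code, and unlike pbrt's arena this cannot mix differently sized types in one block.

```rust
use std::thread;

/// Hypothetical short-lived per-sample object (a stand-in for a BxDF).
#[derive(Clone, Copy)]
struct ShadingRecord {
    weight: f32,
}

/// Hypothetical worker for one tile/bucket. `scratch` plays the role of the
/// per-thread arena. Returns (sum of weights, capacity after the first
/// sample, capacity at the end).
fn render_tile(samples: usize) -> (f32, usize, usize) {
    let mut scratch: Vec<ShadingRecord> = Vec::new();
    let mut total = 0.0;
    let mut first_capacity = 0;
    for s in 0..samples {
        // Create several objects for this sample without freeing them one by one.
        for i in 0..8 {
            scratch.push(ShadingRecord { weight: (s + i) as f32 * 0.5 });
        }
        total += scratch.iter().map(|r| r.weight).sum::<f32>();
        if s == 0 {
            first_capacity = scratch.capacity();
        }
        // Analogue of arena.Reset(): drop everything at once, keep the memory.
        scratch.clear();
    }
    (total, first_capacity, scratch.capacity())
}

fn main() {
    // One scratch buffer per thread, so no locking is needed.
    let results: Vec<(f32, usize, usize)> = thread::scope(|scope| {
        let handles: Vec<_> = (0..4).map(|_| scope.spawn(|| render_tile(100))).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    for (total, first_capacity, last_capacity) in results {
        // The capacity reached in the first sample is reused by all later samples.
        assert_eq!(first_capacity, last_capacity);
        println!("total weight {total}, scratch capacity {last_capacity}");
    }
}
```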

I looked into using a crate like bumpalo, but the main problem is that Bump::alloc() returns a "&mut T" reference, which means I would have to use lifetimes nearly everywhere. This is very painful, and I would really appreciate another solution. Any suggestions? There seem to be many arena crates out there ...

I am not sure a really great solution exists, but some possible options are:

  • arena with a lifetime
  • arena with indexes (basically, type Arena<T> = Vec<T>, and you use a usize rather than &'_ T to represent a reference). With this approach there is no lifetime, but you need to pass in the arena both for allocations and for every access.
  • object pool: you store a thread-local Vec<T> and manipulate T via a Handle<T>, which picks a free T from the list on creation and puts it back on destruction. Note that this is not exactly the same as an arena, because the Drop of Handle is still not a no-op.
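
The index-based option from the second bullet can be sketched roughly like this. Arena, Idx, Lambertian and Bsdf are hypothetical names for illustration, not any crate's API; the point is that Bsdf needs no lifetime parameter, while every access has to go through the arena.

```rust
/// Hypothetical typed, index-based arena: `type Arena<T> = Vec<T>` with a thin wrapper.
struct Arena<T> {
    items: Vec<T>,
}

/// A plain index used instead of `&T`; it carries no lifetime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Idx(usize);

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Idx {
        self.items.push(value);
        Idx(self.items.len() - 1)
    }

    fn get(&self, idx: Idx) -> &T {
        &self.items[idx.0]
    }

    /// Frees all values at once but keeps the Vec's memory. Indices handed out
    /// earlier are now stale: they panic or refer to newer values if reused.
    fn reset(&mut self) {
        self.items.clear();
    }
}

/// Hypothetical BxDF stand-in.
struct Lambertian {
    reflectance: f32,
}

/// Holds indices instead of references, so it needs no lifetime parameter...
struct Bsdf {
    bxdfs: Vec<Idx>,
}

impl Bsdf {
    /// ...but the arena must be passed in wherever an index is dereferenced.
    fn total_reflectance(&self, arena: &Arena<Lambertian>) -> f32 {
        self.bxdfs.iter().map(|&i| arena.get(i).reflectance).sum()
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc(Lambertian { reflectance: 0.25 });
    let b = arena.alloc(Lambertian { reflectance: 0.5 });
    let bsdf = Bsdf { bxdfs: vec![a, b] };
    println!("{}", bsdf.total_reflectance(&arena));
    arena.reset();
}
```

Nothing in this sketch stops a stale Idx from being used after reset(); it only turns a dangling reference into a wrong or out-of-bounds index.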

I've also noticed that the reason the C++ version needs allocation here (if I'm not misreading the code) is that it uses dynamic dispatch, and you more or less have to heap-allocate dynamically dispatched objects. Replacing dynamic dispatch with an enum might let you keep everything on the stack.
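
A rough sketch of the enum idea, assuming heavily simplified BxDF kinds (no directions, spectra or Fresnel terms) and an assumed fixed limit of 8 BxDFs per BSDF. Dispatch is a match instead of a vtable call through Box<dyn ...>, and the whole Bsdf lives inline, so creating one needs no heap allocation. All names here are hypothetical.

```rust
use std::f32::consts::FRAC_1_PI;

/// Assumed fixed upper bound on BxDFs per BSDF.
const MAX_BXDFS: usize = 8;

/// Hypothetical, simplified BxDF kinds.
#[derive(Debug, Clone, Copy)]
enum Bxdf {
    Lambertian { r: f32 },
    SpecularReflection { r: f32 },
}

impl Bxdf {
    /// Value for a pair of non-delta directions (directions omitted for brevity).
    fn f(&self) -> f32 {
        match *self {
            Bxdf::Lambertian { r } => r * FRAC_1_PI,
            // A perfect mirror is a delta distribution: zero for arbitrary directions.
            Bxdf::SpecularReflection { .. } => 0.0,
        }
    }

    fn reflectance(&self) -> f32 {
        match *self {
            Bxdf::Lambertian { r } | Bxdf::SpecularReflection { r } => r,
        }
    }
}

/// Up to MAX_BXDFS BxDFs stored inline: a Bsdf value needs no heap allocation.
struct Bsdf {
    bxdfs: [Option<Bxdf>; MAX_BXDFS],
    n: usize,
}

impl Bsdf {
    fn new() -> Self {
        Bsdf { bxdfs: [None; MAX_BXDFS], n: 0 }
    }

    fn add(&mut self, b: Bxdf) {
        assert!(self.n < MAX_BXDFS, "too many BxDFs");
        self.bxdfs[self.n] = Some(b);
        self.n += 1;
    }

    fn f(&self) -> f32 {
        self.bxdfs.iter().flatten().map(Bxdf::f).sum()
    }

    fn reflectance(&self) -> f32 {
        self.bxdfs.iter().flatten().map(Bxdf::reflectance).sum()
    }
}

fn main() {
    let mut bsdf = Bsdf::new();
    bsdf.add(Bxdf::Lambertian { r: 0.5 });
    bsdf.add(Bxdf::SpecularReflection { r: 0.25 });
    println!(
        "f = {}, reflectance = {}, size_of::<Bsdf>() = {} bytes",
        bsdf.f(),
        bsdf.reflectance(),
        std::mem::size_of::<Bsdf>()
    );
}
```

The trade-off is that every Bsdf reserves space for the largest variant times MAX_BXDFS, and adding a new BxDF kind means extending the enum rather than implementing a trait.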

Otherwise, I'd probably go with an index-based arena approach.

If you want to be a bit safer and avoid the ABA problem (while it has its origins in multi-threading, it also shows up when using a Vec<_>/index pair), you can use the generational-arena crate.
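
To show what the generation buys you, here is a minimal hand-rolled sketch of the idea. It is not the generational-arena crate's API; Index, Entry and GenArena are hypothetical names. Each slot carries a generation counter that is bumped on removal, so an old handle to a reused slot is rejected instead of silently reading the new value.

```rust
/// Hypothetical handle: slot position plus the generation it was issued for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Index {
    slot: usize,
    generation: u64,
}

struct Entry<T> {
    generation: u64,
    value: Option<T>,
}

/// Minimal generational arena (not the generational-arena crate's API).
struct GenArena<T> {
    entries: Vec<Entry<T>>,
    free: Vec<usize>,
}

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { entries: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Index {
        if let Some(slot) = self.free.pop() {
            // Reuse a freed slot; its generation was bumped when it was freed.
            let entry = &mut self.entries[slot];
            entry.value = Some(value);
            Index { slot, generation: entry.generation }
        } else {
            self.entries.push(Entry { generation: 0, value: Some(value) });
            Index { slot: self.entries.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, idx: Index) -> Option<T> {
        let entry = self.entries.get_mut(idx.slot)?;
        if entry.generation != idx.generation {
            return None; // stale handle
        }
        let value = entry.value.take()?;
        // Bump the generation so every copy of `idx` becomes stale.
        entry.generation += 1;
        self.free.push(idx.slot);
        Some(value)
    }

    fn get(&self, idx: Index) -> Option<&T> {
        let entry = self.entries.get(idx.slot)?;
        if entry.generation == idx.generation {
            entry.value.as_ref()
        } else {
            None
        }
    }
}

fn main() {
    let mut arena = GenArena::new();
    let a = arena.insert("first");
    assert_eq!(arena.remove(a), Some("first"));
    let b = arena.insert("second");
    // Same slot, new generation.
    assert_eq!(a.slot, b.slot);
    // With a bare Vec index, `a` would now silently read "second" (the ABA problem).
    assert_eq!(arena.get(a), None);
    assert_eq!(arena.get(b), Some(&"second"));
}
```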

I think the problem then is that the arena would only contain elements of a single type, whereas the C++ code places many different structs/classes in the same memory block. Or did I misinterpret the docs?

No, you read it right; that is a drawback. You could wrap several arenas in a type map, but that adds complexity.
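
One way such a type map could look, as a sketch under assumptions: one Vec<T> per concrete type, keyed by std::any::TypeId, with a small hypothetical ErasedVec trait so that all per-type Vecs can be reset together. TypeMapArena, ErasedVec and the two BxDF-like structs are made up for illustration. Unlike pbrt's arena, values of different types still live in different Vecs, and every access costs a hash lookup plus a downcast.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Type-erased view of a Vec<T> so that all per-type arenas can be reset together.
trait ErasedVec {
    fn reset(&mut self);
    fn as_any(&self) -> &dyn Any;
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

impl<T: 'static> ErasedVec for Vec<T> {
    fn reset(&mut self) {
        self.clear(); // inherent Vec::clear: drops elements, keeps capacity
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

/// Hypothetical "type map" arena: one Vec<T> per concrete type, keyed by TypeId.
#[derive(Default)]
struct TypeMapArena {
    arenas: HashMap<TypeId, Box<dyn ErasedVec>>,
}

impl TypeMapArena {
    fn vec_mut<T: 'static>(&mut self) -> &mut Vec<T> {
        self.arenas
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::new(Vec::<T>::new()) as Box<dyn ErasedVec>)
            .as_any_mut()
            .downcast_mut::<Vec<T>>()
            .expect("the entry for TypeId::of::<T>() always holds a Vec<T>")
    }

    fn alloc<T: 'static>(&mut self, value: T) -> usize {
        let v = self.vec_mut::<T>();
        v.push(value);
        v.len() - 1
    }

    fn get<T: 'static>(&self, idx: usize) -> Option<&T> {
        self.arenas
            .get(&TypeId::of::<T>())?
            .as_any()
            .downcast_ref::<Vec<T>>()?
            .get(idx)
    }

    /// Empties every per-type Vec while keeping their memory.
    fn reset(&mut self) {
        for v in self.arenas.values_mut() {
            v.reset();
        }
    }
}

/// Two unrelated hypothetical types stored in the same TypeMapArena.
#[derive(Debug, PartialEq)]
struct Lambertian {
    r: f32,
}

struct FresnelDielectric {
    eta_i: f32,
    eta_t: f32,
}

fn main() {
    let mut arena = TypeMapArena::default();
    let l = arena.alloc(Lambertian { r: 0.5 });
    let f = arena.alloc(FresnelDielectric { eta_i: 1.0, eta_t: 1.5 });
    // Each type has its own Vec, so both indices are 0 here.
    let relative_eta = arena.get::<FresnelDielectric>(f).map(|x| x.eta_t / x.eta_i);
    println!("{:?} {:?}", arena.get::<Lambertian>(l), relative_eta);
    arena.reset();
    assert!(arena.get::<Lambertian>(l).is_none());
}
```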

Objects belong to the arena (they're invalidated when the arena is reset or freed), so they do have a lifetime tied to the arena.

The C++ version of the arena does return references with lifetimes; C++ just doesn't have the syntax to express it 🙂

So I suggest giving it a try. Note that you don't need to change nested objects such as struct Foo { bar: Bar } into references such as struct Foo<'nope> { bar: &'nope Bar }, so the impact should be minimal and limited to the places where you currently use Box.
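
A small before/after sketch of that point, with hypothetical, heavily simplified stand-ins for the Fresnel and BxDF types (not the actual rs-pbrt definitions). Foo keeps its Bar by value and gains no lifetime; only the struct that used to hold a Box<dyn Fresnel> now borrows the trait object from storage that must outlive it. A local variable plays the arena's role here, which is an assumption made to keep the example std-only.

```rust
/// Hypothetical, heavily simplified stand-ins for the Fresnel trait and a BxDF.
trait Fresnel {
    fn evaluate(&self, cos_theta_i: f32) -> f32;
}

struct FresnelNoOp;

impl Fresnel for FresnelNoOp {
    fn evaluate(&self, _cos_theta_i: f32) -> f32 {
        1.0
    }
}

/// Nested by value: stays exactly as it is, no lifetime parameter needed.
struct Bar {
    x: f32,
}
struct Foo {
    bar: Bar,
}

/// Before: the trait object is boxed, i.e. one heap allocation per BxDF.
struct SpecularReflectionBoxed {
    r: f32,
    fresnel: Box<dyn Fresnel>,
}

/// After: the trait object is borrowed from storage that outlives the BxDF
/// (an arena in the C++ design), so the struct gains a lifetime parameter.
struct SpecularReflection<'a> {
    r: f32,
    fresnel: &'a dyn Fresnel,
}

impl SpecularReflectionBoxed {
    fn reflectance(&self, cos_theta_i: f32) -> f32 {
        self.r * self.fresnel.evaluate(cos_theta_i)
    }
}

impl SpecularReflection<'_> {
    fn reflectance(&self, cos_theta_i: f32) -> f32 {
        self.r * self.fresnel.evaluate(cos_theta_i)
    }
}

fn main() {
    let foo = Foo { bar: Bar { x: 2.0 } };
    let boxed = SpecularReflectionBoxed { r: 0.8, fresnel: Box::new(FresnelNoOp) };
    // `storage` stands in for the arena: it must outlive every borrower.
    let storage = FresnelNoOp;
    let borrowed = SpecularReflection { r: 0.8, fresnel: &storage };
    println!("{} {} {}", foo.bar.x, boxed.reflectance(1.0), borrowed.reflectance(1.0));
}
```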

Thanks, that is a good observation and worth trying. I started doing this for just a couple of implementors of the Fresnel trait to see whether it would compile and what kind of changes it would introduce. Commit e59038a is a start, and I will try to do the same for e.g. the Bxdf trait (and others). Let's see whether the render times (hopefully) go down and whether I can get away without arena-based allocation ...

I'm curious: how are your results so far?

Not too good so far. I roughly know where to go, but I have to take small steps, and each one still requires a lot of changes in many files before everything compiles again. I'm moving more and more data from the heap to the stack and using enums instead of traits, but the speedup will only show once I've dealt with things like SurfaceInteraction (the slightly better version currently looks like this). Hopefully there will be some speed improvements by the end of the week, but there is a long way to go ... Nevertheless, heaptrack gives some insight into the changes made ... hopefully for the better 😅

For now I simply avoid arena-based allocation and hope that only C++ really needs it. But I could be wrong ...

Don't forget that there are other allocators to try in Rust as well (Rust used to ship with jemalloc as its default allocator, but switched to the system malloc a while back).
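
Swapping the global allocator goes through the #[global_allocator] attribute and the std::alloc::GlobalAlloc trait. As a std-only sketch, the block below wraps the system allocator to count allocations, which is also a cheap way to check whether moving BxDFs from Box to the stack actually reduces heap traffic. CountingAlloc and allocations are made-up names; allocator crates such as jemallocator or mimalloc export a type that is registered in the same way (see each crate's docs for the exact type name).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts allocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: forwarded unchanged; the caller upholds GlobalAlloc's contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: every pointer handed out by this allocator came from System.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registers the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    // One heap allocation per Box, plus at least one for the Vec itself.
    let boxed: Vec<Box<f32>> = (0..100).map(|i| Box::new(i as f32)).collect();
    std::hint::black_box(&boxed);
    println!("heap allocations: {}", allocations() - before);
}
```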
