I’m building a simple arena allocator. It’s basically just a wrapper around the main allocator combined with a technique to avoid frequent new allocations
This allocator isn’t meant to replace the main allocator, it’s designed to work alongside it.
It handles the parts that fit its use case best, situations with many heap allocations that have roughly the same lifetime and a defined memory usage limit
Here are the arena specifications I want to achieve:
- Single threaded usage (each thread will have their own arena. This one is also chosen to avoid thread synchronization overhead, so it’s more like a specialized arena. Then I plan to create a thread safe version as a separate arena in the next project)
- No individual item deallocation support (all items are dropped at once when the arena itself is dropped)
- Support for reusing the arena via a reset operation
So far, my code looks like this :
#![feature(allocator_api)]
pub struct Arena {
val: std::ptr::NonNull<u8>,
handle: std::cell::UnsafeCell<*mut u8>,
end: *mut u8,
layout: std::alloc::Layout
}
impl Arena {
#[inline(always)]
pub fn new(size: usize) -> Self {
unsafe {
let layout = std::alloc::Layout::from_size_align_unchecked(size, std::mem::align_of::<usize>());
let start = std::alloc::alloc(layout);
if start.is_null() {
std::alloc::handle_alloc_error(layout);
}
Self {
val: std::ptr::NonNull::new_unchecked(start),
handle: start.into(),
end: start.add(size),
layout: layout
}
}
}
#[inline(always)]
fn inner_alloc<T>(&self, val: T) -> *mut T {
unsafe {
let handle = self.handle.get();
let align = std::mem::align_of::<T>();
//let size = std::mem::size_of::<T>();
let handle_addr = (*handle) as usize;
let aligned_addr = (handle_addr + (align - 1)) & !(align - 1);
let ptr = aligned_addr as *mut T;
let next = ptr.add(1) as *mut u8;
debug_assert!(next <= self.end, "arena is out of memory");
ptr.write(val);
*handle = next;
ptr
}
}
#[inline(always)]
pub fn alloc<T>(&self, val: T) -> &T {
let ptr = self.inner_alloc(val);
unsafe { & *ptr }
}
#[inline(always)]
pub fn alloc_mut<T>(&self, val: T) -> &mut T {
let ptr = self.inner_alloc(val);
unsafe { &mut *ptr }
}
#[inline(always)]
pub fn reset(&self) {
let handle = self.handle.get();
unsafe { *handle = self.val.as_ptr(); }
}
}
impl Drop for Arena {
#[inline(always)]
fn drop(&mut self) {
unsafe {
std::alloc::dealloc(self.val.as_ptr(), self.layout);
}
}
}
unsafe impl std::alloc::Allocator for &Arena {
#[inline(always)]
fn allocate(&self, layout: std::alloc::Layout) -> Result<std::ptr::NonNull<[u8]>, std::alloc::AllocError> {
unsafe {
let handle = self.handle.get();
let size = layout.size();
let align = layout.align();
let handle_addr = (*handle) as usize;
let aligned_addr = (handle_addr + (align - 1)) & !(align - 1);
let ptr = aligned_addr as *mut u8;
let next = ptr.add(size);
debug_assert!(next <= self.end, "arena is out of memory");
*handle = next;
let offset = aligned_addr - self.val.as_ptr() as usize;
let ptr = self.val.as_ptr().add(offset);
return Ok(std::ptr::NonNull::new_unchecked(std::slice::from_raw_parts_mut(ptr, size)));
}
}
#[inline(always)]
unsafe fn deallocate(&self, _ptr: std::ptr::NonNull<u8>, _layout: std::alloc::Layout) {
}
}
fn main() {
let mut arena = Arena::new(40);
let a = arena.alloc("a");
println!("{}", *a);
let mut v = Vec::with_capacity_in(1, &arena);
v.push(100);
v.push(100);
println!("{}, {}", v[0], v[1]);
let mut w = Vec::with_capacity_in(1, &arena);
w.push(10);
println!("{}", w[0]);
arena.reset();
let mut w = Vec::with_capacity_in(1, &arena);
w.push(10);
println!("{}", w[0]);
/* allocate more than the capacity will panic in debug build
let mut w = Vec::with_capacity_in(100, &arena);
w.push(10);
println!("{}", w[0]);
*/
}
I'm still learning unsafe Rust. This is a project I’m writing to learn more about it.
I’d appreciate any code suggestions, whether related to security or performance or design.
Once it’s correct, I’ll publish it and use it in my project. Arena allocator is also faster than a general purpose allocator, like in this simple benchmark on Rust Playground :
Arena allocator version :
Default general allocator version :
The arena allocator consistently runs at around 200–300 ns, while the default general allocator takes about 600–700 ns (the scale is always around x2)
This difference will becomes even more noticeable when there are many allocations or when used inside a hot loop. That’s also one of my motivations for choosing arena allocator as the topic for the project
I also hope this will encourage the Rust community to develop well established specialized allocators that work better for specific use cases and can be used alongside the main allocator
It’s been more than a day and I still haven’t gotten any response :<(
I really need feedbacks because I really want to continue but don't know what to do :>)