How to use std::alloc::Allocator

I have just made a basic implementation of Box (after doing BTreeMap, BTreeSet and Vec) for my pstd crate, which has parts of the Rust std library (different implementations, features not yet stabilised, etc.), and I was thinking about experimenting with a non-default Allocator.

The question is “what to do”? One idea is to have a special thread-local allocator for types I know will not persist beyond the lifetime of the thread. It could be a simple “bump” allocator, which should be fast, and deallocate could just keep a running total of outstanding allocations and free everything at once. Does that make sense? I think it might work, but the details might be trickier than I am anticipating.

Has anyone experimented with std::alloc::Allocator, and if so was it beneficial?

You mean values? Any allocator can hold 'static types.

Overall, making a basic implementation of Allocator is pretty simple; making a good one is the hard part.

Here is a simple stack-based bump allocator I just made:

#![feature(allocator_api)]
use std::alloc::{AllocError, Allocator, Layout};
use std::cell::{Cell, UnsafeCell};
use std::ptr::NonNull;

struct MyAllocator<const N: usize> {
    // Offset of the first free byte in `data`.
    idx: Cell<usize>,
    data: MyAllocChunk<N>,
}

// Over-align the chunk so any request with align <= 128 can be served.
#[repr(align(128))]
struct MyAllocChunk<const N: usize>([UnsafeCell<u8>; N]);

impl<const N: usize> MyAllocator<N> {
    fn new() -> Self {
        Self {
            idx: Cell::new(0),
            data: MyAllocChunk([const { UnsafeCell::new(0) }; N]),
        }
    }
}

unsafe impl<'a, const N: usize> Allocator for &'a MyAllocator<N> {
    fn allocate(&self, l: Layout) -> Result<NonNull<[u8]>, AllocError> {
        // Reject alignments larger than the chunk's own alignment.
        let alignment @ 1..=128 = l.align() else { return Err(AllocError) };
        // Bump `idx` up to the required alignment, then past the allocation.
        let Some(start) = self.idx.get().checked_next_multiple_of(alignment) else { return Err(AllocError) };
        let Some(end) = start.checked_add(l.size()).filter(|&n| n <= N) else { return Err(AllocError) };
        self.idx.set(end);
        Ok(unsafe { NonNull::new_unchecked(&raw const self.data.0[start..end] as *mut [u8]) })
    }

    unsafe fn deallocate(&self, _: NonNull<u8>, _: Layout) {
        // A pure bump allocator: nothing is reclaimed until it is dropped.
    }
}

And a slightly less simple use case based on a previous post, with a linked list and a Vec of Boxes.
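For instance, a minimal usage sketch along those lines (the linked-list variant was in the linked post; this just reuses the MyAllocator defined above, in the same crate with #![feature(allocator_api)]):

fn main() {
    let alloc = MyAllocator::<1024>::new();

    // A Vec of Boxes, all carved out of the same 1 KiB buffer.
    // The collections borrow the allocator, so it must outlive them.
    let mut v = Vec::new_in(&alloc);
    for i in 0..10 {
        v.push(Box::new_in(i, &alloc));
    }
    assert_eq!(*v[9], 9);
    // `deallocate` is a no-op, so nothing is reclaimed until `alloc` drops.
}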

I don't know if it is super beneficial to make such a simple thing, but making a semi-decent allocator certainly can teach you a lot.


Well, values of certain types. I don’t want to make dozens/hundreds of changes to the code (allocating from specific allocators) only to find it isn’t even worthwhile. My ideas of what to do (including nothing) are very vague at this point. I was wondering if anyone had tried using Allocator on a large scale. It seems mostly people have just experimented a bit.

I watched some of this: “The Impact of Memory Allocators on Performance: A Deep Dive - Arthur Pastel | EuroRust 2024”

https://www.youtube.com/watch?v=pJ-FRRB5E84

Well, I think I have a sort-of plan. Make type aliases something like this for use in appropriate modules:

type Vec<T> = pstd::Vec<T, LOCAL>;
type Box<T> = pstd::Box<T, LOCAL>;
type BTreeMap<K, T> = pstd::BTreeMap<K, T, LOCAL>;

where LOCAL is my special bump allocator. Then I do a global replace:
Box::new → newbox
Vec::new → newvec
and implement newbox and newvec appropriately, as sketched below. Maybe this is all wrong, but I will give it a go and see what happens! It does get quite messy, as newbox etc. need to be public, so it breaks my public API. But what the heck.
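Something like this, perhaps (a sketch; it assumes pstd mirrors std's allocator-aware new_in constructors, and that LOCAL is a unit struct implementing Allocator, so the same name serves as both type and value):

// Hypothetical helpers standing in for Box::new / Vec::new,
// using the type aliases above.
pub fn newbox<T>(value: T) -> Box<T> {
    pstd::Box::new_in(value, LOCAL)
}

pub fn newvec<T>() -> Vec<T> {
    pstd::Vec::new_in(LOCAL)
}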

[ I ended up using new names for the types, LVec, LBox, etc., as it is easier to gradually migrate parts of the code; it is starting to work. ]

[ Later: it is quite interesting; I realised I may want more than one “local” allocator. There are “very temporary” heap allocations (for example where a parse tree is created but lives a very short time), “somewhat temporary” heap allocations (which typically last until a thread terminates), and “long term” heap allocations. ]

Note that for testing, you can also set the global allocator to your custom one, and all collections will automatically use it. Obviously that isn't what you want for performance when you're making a bump allocator, but you can slap it into any project effortlessly and give it a stress test to make sure it functions properly.
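For example, a bridge along these lines (a sketch: LockedBump is a placeholder for your allocator behind whatever locking makes it Sync, with a const new assumed; GlobalAlloc signals failure with a null pointer rather than a Result):

#![feature(allocator_api)]
use std::alloc::{Allocator, GlobalAlloc, Layout};
use std::ptr::{null_mut, NonNull};

struct GlobalBridge(LockedBump);

unsafe impl GlobalAlloc for GlobalBridge {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Translate the Allocator Result into GlobalAlloc's null convention.
        match self.0.allocate(layout) {
            Ok(p) => p.as_ptr() as *mut u8,
            Err(_) => null_mut(),
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.0.deallocate(NonNull::new_unchecked(ptr), layout) }
    }
}

#[global_allocator]
static GLOBAL: GlobalBridge = GlobalBridge(LockedBump::new());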

Well, some kind of update: I think it is possible for a bump allocator to be faster than, say, mimalloc.

A bump allocation plus deallocation might take around 10 ns versus 20 ns.

The trouble is I think it is probably hard to find practical situations where it would make any appreciable difference.

Another minor issue: it doesn’t seem possible to get it to work with Miri, I think for reasons similar to why mimalloc doesn’t coexist with Miri. Not a big issue.


Hello,
I was working on something like that and have a reasonably solid arena. I don't know exactly what your scope looks like, but I'll gladly share one implementation so you can build on it.

use core::alloc::{GlobalAlloc, Layout};
use core::arch::asm;
use core::mem::{align_of, size_of};
use core::ptr::{NonNull, addr_of_mut, null_mut};
use spin::Mutex;

use crate::phys::{self, HeapArena};

pub const FALLBACK_HEAP_SIZE: usize = 256 * 1024;

static mut FALLBACK_HEAP: [u8; FALLBACK_HEAP_SIZE] = [0; FALLBACK_HEAP_SIZE];

#[repr(C)]
struct FreeBlock {
    size: usize,
    next: Option<NonNull<FreeBlock>>,
}

#[repr(C)]
#[derive(Copy, Clone)]
struct AllocTag {
    block_start: usize,
    block_size: usize,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum HeapSourceKind {
    Fallback,
    Arena,
}

struct FreeList {
    head: Option<NonNull<FreeBlock>>,
    initialized: bool,
    heap_virt_start: usize,
    heap_len: usize,
    heap_phys_start: usize,
    heap_source: HeapSourceKind,
}

unsafe impl Send for FreeList {}

impl FreeList {
    const fn new() -> Self {
        Self {
            head: None,
            initialized: false,
            heap_virt_start: 0,
            heap_len: 0,
            heap_phys_start: 0,
            heap_source: HeapSourceKind::Fallback,
        }
    }

    unsafe fn init_once(&mut self) {
        if self.initialized {
            return;
        }

        let (heap_start, heap_len) = self.ensure_heap_backing();
        let heap_end = heap_start + heap_len;

        let block_start = align_up(heap_start, align_of::<FreeBlock>());
        if block_start >= heap_end {
            return;
        }

        let size = heap_end - block_start;
        if size < minimum_block_size() {
            return;
        }

        let block = block_start as *mut FreeBlock;
        block.write(FreeBlock { size, next: None });
        self.head = Some(NonNull::new_unchecked(block));
        self.initialized = true;
    }

    unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 {
        if !self.initialized {
            self.init_once();
        }

        let mut current = self.head;
        let mut prev: Option<NonNull<FreeBlock>> = None;

        while let Some(mut block_ptr) = current {
            let block = block_ptr.as_mut();

            let block_start = block as *mut FreeBlock as usize;

            let payload_start = match aligned_payload(block_start, layout) {
                Some(v) => v,
                None => {
                    prev = Some(block_ptr);
                    current = block.next;
                    continue;
                }
            };

            let total_used = match payload_start
                .checked_add(layout.size())
                .and_then(|end| end.checked_sub(block_start))
            {
                Some(v) => v,
                None => {
                    prev = Some(block_ptr);
                    current = block.next;
                    continue;
                }
            };

            // If we split, the next free-list node must be properly aligned for `FreeBlock`.
            // This padding is accounted to the allocated block size.
            let aligned_used = align_up(total_used, align_of::<FreeBlock>());

            if aligned_used > block.size {
                prev = Some(block_ptr);
                current = block.next;
                continue;
            }

            let mut remaining = block.size.saturating_sub(aligned_used);

            let next_block = if remaining >= minimum_block_size() {
                let next_start = block_start + aligned_used;
                let next_ptr = next_start as *mut FreeBlock;
                next_ptr.write(FreeBlock {
                    size: remaining,
                    next: block.next,
                });
                Some(NonNull::new_unchecked(next_ptr))
            } else {
                remaining = 0;
                block.next
            };
            let alloc_block_size = if remaining == 0 {
                block.size
            } else {
                aligned_used
            };
            block.size = alloc_block_size;

            match prev {
                Some(mut p) => p.as_mut().next = next_block,
                None => self.head = next_block,
            }

            let tag_ptr = payload_start - size_of::<AllocTag>();
            (tag_ptr as *mut AllocTag).write(AllocTag {
                block_start,
                block_size: alloc_block_size,
            });

            return payload_start as *mut u8;
        }

        null_mut()
    }

    unsafe fn dealloc(&mut self, ptr: *mut u8) {
        if ptr.is_null() {
            return;
        }

        let tag_ptr = ptr.sub(size_of::<AllocTag>()) as *mut AllocTag;
        let tag = *tag_ptr;
        let block_size = tag.block_size;
        let block_start = tag.block_start;
        let block_ptr = block_start as *mut FreeBlock;
        block_ptr.write(FreeBlock {
            size: block_size,
            next: None,
        });

        let mut prev: Option<NonNull<FreeBlock>> = None;
        let mut current = self.head;

        while let Some(node) = current {
            if (node.as_ptr() as usize) > block_start {
                break;
            }
            prev = current;
            current = node.as_ref().next;
        }

        let mut new_node = NonNull::new_unchecked(block_ptr);
        {
            let new_block = new_node.as_mut();
            new_block.next = current;
        }

        if let Some(mut p) = prev {
            p.as_mut().next = Some(new_node);
        } else {
            self.head = Some(new_node);
        }

        self.try_merge_with_next(new_node);

        if let Some(p) = prev {
            self.try_merge_with_next(p);
        }
    }

    fn install_heap(&mut self, virt_start: usize, phys_start: usize, len: usize) {
        self.heap_virt_start = virt_start;
        self.heap_len = len;
        self.heap_phys_start = phys_start;
        self.heap_source = HeapSourceKind::Arena;
    }

    fn ensure_heap_backing(&mut self) -> (usize, usize) {
        if self.heap_len == 0 {
            let start = addr_of_mut!(FALLBACK_HEAP) as *mut u8 as usize;
            self.heap_virt_start = start;
            self.heap_len = FALLBACK_HEAP_SIZE;
            self.heap_phys_start = 0;
            self.heap_source = HeapSourceKind::Fallback;
        }
        (self.heap_virt_start, self.heap_len)
    }

    unsafe fn try_merge_with_next(&mut self, mut node: NonNull<FreeBlock>) {
        let node_size = node.as_ref().size;
        let node_end = (node.as_ptr() as usize).saturating_add(node_size);

        if let Some(next_ptr) = node.as_ref().next {
            let next_start = next_ptr.as_ptr() as usize;
            if node_end == next_start {
                let next_size = next_ptr.as_ref().size;
                let next_next = next_ptr.as_ref().next;
                let new_size = node_size + next_size;
                let node_mut = node.as_mut();
                node_mut.size = new_size;
                node_mut.next = next_next;
            }
        }
    }
}

struct Allocator;

static ALLOCATOR: Mutex<FreeList> = Mutex::new(FreeList::new());

unsafe impl GlobalAlloc for Allocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATOR.lock().alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        ALLOCATOR.lock().dealloc(ptr)
    }
}

#[global_allocator]
static GLOBAL_ALLOCATOR: Allocator = Allocator;

#[derive(Copy, Clone, Debug)]
pub struct HeapStats {
    pub heap_start: usize,
    pub heap_end: usize,
    pub phys_start: usize,
    pub usable_start: usize,
    pub usable_total: usize,
    pub free_bytes: usize,
    pub largest_free_block: usize,
    pub free_blocks: usize,
    pub initialized: bool,
    pub source: HeapSourceKind,
}

pub fn heap_stats() -> HeapStats {
    let mut guard = ALLOCATOR.lock();
    unsafe {
        if !guard.initialized {
            guard.init_once();
        }
    }

    let (heap_start, heap_len) = guard.ensure_heap_backing();
    let heap_end = heap_start.saturating_add(heap_len);
    let usable_start = align_up(heap_start, align_of::<FreeBlock>());
    let usable_total = heap_end.saturating_sub(usable_start);

    let mut free_bytes = 0usize;
    let mut largest_free_block = 0usize;
    let mut free_blocks = 0usize;
    let mut current = guard.head;
    while let Some(block_ptr) = current {
        // Safety: free list nodes are managed by the allocator.
        let block = unsafe { block_ptr.as_ref() };
        free_blocks += 1;
        free_bytes = free_bytes.saturating_add(block.size);
        if block.size > largest_free_block {
            largest_free_block = block.size;
        }
        current = block.next;
    }

    HeapStats {
        heap_start,
        heap_end,
        phys_start: guard.heap_phys_start,
        usable_start,
        usable_total,
        free_bytes,
        largest_free_block,
        free_blocks,
        initialized: guard.initialized,
        source: guard.heap_source,
    }
}

pub fn install_heap_arena(arena: HeapArena) -> bool {
    if arena.length < minimum_block_size() {
        crate::log!(
            "heap: requested arena too small size={} bytes (need >= {})\n",
            arena.length,
            minimum_block_size()
        );
        return false;
    }

    let mut guard = ALLOCATOR.lock();
    if guard.initialized {
        crate::log!("heap: allocator already initialized; cannot swap backing\n");
        return false;
    }

    guard.install_heap(arena.virt_start, arena.phys_start as usize, arena.length);
    phys::register_heap(arena.virt_start, arena.phys_start as usize, arena.length);
    crate::log!(
        "heap: arena virt=0x{:X} phys=0x{:X} size={} MiB\n",
        arena.virt_start,
        arena.phys_start,
        arena.length / (1024 * 1024)
    );
    true
}

const fn minimum_block_size() -> usize {
    size_of::<FreeBlock>() + size_of::<AllocTag>()
}

fn align_up(addr: usize, align: usize) -> usize {
    let mask = align.saturating_sub(1);
    (addr + mask) & !mask
}

fn aligned_payload(block_start: usize, layout: Layout) -> Option<usize> {
    let payload_start = align_up(
        block_start + size_of::<FreeBlock>() + size_of::<AllocTag>(),
        layout.align(),
    );
    if payload_start > usize::MAX - layout.size() {
        None
    } else {
        Some(payload_start)
    }
}

fn alloc_error(layout: Layout) -> ! {
    let stats = heap_stats();
    crate::log!(
        "OOM: alloc request size={} align={}\n",
        layout.size(),
        layout.align()
    );
    crate::log!(
        "OOM: heap virt=0x{:X}..0x{:X} phys=0x{:X} src={:?} usable_start=0x{:X} usable_total={} free_bytes={} largest_free={} free_blocks={} init={}\n",
        stats.heap_start,
        stats.heap_end,
        stats.phys_start,
        stats.source,
        stats.usable_start,
        stats.usable_total,
        stats.free_bytes,
        stats.largest_free_block,
        stats.free_blocks,
        stats.initialized
    );

    unsafe {
        asm!("cli", options(nomem, nostack));
        loop {
            asm!("hlt", options(nomem, nostack));
        }
    }
}

Well, I have just published the updated crate; this is the alloc module:

I have two thread-local bump allocators: Temp for “very temporary” heap allocations, and Local for heap allocations that do not outlast the thread.

If there are outstanding allocations when a thread terminates, that generates a panic.

The Local bump allocator has to be explicitly enabled, as it isn’t appropriate for long-running threads.
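Roughly, each of them has this shape (a sketch, not the actual pstd code; the names, sizes and leak check are illustrative):

#![feature(allocator_api)]
use std::alloc::{alloc, dealloc, AllocError, Allocator, Layout};
use std::cell::Cell;
use std::ptr::NonNull;

const CHUNK: usize = 1 << 20; // 1 MiB per thread; arbitrary for the sketch
const CHUNK_ALIGN: usize = 16;

struct BumpState {
    base: *mut u8,
    used: Cell<usize>, // offset of the first free byte
    live: Cell<usize>, // number of outstanding allocations
}

impl Drop for BumpState {
    fn drop(&mut self) {
        // Runs in the thread-local destructor as the thread terminates:
        // anything still live at this point has outlived the thread.
        assert_eq!(self.live.get(), 0, "bump allocations outlived the thread");
        unsafe { dealloc(self.base, Layout::from_size_align(CHUNK, CHUNK_ALIGN).unwrap()) }
    }
}

thread_local! {
    static BUMP: BumpState = BumpState {
        base: unsafe { alloc(Layout::from_size_align(CHUNK, CHUNK_ALIGN).unwrap()) },
        used: Cell::new(0),
        live: Cell::new(0),
    };
}

// Zero-sized handle, usable as the `A` in Vec<T, A> / Box<T, A>.
#[derive(Clone, Copy)]
struct Local;

unsafe impl Allocator for Local {
    fn allocate(&self, l: Layout) -> Result<NonNull<[u8]>, AllocError> {
        BUMP.with(|s| {
            if l.align() > CHUNK_ALIGN || s.base.is_null() {
                return Err(AllocError);
            }
            let start = s.used.get().checked_next_multiple_of(l.align()).ok_or(AllocError)?;
            let end = start.checked_add(l.size()).filter(|&e| e <= CHUNK).ok_or(AllocError)?;
            s.used.set(end);
            s.live.set(s.live.get() + 1);
            // Safety: start <= CHUNK, so the offset stays inside the chunk.
            let p = unsafe { NonNull::new_unchecked(s.base.add(start)) };
            Ok(NonNull::slice_from_raw_parts(p, l.size()))
        })
    }

    unsafe fn deallocate(&self, _: NonNull<u8>, _: Layout) {
        // Memory is never reused; only the live count matters,
        // for the leak check in BumpState::drop.
        BUMP.with(|s| s.live.set(s.live.get() - 1));
    }
}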

Currently only Vecs and Boxes are bump-allocated. I could do with a HashMap; that is on the todo list for pstd.

In a few places I had to rework code in minor ways: for example, you cannot collect into a Vec with an allocator, and there is no Default implementation for a Vec with an allocator.
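Typical rework looks like this (a sketch using std's unstable allocator_api names and the Local handle from the sketch above; pstd's equivalents are assumed to be similar):

fn squares(n: usize) -> Vec<usize, Local> {
    // collect() only targets the global allocator, and Default isn't
    // implemented for Vec<T, A>, so build the Vec explicitly instead:
    let mut v = Vec::with_capacity_in(n, Local);
    v.extend((0..n).map(|i| i * i));
    v
}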

I haven’t tried to measure any performance gain yet, and I don’t expect anything major; mimalloc is actually amazingly fast. Overall it has been quite interesting and satisfying though!

Do note that a panic may not be enough; if the data leaked to another thread, or if the panic is somehow caught, the bad state could still be observed. You may need to abort instead.


Good point. I also should do something about large allocations (simple enough: don’t use bump for allocations over some size limit).
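Something like this wrapper should do it (a sketch; routing deallocate by Layout size works because the caller must pass the same layout it allocated with):

#![feature(allocator_api)]
use std::alloc::{AllocError, Allocator, Global, Layout};
use std::ptr::NonNull;

const BUMP_LIMIT: usize = 4096; // arbitrary cut-off for the sketch

struct LimitedBump<A>(A); // wraps a bump allocator A

unsafe impl<A: Allocator> Allocator for LimitedBump<A> {
    fn allocate(&self, l: Layout) -> Result<NonNull<[u8]>, AllocError> {
        if l.size() > BUMP_LIMIT {
            Global.allocate(l) // large: use the general-purpose allocator
        } else {
            self.0.allocate(l)
        }
    }

    unsafe fn deallocate(&self, p: NonNull<u8>, l: Layout) {
        if l.size() > BUMP_LIMIT {
            unsafe { Global.deallocate(p, l) }
        } else {
            unsafe { self.0.deallocate(p, l) }
        }
    }
}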