Holding an impl AsMut<[u8]> and multiple &mut [u8] into it at the same time

mcy · July 24, 2020, 5:04pm

(Regulars here will know me from IRLO, but this is actually my first time on URLO!)

I'm building a generic arena-ey data structure, and I've run into a place where I don't know how to avoid breaking the aliasing rules by mistake.

I want to write something morally like the following struct (let's pretend for five seconds that we're writing C++ and there's no borrow checker):

struct Arena<Mem: AsMut<[u8]>> {
  mem: Mem,
  cursor: usize,
}

I will then provide the following inherent methods:

fn alloc(&self, len: usize) -> Result<&mut [u8]> {
  if len + cursor > self.slice.len() { return Err(...); }
  // This line below is sketchy.
  let slice = &mut self.mem.as_mut()[self.cursor..self.cursor+len];
  self.cursor += len;
  Ok(slice)
}
fn reset(&mut self) {
  self.cursor = 0;
}

Note that it is impossible for any of the references returned by alloc() to be held when we call reset(), because reset() requires &mut self, so from a caller's perspective, you can't create aliasing references.

Obviously, this implementation makes the borrow checker extremely unhappy, especially since, when Mem = &mut [u8], we have an obvious aliasing violation, so the borrow checker correctly rejects this program.

My question is: what is the most expedient way to correctly write this program, such that:

is no_std/no_alloc.
uses preferably only one unsafe block. (I believe core does not have the tools do do this fully in safe Rust.) My project is very allergic to unsafe.

I also want to generalize alloc and reset into a single trait, which complicates how I can split up the lifetimes across types. My first thought was to use MaybeUninit to hide bytes from the compiler, but I'm pretty sure that if I don't ask here I'm going to accidentally materialize two aliasing unique references and get extremely sad (I'm fairly certain that it's not possible to break returned references by moving the arena, even when Mem = [u8; N], since every slice in active use pins the arena in place).

mbrubeck · July 24, 2020, 5:40pm

Using AsMut<[u8]> here is questionable, because the only thing you can do to it is call AsMut::as_mut, which creates an &mut [u8] that references the entire memory block used by arena. As I understand the current unsafe code guidelines, merely creating this unique reference results in undefined behavior if there are any other live references to any part of that slice.

It would be better if you can use a trait or type that provides a method like Vec::as_mut_ptr that creates a raw pointer without an intermediate &mut [u8] reference, and then manipulate this raw pointer to create and return a slice reference.

daboross · July 24, 2020, 5:46pm

If you didn't need reset, I'd suggest using [T]::split_mut, as in maybe something like

struct Arena<'a> {
  mem: Cell<Option<&'a mut [u8]>>,
}
impl<'a> Arena<'a> {
    fn alloc(&self, len: usize) -> Result<&'a mut [u8]> {
        let mem = self.mem.take().unwrap();
        if len > mem.len() {
            self.mem.set(Some(mem));
            return Err(...);
        }
        let (allocated, remaining) = mem.split_at_mut(len);
        self.mem.set(Some(remaining));
        Ok(allocated)
    }
}

Edit: if it works for your application, maybe you could do a two-level wrapper - and the reset operation could be removing and reconstructing the inner wrapper? Something like:

struct ReusableArena<Mem: AsMut<[u8]>> {
    mem: Mem
}

impl<Mem: AsMut<[u8]>> ReusableArena<Mem> {
    fn new(mem: Mem) -> Self {
        ReusableArena { mem }
    }
    fn create_arena(&mut self) -> Arena<'_> {
        Arena::new(self.mem.as_mut())
    }
}

(playground)

Resetting would be dropping Arena and replacing it with a new call to create_arena.

If you really do need your exact interface, then I agree with the suggestion of using a raw pointer. If you only store or only retrieving the "whole" block as a raw *mut [u8] or *mut u8 pointer, and only ever create &mut slices to bits of memory which aren't already represented in &mut slices, then your interface should be sound.

2e71828 · July 24, 2020, 5:48pm

My first attempt at this is similar to @daboross's, and likely is unsound due to aliasing &mut's: (Playground)

use core::cell::Cell;

struct Arena<'a> {
  mem: *mut [u8],  // Raw pointer so compiler doesn't look too closely
                   // This may be cargo-culting and unnecessary
  cursor: Cell<Option<&'a mut [u8]>>,
}

impl<'arena> Arena<'arena> {
    fn new(mem:&'arena mut [u8])->Self {
        Arena{mem: mem as *mut [u8], cursor: Cell::new(None)}
    }
    
    fn alloc<'alloc>(&'alloc self, len: usize)
    -> Result<&'alloc mut [u8], &'static str> {
        let mut free: &'arena mut [u8] = match self.cursor.take() {
            Some(cursor) => cursor,
            None => unsafe { &mut (* self.mem) }
        };
        
        if free.len() < len {
            self.cursor.set(Some(free));
            Err("Not enough space")
        } else {
            let (result, free) = free.split_at_mut(len);
            self.cursor.set(Some(free));
            Ok(result)
        } 
    }
    
    fn reset(&mut self) {
        self.cursor.set(None);
    }
}

daboross · July 24, 2020, 5:52pm

As far as I can tell, this code looks sound! It doesn't ever create two &mut slices pointing to the same bit of memory, and aliasing *mut with &mut is fully allowed.

I'd probably change alloc to return &'arena mut [u8], as it can, but otherwise this looks like a good solution.

mbrubeck · July 24, 2020, 5:54pm

I believe this would be unsound, as it would then be possible to call reset while an allocated slice is still alive.

mcy · July 24, 2020, 5:56pm

This is correct, and thus my primary concern. I realllly want AsMut because then I can just own a Box<[u8]> or a &mut [u8], depending on what the caller wants.

I wish there was a way to call AsMut while asking the compiler not to materialize references because I'm going to immediately erase the reference into a pointer.

@2e71828 That is correct, but requires you to hand over an &mut [u8]. It's the "easy" version but I kind of wanted to avoid it for ergonomics. Edit: it's actually incorrect, you materialize a lifetime to the whole slice in alloc.

@daboross That does not work, because now the unique slices can outlive the arena altogether, and does not pin the &self in the alloc() call, which is what makes the non-AsMut solution well-formed.

HeroicKatora · July 24, 2020, 6:10pm

As an allocator, I assume you manually ensure that each part is only given out once. The arguably correct abstraction for this would be to wrap the memory you received as a &'_ UnsafeCell<[u8]>, or deref to this, which does very precisely what you need: You're allowed to hold a reference to the cell itself while having given out mutable references to the interior as long as you unsafely assure that these mutable references to the interior do not overlap.

mcy · July 24, 2020, 6:13pm

Honestly, even that feels wrong. I can't actually slice into that, so I still need to offset a *mut u8 and then use from_raw_parts. It seems like it would make more sense to hold a PhantomData<&'a ()> and a *mut u8 that I do pointer arithmetic on. Which is... gross. But it would be correct.

mbrubeck · July 24, 2020, 6:15pm

In this case I would probably create a new unsafe trait AsMutPtr and implement it for all the standard pointer-to-slice types, being careful that no implementation creates a temporary reference to the entire slice.

This approach has some downsides: It adds more boilerplate and unsafe code, and also won't support third-party types like ArrayVec unless you explicitly include them.

mcy · July 24, 2020, 6:16pm

Thus the sadness. =) I think I can get away with just doing Arena::new(&mut buf) and letting slice coercion do its magic.

HeroicKatora · July 24, 2020, 6:19pm

Both of these are asserted to be unique. You can't just use another pointer and dereference that without being dependent on them at them same time. This is an I-unsound bug in Cell with miscompilation. You need to store both as a raw pointer anyways for this to be sound.

mcy · July 24, 2020, 6:45pm

Right. My hope was to hide the owning value in something like MaybeUninit, but it seems like there's no way to do this with the current optimization barriers rustc offers. Every solution that does everything I want either violates aliasing or is trivially broken by moving the arena; the closest I think I can mange is to explode the &mut [u8] into a (PhantomData<&'a mut ()>, *mut u8, usize) and do the mindly unpleasant pointer arithmetic myself. =(

2e71828 · July 24, 2020, 7:02pm

Attempt #2; it takes a Pin<DerefMut<Target=[u8]>> and transmutes the pointer to a byte array and back to avoid having any aliased &muts. I'm not terribly confident in its correctness, but it compiles: (Playground)

use core::cell::Cell;
use core::pin::Pin;
use core::ops::DerefMut;
use core::marker::PhantomData;
use core::mem;

const MAX_PTR_SIZE:usize=64;

struct Arena<'a, T:DerefMut<Target=[u8]> + 'a> {
  mem: *mut [u8],  // Raw pointer so compiler doesn't look too closely
                   // This may be cargo-culting and unnecessary
  cursor: Cell<Option<&'a mut [u8]>>,
  ptr: [u8; MAX_PTR_SIZE],
  phantom: PhantomData<Pin<T>>,
}

impl<'arena, T:DerefMut<Target=[u8]>+'arena> Arena<'arena, T> {
    pub fn new(mut ptr:Pin<T>)->Self {
        assert!(mem::size_of::<Pin<T>>() < MAX_PTR_SIZE);
        let mut bytes:[u8;MAX_PTR_SIZE] = [0;MAX_PTR_SIZE];
        let mem = unsafe { Pin::get_unchecked_mut(ptr.as_mut()) } as *mut [u8];
        unsafe { bytes.as_mut_ptr()
                      .copy_from(&mut ptr as *mut Pin<T> as *mut u8,
                                 mem::size_of::<Pin<T>>());
                 mem::forget(ptr);
        }
        Arena{mem, cursor: Cell::new(None), ptr: bytes, phantom: PhantomData }
    }
    
    pub fn alloc<'alloc>(&'alloc self, len: usize)
    -> Result<&'alloc mut [u8], &'static str> {
        let free: &'arena mut [u8] = match self.cursor.take() {
            Some(cursor) => cursor,
            None => unsafe { &mut (* self.mem) }
        };
        
        if free.len() < len {
            self.cursor.set(Some(free));
            Err("Not enough space")
        } else {
            let (result, free) = free.split_at_mut(len);
            self.cursor.set(Some(free));
            Ok(result)
        } 
    }
    
    pub fn reset(&mut self) {
        self.cursor.set(None);
    }
}

impl<'arena, T:DerefMut<Target=[u8]>+'arena> Drop for Arena<'arena, T> {
    fn drop(&mut self) {
        self.cursor.set(None);
        let _: Pin<T> = unsafe {
            (self.ptr.as_mut_ptr() as *mut Pin<T>).read_unaligned()
        };
    }
}

mcy · July 24, 2020, 7:09pm

You don't actually need Pin here. I also suspect there's an issue with the fact that you're holding onto a reference (even if inside a Cell) that becomes invalid if the struct is moved.

2e71828 · July 24, 2020, 7:10pm

Pin says that the pointer's target is immovable, but not the pointer itself. Requiring Pin<T> means that T has to store the buffer somewhere outside the structure, which should keep the references to the buffer valid as the Arena structure gets moved.

steffahn · July 24, 2020, 7:36pm

This might be madness and/or unsound, but it has your proposed API:

use std::cell::{Cell, UnsafeCell};
use std::ptr;

pub struct Arena<Mem: AsMut<[u8]>> {
    mem: UnsafeCell<Mem>,
    free_space: Cell<*mut [u8]>,
    self_ptr: Cell<*const Arena<Mem>>,
}

pub type Result<T> = std::result::Result<T, ()>;

impl<Mem: AsMut<[u8]>> Arena<Mem> {
    pub fn new(mem: Mem) -> Self {
        Arena {
            mem: UnsafeCell::new(mem),
            free_space: Cell::new(ptr::slice_from_raw_parts_mut(ptr::null_mut(), 0)),
            self_ptr: Cell::new(ptr::null()),
        }
    }
    pub fn alloc(&self, len: usize) -> Result<&mut [u8]> {
        if self.self_ptr.get() != self {
            self.self_ptr.set(self);
            self.free_space.set(unsafe { (*self.mem.get()).as_mut() });
        }
        let free_space = unsafe { &mut *self.free_space.get() };
        if len > free_space.len() {
            Err(())
        } else {
            let (allocated, free_space) = free_space.split_at_mut(len);
            self.free_space.set(free_space);
            Ok(allocated)
        }
    }
    pub fn reset(&mut self) {
        self.self_ptr.set(ptr::null());
    }
}

mcy · July 24, 2020, 8:04pm

self.self_ptr.get() != self

Definitely madness, but perhaps sound! I don't see anything immediately wrong (I want to believe this is a perfectly reasonable way to detect that you've been moved...?). Let's see if someone who's a better language lawyer than I gives this a ++.

steffahn · July 24, 2020, 8:39pm

My concern is that if the arena gets moved twice, the second time back where it came from, but actually not really back but just accidentally into the same memory address, but a “new allocation”; then it might (possibly) be unsound. I’m not sure, but I have doubts, at least judging from what I’ve read on block posts from @RalfJung. I haven’t managed to get Miri to complain over the code, yet. Maybe that’s a limitation of Miri though. Or me having zero experience with the tool. This playground example does replace the arena back into the same place in both debug and release, but it gets different addresses each time in Miri.

Also, I have another example that results in different behavior between debug and release. This alone is definitely something you might not want to have.

HeroicKatora · July 24, 2020, 9:54pm

The only additional safety relevant semantics that Pin<Ptr> adds relate to <Ptr::Target as Drop>, in that this Drop impl must be called before the memory is deallocated. Since Target = [u8], which does not have any Drop impl, there is nothing you gain from pinning.

Topic		Replies	Views
Correctly aliasing a *mut T help	11	1323	January 9, 2020
Help me understand this UB with stacked borrows detected by `miri` help	13	841	July 19, 2022
CPP to Rust need help help	3	583	January 12, 2023
I was forced to use pointer to workaround the borrow check. Would this cause a memory safety problem? help	3	371	March 24, 2023
Why they behave differently?	4	357	February 13, 2022

Holding an impl AsMut<[u8]> and multiple &mut [u8] into it at the same time

Related Topics