How to Fix Miri Reporting Dangling Box

Unfortunately I couldn't easily fit my issue into a Rust playground, so I'll try to explain/show what I'm doing.

I'm trying to build out resource storage for an Entity-Component-System. I need untyped backing storage, so that it can be accessed by arbitrary languages over FFI, but I don't want to sacrifice a nice, typed Rust API.

tl;dr: I'm trying to implement something like a TypeMap, but on top of a HashMap<TypeId, AlignedVec<u8>>. I'm running into a Miri error in my tests, but only when I put a pointer such as a Box<T> or a String inside the map.

I'm getting this Miri error:

constructing invalid value: encountered a dangling box (address 0x208a70 is unallocated)

I happens when I try to cast the raw bytes of the Box or String, back into a Box or String, and then read the data pointed to by Box or String. For non-pointer types, ( integers at least ) it seems to work fine.

When running the test without Miri, it works fine, but with Miri I get an error. Miri seems to think that the memory pointed to by the box gets de-allocated, but I tried to avoid that by using ManuallyDrop to make sure the destructor isn't run, and I can verify that the destructor doesn't run by println-ing in a custom drop implementation.

I'm not super unsafe Rust savvy yet, so I could be missing something obvious.

Details & Code

So I have my UntypedResources like this:

#[derive(Clone, Default)]
pub struct UntypedResources {
    resources: UuidMap<AtomicRefCell<AVec<u8>>>,
}

impl UntypedResources {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert a new resource
    pub fn insert(&mut self, uuid: Uuid, layout: Layout, data: &[u8]) {
        assert_eq!(
            layout.size(),
            data.len(),
            "Layout does not match data length"
        );
        let mut storage = AVec::<u8>::with_capacity(layout.align(), layout.size());
        for _ in 0..layout.size() {
            storage.push(0);
        }
        storage.copy_from_slice(data);

        self.resources.insert(uuid, AtomicRefCell::new(storage));
    }

    /// Get a cell containing the resource data for the given ID
    pub fn get(&self, uuid: Uuid) -> Option<AtomicRefCell<AVec<u8>>> {
        self.resources.get(&uuid).cloned()
    }

    /// Remove a resource
    pub fn remove(&mut self, uuid: Uuid) -> Option<AtomicRefCell<AVec<u8>>> {
        self.resources.remove(&uuid)
    }
}

The AVec is an aligned Vec from the aligned-vec crate. The AtomicRefCell is essentially equivalent to an Arc<Mutex<T>> from the atomic_ref_cell_try crate.

On top of the UntypedResources I've got the Resources that wraps around the untyped storage and adds a statically typed API:

#[derive(Clone, Default)]
pub struct Resources {
    untyped: UntypedResources,
    type_ids: UuidMap<TypeId>,
}

impl Resources {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn insert<T: TypedEcsData + Debug>(&mut self, resource: T) {
        self.try_insert(resource).unwrap();
    }

    pub fn try_insert<T: TypedEcsData + Debug>(&mut self, resource: T) -> Result<(), EcsError> {
        let mut resource = ManuallyDrop::new(resource);
        let uuid = T::uuid();
        let type_id = TypeId::of::<T>();

        let layout = Layout::new::<T>();
        let ptr = resource.deref_mut() as *mut T as *mut u8;
        let bytes = unsafe { slice::from_raw_parts_mut(ptr, size_of::<T>()) };
        dbg!(&bytes);

        match self.type_ids.entry(uuid) {
            std::collections::hash_map::Entry::Occupied(entry) => {
                if entry.get() != &type_id {
                    return Err(EcsError::TypeUuidCollision);
                }

                self.untyped.insert(uuid, layout, bytes);
            }
            std::collections::hash_map::Entry::Vacant(entry) => {
                entry.insert(type_id);
                self.untyped.insert(uuid, layout, bytes);
            }
        }

        Ok(())
    }

    pub fn get<T: TypedEcsData>(&self) -> AtomicResource<T> {
        self.try_get().unwrap()
    }

    pub fn try_get<T: TypedEcsData>(&self) -> Option<AtomicResource<T>> {
        let untyped = self.untyped.get(T::uuid())?;

        Some(AtomicResource {
            untyped,
            _phantom: PhantomData,
        })
    }
}

pub struct AtomicResource<T: TypedEcsData> {
    untyped: AtomicRefCell<AVec<u8>>,
    _phantom: PhantomData<T>,
}

impl<T: TypedEcsData + Debug> AtomicResource<T> {
    pub fn borrow(&self) -> AtomicRef<T> {
        let borrow = self.untyped.borrow();
        AtomicRef::map(borrow, |data| {
            dbg!(&data);
            let data = unsafe { &*(data.as_ptr() as *const T) };
            dbg!(&data);

            data
        })
    }

    pub fn borrow_mut(&self) -> AtomicRefMut<T> {
        let borrow = self.untyped.borrow_mut();
        AtomicRefMut::map(borrow, |data| unsafe {
            dbg!(&data);
            let data = &mut *(data.as_mut_ptr() as *mut T);
            // This fails in Miri, when trying to dereference the data
            dbg!(&data);

            data
        })
    }
}

I just realized that AtomicRefcell<T> isn't like an Arc<Mutex<T>>, it's like an RwLock<T> ( or RefCell ), so I still need to wrap it in an Arc and I was cloning it without realizing that I was creating a copy of the data. I'm not sure if that makes a difference. Duh. :man_facepalming:

I'm not sure if that will fix my issue, but it does change things.

It's hard to know what all is going on without knowing what TypedEscData provides. But at a minimum you can't create a &[u8] from a pointer to an instance of an arbitrary type: padding is uninitialized.

Oh, yeah, I forgot about that one. It's super trait of Sync + Send + Clone + TypeUuid + 'static.

I'm not understanding how the link is related to representing a pointer as a &[u8].


At this point, though, I am wondering if there are any crates out there that will handle this for me.


Edit: Nevermind, I think see what you mean about the padding bytes. The pointer may contain padding, but you aren't allowed to access padding bytes, so if you put it in a &[u8] you could get access to padding bytes which is undefined behavior.

I think that means I need to be storing pointers, not slices, to the data in my untyped storage. That gives me something to try I think.

I was under the impression that reading an uninitialized integer wasn't UB, is that not the case?


I think you could just flip your casts around. Rather than casting your typed pointer to a byte pointer, cast your byte pointer to a pointer of the correct type.

You'll have to be careful about how you write to the pointer to make sure you have the correct semantics around dropping old values though.

The link was just the most handy official documentation I found that padding bytes are uninitialized.

They're creating a reference to uninitialized bytes; that's UB. You can unsafely read uninitialized bytes through a pointer (*const _ et cetera), though you'll get an arbitrary byte back, as far as I know.

Related but tangential conversations here.

2 Likes

Interesting. That's good to know, thanks for the links.

I'm not sure exactly what you mean here.

I'm taking a &mut T and trying to cast that to &[u8] because I need to store it untyped, but I'm realizing I don't really need a &[u8], I actually just need a *mut u8.

I tend to want to shove everything into slices because they feel "safer" to me, but that's pretty irrelevant at this point when I'm doing all this casting anyway.

So I tried out a modification and that actually passes Miri! Not sure if it's sound, but that's a step:

I modified the untyped insert to take a raw pointer instead of a slice:

    pub unsafe fn insert(
        &mut self,
        uuid: Uuid,
        layout: Layout,
        data: *mut u8,
        // I'm still experimenting with how to drop...
        drop_fn: Option<unsafe fn(*mut u8)>,
    ) {
        unsafe {
            let ptr = std::alloc::alloc(layout);
            std::ptr::copy_nonoverlapping(data, ptr, layout.size());

            self.resources.insert(
                uuid,
                ResourceInfo {
                    cell: Arc::new(AtomicRefCell::new(ptr)),
                    drop_fn,
                },
            );
        }
    }

Then when I need to cast back to the data type I just cast the pointer:

ptr.cast::<T>().as_ref().unwrap()

Interesting. Do you think that means that I can't std::ptr::copy_nonoverlapping in the way I do above?

Eh, guess they haven't committed to that even.

use std::mem::{self, MaybeUninit};

let x: i32 = unsafe { mem::uninitialized() }; // undefined behavior! ⚠️
// The equivalent code with `MaybeUninit<i32>`:
let x: i32 = unsafe { MaybeUninit::uninit().assume_init() }; // undefined behavior! ⚠️

Here's what I meant

use std::{alloc::Layout, iter::repeat};
use aligned_vec::AVec;

fn into_bytes<T>(value: T) -> AVec<u8> {
    let layout = Layout::for_value(&value);
    let mut bytes = AVec::from_iter(layout.align(), repeat(0).take(layout.size()));
    let ptr = bytes.as_mut_ptr() as *mut T;
    unsafe { ptr.write(value) };

    bytes
}

unsafe fn from_bytes<T>(bytes: AVec<u8>) -> T {
    unsafe { (bytes.as_ptr() as *mut T).read() }
}

fn round_trip<T: std::fmt::Debug>(value: T) {
    println!("Before: {:?}", value);
    let bytes = into_bytes(value);
    println!("Bytes: {bytes:?}");
    let after: T = unsafe { from_bytes(bytes) };
    println!("After: {after:?}");
}

#[cfg(test)]
mod test {
    use crate::round_trip;

    #[test]
    fn string() {
        round_trip("hello".to_string());
        round_trip("raw");
    }

    #[test]
    fn boxed() {
        round_trip(Box::new("boxed"));
        round_trip(Box::new(5));
    }
}

Basically: allocate the storage the same way you were before, then cast that storage pointer to the type you're storing. The direction you're going is probably better for a variety of reasons, though.

1 Like

Ah, I see. That's a good idea and saves me the copy_nonoverlapping. I restructured a little to encapsulate the untyped resources in a wrapper that makes this much safer:

#[derive(Clone)]
pub struct UntypedResource {
    pub cell: Arc<AtomicRefCell<*mut u8>>,
    pub drop_fn: Option<unsafe fn(*mut u8)>,
    pub layout: Layout,
}

impl UntypedResource {
    pub fn new<T>(resource: T) -> Self {
        let resource = ManuallyDrop::new(resource);
        let layout = Layout::new::<T>();

        unsafe {
            let ptr = std::alloc::alloc(layout);
            let ptr_t = ptr as *mut ManuallyDrop<T>;
            ptr_t.write(resource);
            let ptr = ptr_t as *mut u8;

            Self {
                cell: Arc::new(AtomicRefCell::new(ptr)),
                drop_fn: Some(T::raw_drop),
                layout,
            }
        }
    }
}

impl Drop for UntypedResource {
    fn drop(&mut self) {
        if let Some(drop_fn) = self.drop_fn {
            let null_cell = Arc::new(AtomicRefCell::new(std::ptr::null_mut()));
            let cell = mem::replace(&mut self.cell, null_cell);

            let ptr = Arc::try_unwrap(cell)
                .expect(
                    "You must drop all references to Resoruces before dropping `UntypedResources`",
                )
                .into_inner();

            unsafe {
                drop_fn(ptr);

                alloc::dealloc(ptr, self.layout);
            }
        }
    }
}

Now I've almost got dropping working, I've just got to figure out why a Vec inside of a newtype isn't getting dropped when I run ptr::drop_in_place().

Ah, solved the memory leak. I was drop_in_place()-ing a *mut u8 when I needed to cast to a *mut T first.

Now it seems to be working great!

Thank you both for your help!

No, that one is explicitly untyped, memcpy-like, and allows copying uninitialized bytes. But also preserves "initialization state".

My take away is that they're keeping the door open to say a typed read of uninitialized bytes is always UB (with the presumed goal of never being able to read deallocated memory, or memory some other program didn't clean up that you got from an allocation, etc., without being UB).

1 Like