UnsafeCell-based immutable access to wrapper for mutable reference -- is this sound?

I have a struct storing a number of RefCells that I'd like to pass around as regular & references for read-only access. However, I'd like to avoid a bunch of runtime borrow checks when reading from that & reference, so I'd like to essentially pre-borrow (at compile time) from a &mut reference. This is the pattern I'm trying to apply:

use std::cell::{RefCell, UnsafeCell};

pub struct SomeData {
    data1: RefCell<Vec<u32>>,
    data2: RefCell<Vec<u32>>,
    data3: RefCell<Vec<u32>>,
}

impl SomeData {
    pub fn get(&mut self, index: usize) -> (&u32, &u32,  &u32) {
        (
            &self.data1.get_mut()[index],
            &self.data2.get_mut()[index],
            &self.data3.get_mut()[index],
        )
    }

    pub fn as_read_only<'a>(&'a mut self) -> SomeDataReadOnlyRef<'a> {
        SomeDataReadOnlyRef { data: UnsafeCell::new(self) }
    }
}

pub struct SomeDataReadOnlyRef<'a> {
    data: UnsafeCell<&'a mut SomeData>,
}

impl<'a> SomeDataReadOnlyRef<'a> {
    pub fn get(&self, index: usize) -> (&u32, &u32, &u32) {
        // SAFETY: We hold an exclusive reference to the data.
        // So long as we don't mutate it ourselves, it is safe
        // to access it like this even in an immutable context.
        let data = self.data.get();
        let data = unsafe { data.as_mut().unwrap_unchecked() };
        data.get(index)
    }
}

Is this sound, as long as the read-only version doesn't actually mutate the contents of the RefCells? If not, is there a better way to do this and skip the runtime borrow checks here without making an entire clone of the data structure?

        // SAFETY: We hold an exclusive reference to the data.
        // So long as we don't mutate it ourselves, it is safe
        // to access it like this even in an immutable context.

At a minimum, this reasoning is wrong. It doesn't matter whether you actually do a mutation or not, the mere act of having &mut borrows which are live in a conflicting manner is always disallowed and immediate UB, entirely independent of how they are used. (Mostly. It's complicated to define "live.")

In this case, though, I wasn't able to quickly weaponize it in a way Miri would detect. This doesn't mean the API is sound, just that I didn't find an immediate issue. I suspect that std implementation details of how the UnsafeCells are layered and manipulated here is why my experimentation with Miri didn't cause any UB, combined with the fact that the Vecs introduce an indirection that disassociates the derefed-to references from the source mutable borrow.

TL;DR: it's probably sound as currently written, but I'm still uncomfortable with it; if it is sound, it's only so because of subtle, nonfinal details about the borrow model and specific nonguaranteed implementation details of std types.

The usual pitfall of similar !Sync mut-from-ref tricks is reentrancy, which this doesn't have to worry about. But the fact that the shared references are still "derived from" the mutable reference and can be live across another use of the mutable reference makes me uncomfortable; while I wasn't able to make Miri complain with a single thread, introducing threads could potentially make it possible to cause a UB scenario.

1 Like

Why do you (think you) need this? You should try to describe the actual situation instead of generic placeholders like "data", because in general, all of this seems very dubious.

If you have a RefCell, that means you can't perform compile-time borrow checking over shared references. Trying to circumvent it is unlikely to be possible correctly.

1 Like

Is there a reason you can't do something like this?

3 Likes

The usual pitfall of similar !Sync mut-from-ref tricks is reentrancy, which this doesn't have to worry about. But the fact that the shared references are still "derived from" the mutable reference and can be live across another use of the mutable reference makes me uncomfortable; while I wasn't able to make Miri complain with a single thread, introducing threads could potentially make it possible to cause a UB scenario.

This structure is Send but not Sync, if that helps. The usage goal is something along the lines of:

fn take_and_dispatch(data: &mut SomeData) {
    let read_only_data = data.as_read_only();
    task_1(data); 
    task_2(data);
    task_3(data);
}

where task_1, etc. take a &SomeDataReadOnlyRef.

Why do you (think you) need this? You should try to describe the actual situation instead of generic placeholders like "data", because in general, all of this seems very dubious.

If you have a RefCell, that means you can't perform compile-time borrow checking over shared references. Trying to circumvent it is unlikely to be possible correctly.

I don't think this is your intention, but this comes across as dismissive and condescending. Please trust that I've explored alternatives and that have come to this approach after evaluating their trade-offs. Additionally, I'm not sure what you're saying here -- get_mut() on a RefCell is a comple-time bypass of runtime borrow checking and is safe code as long as you have a mutable reference to that cell, which I do have here.

Is there a reason you can't do something like this?

Yes, I could do that. The issue is that in practice, the true data structure is actually a nested data structure of slotmaps that could contain potentially hundreds of data arrays in RefCells. For context, this is for the gecs crate, which generates an archetype ECS data structure at compile-time. Shallow-cloning every component array in the ECS world is unlikely to optimize well and the process of building it would be costly as the base structures grow. My goal is to create a more lightweight shim where I know I'm holding a mut reference to the data structure, but allow only read-only access to it, and bypass the RefCell runtime borrow check (because I know I'm holding it mutably anyway). There's also no trait-based solution to this because traits don't let you borrow individual fields, which is a requirement for this data structure.

For additional context, the reason I'm doing this is that slotmap lookups have a cost. They introduce one level of indirection between key and data. My goal here is to introduce an execution context where you can lookup a key once, retrieve its direct index in the dense list, and then continue operating with that index directly without repeatedly taking the cost hit for slot indirection and version checking. This is a fine thing to do so long as the slotmap's dense list doesn't change at all, which you could theoretically guarantee if the slotmap is read-only. In this case, however, because the slotmap dense data is stored in RefCells (to allow actual borrowing of data elsewhere), there's no good way to access them in an immutable context. I'd like to create a guarantee that I have mutable access to the data, but that it's also read-only, without having to create a large new data structure holding lots of references to the raw data.

The alternative here, if I want to avoid runtime checks, would be to simply use direct indices anyway and just do so unsafely by manually enforcing the "the ECS world never changes" constraint, but I'd like to provide a safer interface for doing so.

This is another alternative that produces mostly-similar assembly in practice, and has (I believe) easier-to-validate constraints.

use std::cell::RefCell;

pub struct SomeData {
    data1: RefCell<Vec<u32>>,
    data2: RefCell<Vec<u32>>,
    data3: RefCell<Vec<u32>>,
}

impl SomeData {
    pub fn get(&mut self, index: usize) -> (&u32, &u32,  &u32) {
        (
            &self.data1.get_mut()[index],
            &self.data2.get_mut()[index],
            &self.data3.get_mut()[index],
        )
    }

    pub unsafe fn get_unchecked(&self, index: usize) -> (&u32, &u32, &u32) {
        (
            &self.data1.try_borrow_unguarded().unwrap_unchecked()[index],
            &self.data2.try_borrow_unguarded().unwrap_unchecked()[index],
            &self.data3.try_borrow_unguarded().unwrap_unchecked()[index],
        )
    }

    pub fn as_read_only<'a>(&'a mut self) -> SomeDataReadOnlyRef<'a> {
        SomeDataReadOnlyRef { data: self }
    }
}

pub struct SomeDataReadOnlyRef<'a> {
    data: &'a mut SomeData,
}

impl<'a> SomeDataReadOnlyRef<'a> {
    pub fn get(&self, index: usize) -> (&u32, &u32, &u32) {
        unsafe { self.data.get_unchecked(index) }
    }
}

So long as nothing mutably borrows the RefCells, I believe this meets the safety guidelines of both try_borrow_unguarded and the unwrap_unchecked on it. The SomeDataReadOnlyRef needs to guarantee that it never mutably borrows any of the data RefCells, but so long as it exists, its lifetime holds exclusive access to the SomeData backing it, and can thus ensure that nothing else will mutably borrow any of those data RefCells. Does this seem correct?

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.