Convenient deep mutability

Hi all, I'm building a compiler, and in various passes over the code, I need to (immutably) access a global namespace, while holding mutable references to something inside that namespace.

There's various ways I've found to get around this thus far, but I'm looking for a better solution.

A first little trick I used was to create an "IndexExcept iterator". I saw that within a single Module, Type or Constant, Instructions never refer to themselves. And so I could iterate mutably over the instructions quite while accessing the others immutably. Accessing the instruction currently in mutation panics.

pub struct IndexExcept<'buffer, ID: Copy + Eq, T: 'buffer, Buffer: Index<ID, Output = T>> {
    buf_ptr: *const Buffer,
    except: ID,
    _ph: PhantomData<&'buffer ()>,
}

impl<'buffer, ID: Copy + Eq, T: 'buffer, Buffer: Index<ID, Output = T>> Index<ID>
    for IndexExcept<'buffer, ID, T, Buffer>
{
    type Output = T;

    fn index(&self, index: ID) -> &T {
        assert!(index != self.except);
        unsafe { &(*self.buf_ptr)[index] }
    }
}

pub struct ConvenientMutableIterator<
    'buffer,
    ID: Copy + Eq,
    T: 'buffer,
    Buffer: Index<ID, Output = T>,
> where
    for<'b> &'b mut Buffer: IntoIterator<Item = (ID, &'b mut T)>,
{
    buf_ptr: *const Buffer,
    buf_iter: <&'buffer mut Buffer as IntoIterator>::IntoIter,
}

impl<'buffer, ID: Copy + Eq, T: 'buffer, Buffer: Index<ID, Output = T>> Iterator
    for ConvenientMutableIterator<'buffer, ID, T, Buffer>
where
    for<'b> &'b mut Buffer: IntoIterator<Item = (ID, &'b mut T)>,
{
    type Item = (&'buffer mut T, IndexExcept<'buffer, ID, T, Buffer>);

    fn next(&mut self) -> Option<Self::Item> {
        self.buf_iter.next().map(|(idx, v)| {
            (
                v,
                IndexExcept {
                    buf_ptr: self.buf_ptr,
                    except: idx,
                    _ph: PhantomData,
                },
            )
        })
    }
}

// Use:
for (instr, instructions) in instructions.iter_mut_convenient() {
    match instr {
        // and for instr's dependencies, we can simply use &instructions[dependency_id]
    }
}

Now, I've been very careful in this unsafe code, and I find it works well though if you find an issue with it I'd love to hear.

But my problem is one level up. When I'm initializing the type of a part of an instruction representing the port of another module, then I'd need to have mutable access to the instructions of other modules.

This is quite a conundrum. I would like to simply write and use the below:

    pub unsafe fn get_link_info_mut_unsafe(
        &mut self,
        global: GlobalUUID,
    ) -> (&mut LinkInfo, &Self) {
        let self_ptr = self as *const Linker;
        let global_link_info = match global {
            GlobalUUID::Module(md_id) => &mut self.modules[md_id].link_info,
            GlobalUUID::Type(typ_id) => &mut self.types[typ_id].link_info,
            GlobalUUID::Constant(cst_id) => &mut self.constants[cst_id].link_info,
        };
        unsafe { (global_link_info, &*self_ptr) }
    }

And given that any access I do to my types only reads from typ_expr fields, (and typ fields within the LinkInfo), and only writes to typ fields, it should be safe. But of course, I can't trust that the compiler won't blow up my code because it realizes &Self contains something that aliases with &mut LinkInfo

Thing is, I really don't want to reach for RefCell, there's lots of stuff after this in the pipeline, heavily using the typ and related fields, for whom the LinkInfo references will always be shared. And of course, I don't want to allow that code to mutate the LinkInfo in any way.

Basically, what my question comes down to is: Will the compiler blow up in my face if I do this, even though I'm never actually writing to a &mut that aliases with a & that I use? Or, should I just bite the bullet and accept RefCells everywhere?

It's not sound as you can e.g. collect all accessors and &mut _ items from an iterator and use them all at the same time. Here's a UAF. Even if you had a lending iterator, it wouldn't be sound in general as it depends on indexing operations on distinct indices to never alias one another, which is not a given.[1]

This is the only place you mention typ_expr so it's hard to connect it back to any code.

Anyway, I believe your question is: Can you reason about when the compiler will produce a useful binary when you definitely have UB (a &mut LinkInfo which is aliased by a shared reference)? And generally the answer is no, you cannot. The compiler performs many optimizations at many levels based in part on what is UB in the language, and those have emergent behavior. It does not perform a mechanical, easy-to-reason-about transformation from Rust to ASM.

(Edit: Fixed this link.)


  1. Imagine, say, some tree structure that has implemented indexing by walking down the tree; it may create some &Node that aliases the current &mut Node, which is UB. This mental exercise demonstrates that the Index implementation need not even be "unreasonable" to break the "distinct indices don't alias" assumption. â†Šī¸Ž

3 Likes

Okay, those are some very convincing arguments. The rabbitholes you linked to were rather interesting too.

Indeed, I see that the ConvenientMutableIterator implementation was offering too wide a contract. I had intuited that the output of the iterator would be mutable reborrow on the iterator, thereby preventing access to the iterator as long as they exist. But of course for core::iter::Iterator that's not the case. Indeed with Lending iterators as you mention, or even an iterating callback that could be enforced.

On the other hand, doesn't your Index counterexample with a tree also suffer from the first issue? Like while iterating over nodes and &mut Node refs being handed out, would it not also have to internally walk back over &mut nodes that it had handed out previously as it recurses back up the tree, thereby also violating the XOR principle? I guess It'd need to be completely implemented with *mut Node pointers internally such as to never create a reference except when giving them out. In that case, how does one prevent the creation of references all-together? Certainly since lots of core functions (Like core::cmp::PartialEq::eq) take references.

On the other hand, it seems like I'm spending a lot of effort trying to get around using RefCell. Likely the 50-odd instances of .borrow() are a small price to pay to be on the safe heh side

In the complete implementation in your head? Sure, quite possibly. I didn't create a complete model in my head, I just tried to come up with a reasonable scenario to provide some intuition that such a problem could come up non-maliciously, with "reasonable" consumers of your types.

That said, reasonableness is moot if you want a sound implementation anyway, because the safe-to-implement API of Index allows reading everything reachable from &self. It's like how a bad/silly/unreasonable Eq implementation may "break" HashMap in ways that cause panics, hangs, or leaking, it would still be HashMap's fault if it caused UB -- because it's safe to implement Eq in a bad/silly/unreasonable way.

On the third hand, here's a playground based on unary trees this linked list code. There's no unsafe and the Index implementation examines all nodes as it reaches the last element. And here's a playground with shared and exclusive iterators over a tree which also does not use unsafe. I was too lazy to write an Index operation for this one, but probably you can imagine a &_-based implementation.

The general idea is to stay in *mut _/*const _ land as much as possible and only have ephemeral references when you know you can satisfy their validity requirements. But this isn't always possible due to the API surface.

If you had some *mut Elem and *const Elem that derived from the same original *mut _, and it hasn't been invalidated, and you know they don't overlap -- you could create some short-lived references to call the PartialEq implementation.

But in the case of collections, it's common for the only available APIs for accessing members to be something like Index that requires granting access to the entire collection. In such cases it may not be possible to avoid your pointers being invalidated, or at least the possibility thereof.