Create Safe Reference Cycle With UnsafeCell

I'm currently writing a simple compiler and having a problem representing Type in Rust because Type can have reference cycle. Consider the following example:

// a hypothetical type definition
type A = Tuple(int, B, int)
type B = Tuple(float, A, float)

^ above is not a valid definition since their size will be infinite. But for simplicity, let's assume above type definition is valid.

Currently, my approach is using some kind of cache/interner to store all the types and use i32 as a handle to the interner as the replacement of reference. So, above type definitions will be represented as:

enum Type {
   Int,
   Float,
   Tuple{fields: Vec<i32>},
}

type_interner[0] = Type::Int
type_interner[1] = Type::Float
type_interner[2] = Type::Tuple{fields: vec![0, 3 ,0]}
type_interner[3] = Type::Tuple{fields: vec![1, 2 ,1]}

My approach, kind of work, but it's a little bit annoying to debug. Because it's not easy to print a type representation since you will get output that looks like Type::Tuple{fields: [1 2 1]}, then you need to check what type does 1 and 2 represents. And it is also quite annoying to write because now I need to pass the type_interner everywhere, even for just printing the error message.

Now, I'm thinking about another approach. What if I create a reference cycle using raw pointer with &UnsafeCell<Ty>, and then cast it to &Type after all the types are resolved. Below code shows my new approach:

use std::cell::UnsafeCell;

enum Ty<'a> {
    Int,
    Float,
    Tuple(Tuple<'a>),
    Ptr(&'a Ty<'a>),
}

struct Tuple<'a> {
    fields: Vec<&'a Ty<'a>>,
}

fn main() {
    // imagine I have these types:
    // type A = (int, *B, int)
    // type B = (int, *A, int)
    
    // step 1: I pre-allocated all builtin types user defined types.
    //         Imagine all of these are allocated using arena.
    let ty_int = Box::new(UnsafeCell::new(Ty::Int));
    let ty_a = Box::new(UnsafeCell::new(Ty::Int));
    let ty_b = Box::new(UnsafeCell::new(Ty::Int));
    let ty_int = ty_int.as_ref();
    let ty_a = ty_a.as_ref();
    let ty_b = ty_b.as_ref();
    
    // step 2: I put them in a vector.
    //         Imagine I have a symbol table that maps user defined name to the
    //         index of the type in the vector somewhere.
    let all_types: Vec<&UnsafeCell<Ty>> = vec![ty_int, ty_a, ty_b];
    
    // step 3: I set all the types into the actual type.
    let ty_a = unsafe { &mut *all_types[1].get() };
    *ty_a = Ty::Tuple(Tuple{
        fields: vec![
            unsafe { &*(all_types[0].get()) }, 
            unsafe { &*(all_types[2].get()) },
            unsafe { &*(all_types[0].get()) },
        ],
    });
    
    let ty_b = unsafe { &mut *all_types[2].get() };
    *ty_b = Ty::Tuple(Tuple{
        fields: vec![
            unsafe { &*(all_types[0].get()) }, 
            unsafe { &*(all_types[1].get()) },
            unsafe { &*(all_types[0].get()) },
        ],
    });
    
    // step 4: I change all the types from &UnsafeCell<Ty> into &Ty
    let all_types: Vec<&Ty> = all_types.into_iter().map(|ty| unsafe { &*ty.get() } ).collect();
    
    // step 5. I can use it like this:
    let ty_a = all_types[1];
    let Ty::Tuple(ty_a_tuple) = ty_a else { panic!("ty_a is not a tuple"); };
    let ty_b = ty_a_tuple.fields[1];
    let Ty::Tuple(ty_b_tuple) = ty_b else { panic!("ty_b is not a tuple"); };
    assert!(matches!(ty_b_tuple.fields[0], &Ty::Int));
}

(Playground)

My question is: are the code above considered safe? Now I can have a two &Ty that borrow each other. There should be no memory leak if we use arena.

The first stop on this line of questioning is always Miri. Run it in the playground under Tools, top-right. It says your playground exhibits undefined behavior.

UnsafeCell doesn't "turn off" the exclusivity guarantees of &mut.

(I haven't looked at your code in depth at all.)

3 Likes

Thank you for your pointer. I didn't know we have Miri. I understand why my code is considered as UB. I think it's because there is a small moment where I have both mutable reference and shared reference point to same memory region which not typed as UnsafeCell.

I've been trying to find another way to make it work, but haven't find any other way that doesn't trigger UB from Miri. What I want is to have shared reference to A and B where A and B itself reference each other. Is this actually impossible in Rust?

Having an independent mutable reference and any other (mutable or immutable) references to anything is UB. Yes, it's UB even if the pointee is an UnsafeCell. It is explicitly documented in the description of UnsafeCell that it does not lift the uniqueness requirements on &mut T.

1 Like