I'm currently writing a simple compiler and having a problem representing Type in Rust because Type can have reference cycle. Consider the following example:
// a hypothetical type definition
type A = Tuple(int, B, int)
type B = Tuple(float, A, float)
^ above is not a valid definition since their size will be infinite. But for simplicity, let's assume above type definition is valid.
Currently, my approach is using some kind of cache/interner to store all the types and use i32
as a handle to the interner as the replacement of reference. So, above type definitions will be represented as:
enum Type {
Int,
Float,
Tuple{fields: Vec<i32>},
}
type_interner[0] = Type::Int
type_interner[1] = Type::Float
type_interner[2] = Type::Tuple{fields: vec![0, 3 ,0]}
type_interner[3] = Type::Tuple{fields: vec![1, 2 ,1]}
My approach, kind of work, but it's a little bit annoying to debug. Because it's not easy to print a type representation since you will get output that looks like Type::Tuple{fields: [1 2 1]}
, then you need to check what type does 1
and 2
represents. And it is also quite annoying to write because now I need to pass the type_interner everywhere, even for just printing the error message.
Now, I'm thinking about another approach. What if I create a reference cycle using raw pointer with &UnsafeCell<Ty>
, and then cast it to &Type
after all the types are resolved. Below code shows my new approach:
use std::cell::UnsafeCell;
enum Ty<'a> {
Int,
Float,
Tuple(Tuple<'a>),
Ptr(&'a Ty<'a>),
}
struct Tuple<'a> {
fields: Vec<&'a Ty<'a>>,
}
fn main() {
// imagine I have these types:
// type A = (int, *B, int)
// type B = (int, *A, int)
// step 1: I pre-allocated all builtin types user defined types.
// Imagine all of these are allocated using arena.
let ty_int = Box::new(UnsafeCell::new(Ty::Int));
let ty_a = Box::new(UnsafeCell::new(Ty::Int));
let ty_b = Box::new(UnsafeCell::new(Ty::Int));
let ty_int = ty_int.as_ref();
let ty_a = ty_a.as_ref();
let ty_b = ty_b.as_ref();
// step 2: I put them in a vector.
// Imagine I have a symbol table that maps user defined name to the
// index of the type in the vector somewhere.
let all_types: Vec<&UnsafeCell<Ty>> = vec![ty_int, ty_a, ty_b];
// step 3: I set all the types into the actual type.
let ty_a = unsafe { &mut *all_types[1].get() };
*ty_a = Ty::Tuple(Tuple{
fields: vec![
unsafe { &*(all_types[0].get()) },
unsafe { &*(all_types[2].get()) },
unsafe { &*(all_types[0].get()) },
],
});
let ty_b = unsafe { &mut *all_types[2].get() };
*ty_b = Ty::Tuple(Tuple{
fields: vec![
unsafe { &*(all_types[0].get()) },
unsafe { &*(all_types[1].get()) },
unsafe { &*(all_types[0].get()) },
],
});
// step 4: I change all the types from &UnsafeCell<Ty> into &Ty
let all_types: Vec<&Ty> = all_types.into_iter().map(|ty| unsafe { &*ty.get() } ).collect();
// step 5. I can use it like this:
let ty_a = all_types[1];
let Ty::Tuple(ty_a_tuple) = ty_a else { panic!("ty_a is not a tuple"); };
let ty_b = ty_a_tuple.fields[1];
let Ty::Tuple(ty_b_tuple) = ty_b else { panic!("ty_b is not a tuple"); };
assert!(matches!(ty_b_tuple.fields[0], &Ty::Int));
}
My question is: are the code above considered safe? Now I can have a two &Ty
that borrow each other. There should be no memory leak if we use arena.