Hello again. I am at the last stage of writing a compiler for my compilers class. For this project we are taking a flat intermediate representation and translating it into x86-64 assembly. Here is my code so far (up to IR generation): GitHub - brasswood/rust-cmmc: A compiler for the C-- programming language, written in Rust, for EECS 665 Compiler Construction at KU
I'm going to preface this by saying the course was designed around C++ and I chose to implement my projects in Rust to become more familiar with the language. The design of my compiler is mostly the design that was recommended by the course, which assumed we would be writing our projects in C++. So if a lot of this looks like wannabe C++ code, that is probably why.
Now, to the problem. In my intermediate representation, there are Quads and Operands. Quads are instructions, and Operands are the things the instructions operate on. One type of operand is a SymbolOperand:
pub struct SymbolOperandStruct<'a> {
pub symbol: Rc<Symbol<'a>>,
}
A symbol operand just has a reference to a named symbol in the program that was found during name analysis:
pub struct Symbol<'a> {
pub name: &'a str,
pub typ: SymbolType,
}
When generating x86-64, I need to come up with an offset from the base pointer for each local variable in a function. So my first idea was to give SymbolOperandStruct an offset variable:
pub struct SymbolOperandStruct<'a> {
pub symbol: Rc<Symbol<'a>>,
pub offset: usize,
}
The problem with this, however, is that in my code, a new SymbolOperandStruct is generated every time a variable is encountered in the AST, but I need all of the SymbolOperandStructs that have the same Symbol to also have the same offset. Additionally, I have two other types of operands that reference a Symbol:
// created when "&var" is encountered in the AST
pub struct AddrOperandStruct<'a> {
pub symbol: Rc<Symbol<'a>>,
}
// created when "@var_ptr" ("*var_ptr" in C++) is encountered in the AST
pub struct DerefOperandStruct<'a> {
pub symbol: Rc<Symbol<'a>>,
}
In these types of operands, the offset needs to be the same as a SymbolOperandStruct if it points to the same Symbol. So I'm really thinking that the offset should be associated with the Symbol.
pub struct Symbol<'a> {
pub name: &'a str,
pub typ: SymbolType,
pub offset: usize,
}
Now, here's the problem. I don't compute the offset of a Symbol until long after the creation of it. But by the time I am ready to compute the offset, it is already being pointed to by multiple operands. Which makes me think I now need to use interior mutability for the Symbol:
pub struct SymbolOperandStruct<'a> {
pub symbol: Rc<RefCell<Symbol<'a>>>,
}
pub struct AddrOperandStruct<'a> {
pub symbol: Rc<RefCell<Symbol<'a>>>,
}
pub struct DerefOperandStruct<'a> {
pub symbol: Rc<RefCell<Symbol<'a>>>,
}
I know this pattern is discouraged except where absolutely necessary. Is there any way I can avoid using it?