Advice on interior mutability for x86-64 memory offset generation

Hello again. I am at the last stage of writing a compiler for my compilers class. For this project we are taking a flat intermediate representation and translating it into x86-64 assembly. Here is my code so far (up to IR generation): GitHub - brasswood/rust-cmmc: A compiler for the C-- programming language, written in Rust, for EECS 665 Compiler Construction at KU

I'm going to preface this by saying the course was designed around C++ and I chose to implement my projects in Rust to become more familiar with the language. The design of my compiler is mostly the design that was recommended by the course, which assumed we would be writing our projects in C++. So if a lot of this looks like wannabe C++ code, that is probably why.

Now, to the problem. In my intermediate representation, there are Quads and Operands. Quads are instructions, and Operands are the things the instructions operate on. One type of operand is a SymbolOperand:

pub struct SymbolOperandStruct<'a> {
    pub symbol: Rc<Symbol<'a>>,
}

A symbol operand just has a reference to a named symbol in the program that was found during name analysis:

pub struct Symbol<'a> {
    pub name: &'a str,
    pub typ: SymbolType,
}

When generating x86-64, I need to come up with an offset from the base pointer for each local variable in a function. So my first idea was to give SymbolOperandStruct an offset variable:

pub struct SymbolOperandStruct<'a> {
    pub symbol: Rc<Symbol<'a>>,
    pub offset: usize,
}

The problem with this, however, is that in my code, a new SymbolOperandStruct is generated every time a variable is encountered in the AST, but I need all of the SymbolOperandStructs that have the same Symbol to also have the same offset. Additionally, I have two other types of operands that reference a Symbol:

// created when "&var" is encountered in the AST
pub struct AddrOperandStruct<'a> {
    pub symbol: Rc<Symbol<'a>>,
}

// created when "@var_ptr" ("*var_ptr" in C++) is encountered in the AST
pub struct DerefOperandStruct<'a> {
    pub symbol: Rc<Symbol<'a>>,
}

In these types of operands, the offset needs to be the same as a SymbolOperandStruct if it points to the same Symbol. So I'm really thinking that the offset should be associated with the Symbol.

pub struct Symbol<'a> {
    pub name: &'a str,
    pub typ: SymbolType,
    pub offset: usize,
}

Now, here's the problem. I don't compute the offset of a Symbol until long after the creation of it. But by the time I am ready to compute the offset, it is already being pointed to by multiple operands. Which makes me think I now need to use interior mutability for the Symbol:

pub struct SymbolOperandStruct<'a> {
    pub symbol: Rc<RefCell<Symbol<'a>>>,
}

pub struct AddrOperandStruct<'a> {
    pub symbol: Rc<RefCell<Symbol<'a>>>,
}

pub struct DerefOperandStruct<'a> {
    pub symbol: Rc<RefCell<Symbol<'a>>>,
}

I know this pattern is discouraged except where absolutely necessary. Is there any way I can avoid using it?

How about using a hashmap to store the offsets for each symbol?

If you do wish to use interior-mutability, then you can just wrap the offset in a Cell rather than wrap the entire thing in RefCell.

3 Likes

I've thought about the HashMap before, as well as for some other parts of the compiler (like for keeping the SymbolOperandStructs themselves). I've been hesitant to use them because I'm concerned about performance, though. What is the overhead in theory/practice?

Hash maps are very fast both theoretically and in practice. Additionally, they are often used in compilers — I've heard someone remark that the Rust compiler is just a bunch of hash maps. (This was in a conversation about a hash map performance improvement making the compiler faster.)

Is HashMap in Rust thread safe? I personally understand that because of the concept of ownership, HashMap is thread safe in Rust, unlike Java

Reading from it in parallel is allowed, writing to it in parallel is not. The only difference to Java is that the compiler will prevent you from making writes in parallel.

If in Arc<Mutex<hash_Map>> can parallel writing ensure multithreading safety?

Well, the mutex will ensure that the writes do not happen in parallel, but one after the other. But yes, it would make it safe to modify from several threads.

For what it's worth, if this is related to your compiler, don't attempt to parallelize it in this way. Parallelization is not easy, and once you start adding mutexes you easily risk your code becoming slower than if it was single-threaded if you are not careful.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.