So I have been working on my compiler for my compilers class. We have just gotten done with name analysis.
The way I have implemented the SymbolTable is using a list of hashmaps. Each hashmap represents a scope. When I enter a scope in the AST, I append a new hashmap to the list. When I encounter a declaration, I put a new entry in the current scope hashmap, with the key being the ID, and the value being a Symbol struct which contains the name and type given to that declaration. When I encounter the use of an ID, I look it up in the current scope's symbol table, and if it's not there, look in the second to the last scope in the list, etc. all the way until the first hashmap in the list, the global scope. If I find the symbol, I give the ID node in the AST a pointer to it. Finally, when I exit a scope, I remove the last hashmap from the list but I don't delete all of the Symbols that it contained, because every ID node in the AST which referred to a Symbol will still need to have access to that Symbol.
Currently, my symbol table and my ID nodes share ownership of a Symbol struct via Rc
. But I was reading the comments on this reddit post and now I'm wondering, is this good? My reasoning for using Rc
in the first place is that obviously the hashmap can't be the single owner of the Symbol it contains, because I may need the Symbol to persist after the hashmap gets popped. Also, it feels kind of natural to use multiple ownership in this case, since I may have multiple ID nodes that will need to reference the same Symbol; that Symbol can only be freed when there are no ID nodes referencing it; and I have no idea how many (if any) IDs are going to reference a symbol until all of the AST nodes belonging to a scope are actually visited.
One alternative implementation I could think of is to have the ID node associated with the declaration be the single owner of a Symbol, and the ID nodes associated with uses be borrowers of that Symbol. I think this would be valid since I don't ever plan on dropping any of my AST nodes unless I'm dropping the entire tree. However, I'm pretty sure this would require me to make some enum like:
enum OwnedOrBorrowed<'a> {
Owned(Symbol),
Borrowed(&'a Symbol),
}
For reference, here is the repo. The relevant files are src/name/symbol.rs and src/ast.rs.