Help with compiler architecture


#1

I am trying to build a toy compiler in the Rust programming language and am currently struggling with Rust’s way of doing things.

I already planned out the compiler (hopefully detailed enough so that you guys can follow) and it can be found at:
GitHub Gist Link

It covers the first few parsing components, up to the Parser’s generation of the abstract syntax tree, which should be enough as a start.
The rationales for why I want to design things the way I described are also included and are hopefully reasonable.

My main concern is the way I try to use the Source class as the owner of information that other entities (such as Tokens) point to.

It would be nice if you guys could validate my design approach for implementation in the Rust programming language or tell me what will certainly fail and what I could do otherwise. :slight_smile:

Besides that, I always welcome general improvements to the current design!

Thanks in advance to anybody taking his or her time to read through all this and finally help me! :smiley:

Regards,
Rob


#2

I haven’t written a compiler in Rust (yet? :slight_smile: ) so I can give only rough ideas which may be totally off base.

  • I think it is not a good idea to add lots of Rcs and RefCells up front. I would try to go as far as possible with strict ownership, adding Rc and RefCell only on an as-needed basis.

  • I am a bit worried that each Token holds an Rc to source. That’s a lot of Rcs, and tokens are not Copy. I would do something like this:

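struct TokenizedSource {
    source: Source,     // the source text is owned exactly once
    tokens: Vec<Token>, // tokens refer into `source` by offset only
}

type Source = String;

// `Kind` was not defined in this post; placeholder variants so the sketch compiles.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Kind { Ident, Number, Punct }

// (start offset, end offset, kind). Deriving `Eq` also requires `PartialEq`.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Token(usize, usize, Kind);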
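
To illustrate how this could work in practice, here is a minimal sketch in which the token text is looked up in the owning struct on demand. The names `Kind::Word`, `Kind::Number`, `tokenize`, `make`, and `text` are made up for illustration and are not from the gist, and the whitespace-splitting lexer is only a stand-in for a real one.

```rust
// Sketch only: `Kind`, `tokenize`, `make`, and `text` are hypothetical names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Word,
    Number,
}

// Byte offsets `start..end` into the source, plus the token kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Token(usize, usize, Kind);

struct TokenizedSource {
    source: String,
    tokens: Vec<Token>,
}

impl TokenizedSource {
    // Splits on whitespace; a run of ASCII digits is a Number, anything else a Word.
    fn tokenize(source: String) -> Self {
        let mut tokens = Vec::new();
        let mut start: Option<usize> = None;
        for (i, c) in source.char_indices() {
            if c.is_whitespace() {
                if let Some(s) = start.take() {
                    tokens.push(Self::make(&source, s, i));
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        if let Some(s) = start {
            tokens.push(Self::make(&source, s, source.len()));
        }
        TokenizedSource { source, tokens }
    }

    fn make(source: &str, start: usize, end: usize) -> Token {
        let kind = if source[start..end].bytes().all(|b| b.is_ascii_digit()) {
            Kind::Number
        } else {
            Kind::Word
        };
        Token(start, end, kind)
    }

    // The token borrows nothing; its text is resolved against the owner when needed.
    fn text(&self, token: Token) -> &str {
        &self.source[token.0..token.1]
    }
}
```

Because `Token` is `Copy` and holds no reference, it can be passed around freely, and only code that needs the actual text has to have access to the `TokenizedSource`.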

#3

Thank you for your comment!

I think that you are right and I shouldn’t add a Source (or even an Rc) to every Token.
It is useful when Tokens are Copy, but I wouldn’t replace the SourceRange with two independent usizes, since they are only useful together as a pair of values.
Later Token properties like the SourceRange will end up in certain parts of the AST.
Maybe I can find a tree structure that lets me store a link to the Source only at some specific nodes in the AST, so that all child nodes (and their children) can use this Source recursively. I have to iterate on that.
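
A minimal sketch of that idea, assuming a `SourceRange` that is simply a `Copy` pair of byte offsets and an AST whose root alone owns the source. The field names and the `Ast`, `Node`, `join`, `leaf_texts`, and `collect` items are assumptions made for illustration; the gist may define these differently.

```rust
// Sketch only: field names and the `Ast`/`Node` shapes are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SourceRange {
    start: usize, // byte offset of the first byte
    end: usize,   // byte offset one past the last byte
}

impl SourceRange {
    // Resolves the range against a source that is handed in by the caller.
    fn text<'s>(self, source: &'s str) -> &'s str {
        &source[self.start..self.end]
    }

    // Smallest range covering both `self` and `other`.
    fn join(self, other: SourceRange) -> SourceRange {
        SourceRange {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

// Only the top level keeps the source; every node below stores just a range.
struct Ast {
    source: String,
    root: Node,
}

struct Node {
    range: SourceRange,
    children: Vec<Node>,
}

impl Ast {
    // Collects the text of every leaf, resolving ranges against the single owned source.
    fn leaf_texts(&self) -> Vec<&str> {
        let mut out = Vec::new();
        collect(&self.root, &self.source, &mut out);
        out
    }
}

// The source reference is passed down the recursion instead of being stored per node.
fn collect<'s>(node: &Node, source: &'s str, out: &mut Vec<&'s str>) {
    if node.children.is_empty() {
        out.push(node.range.text(source));
    } else {
        for child in &node.children {
            collect(child, source, out);
        }
    }
}
```

This keeps the two offsets together as one value while still leaving nodes free of `Rc`, at the cost of having to thread the source through any traversal that needs text.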

Besides that, you are also right that I shouldn’t use too many RefCells, but at the moment I simply cannot find another architecture for, say, the CompileContext and its use case.
All in all I need to try out this design and maybe I will repost here with another iteration when it is done.
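
One possible direction, sketched under heavy assumptions: the gist’s CompileContext is not shown in this thread, so the `diagnostics` field and the `report`, `lex`, `parse`, and `compile` functions below are hypothetical. The point is only that a context threaded through the phases as `&mut` may not need a RefCell at all.

```rust
// Sketch only: the real CompileContext is not shown here, so these
// fields and functions are hypothetical.
#[derive(Default)]
struct CompileContext {
    diagnostics: Vec<String>,
}

impl CompileContext {
    fn report(&mut self, message: impl Into<String>) {
        self.diagnostics.push(message.into());
    }
}

// Each phase borrows the context mutably only for the duration of the call,
// so the borrow checker can verify access statically.
fn lex(ctx: &mut CompileContext, source: &str) -> Vec<String> {
    let mut words = Vec::new();
    for word in source.split_whitespace() {
        if word.chars().all(|c| c.is_ascii_alphanumeric()) {
            words.push(word.to_string());
        } else {
            ctx.report(format!("unexpected token `{}`", word));
        }
    }
    words
}

fn parse(ctx: &mut CompileContext, words: &[String]) -> usize {
    if words.is_empty() {
        ctx.report("empty input");
    }
    words.len()
}

// The driver owns the context and lends it to one phase at a time.
fn compile(source: &str) -> (usize, CompileContext) {
    let mut ctx = CompileContext::default();
    let words = lex(&mut ctx, source);
    let count = parse(&mut ctx, &words);
    (count, ctx)
}
```

This only works as long as no two phases need to hold the context at the same time; if the actual design requires shared mutation from several long-lived owners, RefCell may still be the pragmatic choice.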