Is there a better way to represent an abstract syntax tree?


#1

Hello, I’m working through Crafting Interpreters and I am reimplementing the interpreter described in the book in Rust.

I got to the point where I need to define the type for an abstract syntax tree and the compiler is (rightfully) telling me that my original attempt was creating a recursive type with infinite size.

The error message has been helpful enough suggesting to use Box to prevent the issues, and in fact the following enum compiler just fine.

pub enum Expr{
  Unary(Operator, Box<Expr>),
  Binary(Box<Expr>, Operator, Box<Expr>),
  Grouping(Box<Expr>),
  Literal(Literal)
}

I am wondering if that’s the best representation I can get to or if there is something I am missing to implement it in a better way.


#2

That’s basically how rustc’s own internals look. Depending on your use cases when working with the AST you might want to swap in reference counted types rather than Box, but that’s basically it.


#3

Great! Thanks for your reply!


#4

Here are a couple of other options:

Instead of Binary(Box<Expr>, Operator, Box<Expr>) you can do something like

enum Expr {
    ...
    Binary(Box<BinaryExpr>)
}

struct BinaryExpr {
    op: Operator
    lhs: Expr, rhs: Expr,
}

This allows you to implement specific methods for BinaryExprs, and also allows to keep the size of the Expr itself small. I’ve used this representation here: https://github.com/matklad/miniml/blob/master/ast/src/exprs.rs

Another representation, which is used in the context of IDEs, when you want to have a lossless representation of the source code, as opposed to the abstract syntax tree, is a two-layer one.

On the first layer you have an untyped syntax tree, which basically marks ranges in the source text:

struct NodeType(u32);

struct Node {
  type: NodeType,  
  text_range: (usize, usize),
  children: Vec<Node>
}

on top of that, you have a typed representation, which delegates to the syntax tree

struct BinaryExpression {
    node: Node
}

impl BinaryExpression {
    fn lhs(&self) -> Expression { Expression { node:  self.children[0] } }
}

#5

I like this idea. I’ll give it a try. Thanks!