Is there a better way to represent an abstract syntax tree?

mariosangiorgio · February 19, 2017, 11:30pm

Hello, I'm working through Crafting Interpreters and I am reimplementing the interpreter described in the book in Rust.

I got to the point where I need to define the type for an abstract syntax tree and the compiler is (rightfully) telling me that my original attempt was creating a recursive type with infinite size.

The error message has been helpful enough suggesting to use Box to prevent the issues, and in fact the following enum compiler just fine.

pub enum Expr{
  Unary(Operator, Box<Expr>),
  Binary(Box<Expr>, Operator, Box<Expr>),
  Grouping(Box<Expr>),
  Literal(Literal)
}

I am wondering if that's the best representation I can get to or if there is something I am missing to implement it in a better way.

sfackler · February 20, 2017, 12:11am

That's basically how rustc's own internals look. Depending on your use cases when working with the AST you might want to swap in reference counted types rather than Box, but that's basically it.

mariosangiorgio · February 20, 2017, 12:29am

Great! Thanks for your reply!

matklad · February 20, 2017, 8:58am

Here are a couple of other options:

Instead of Binary(Box<Expr>, Operator, Box<Expr>) you can do something like

enum Expr {
    ...
    Binary(Box<BinaryExpr>)
}

struct BinaryExpr {
    op: Operator
    lhs: Expr, rhs: Expr,
}

This allows you to implement specific methods for BinaryExprs, and also allows to keep the size of the Expr itself small. I've used this representation here: https://github.com/matklad/miniml/blob/master/ast/src/exprs.rs

Another representation, which is used in the context of IDEs, when you want to have a lossless representation of the source code, as opposed to the abstract syntax tree, is a two-layer one.

On the first layer you have an untyped syntax tree, which basically marks ranges in the source text:

struct NodeType(u32);

struct Node {
  type: NodeType,  
  text_range: (usize, usize),
  children: Vec<Node>
}

on top of that, you have a typed representation, which delegates to the syntax tree

struct BinaryExpression {
    node: Node
}

impl BinaryExpression {
    fn lhs(&self) -> Expression { Expression { node:  self.children[0] } }
}

mariosangiorgio · February 21, 2017, 7:16am

I like this idea. I'll give it a try. Thanks!

zyansheep · April 18, 2022, 6:24pm

I'm 5 years late, but for posterity, another way is to use Arenas (like bumpalo) or allocate directly on stack frames using a layout like this:

enum Expr<'a>{
    Unary(Operator, &'a Expr<'a>),
    Binary(&'a Expr<'a>, Operator, &'a Expr<'a>)
    Literal(Literal)
}
// Arena Allocation (pass arena around to use these expressions)
let arena = bumpalo::Bump::new();
let literal: &mut Expr = arena.alloc(Literal("true".into()));
let negation_expr = arena.alloc(Expr::Unary(Operation::Negate, literal));

// Stack Allocation (expressions are dropped at end of stack frame, so recursion is needed to use these effectively)
let stack_literal = &Literal("false".into());
let stack_expr = Expr::Unary(Operation::Negate, literal);

Just make sure to pass around the arena to all your functions that need to allocate and keep in mind that nothing is deallocated until you drop the arena.
Using this strategy is really useful if you are doing many, many, operations where you don't need to deallocate a bunch of stuff. It saves so much time in heap allocation because it preallocates the heap. In fact, the rust compiler uses arenas in many cases Memory Management in Rustc - Guide to Rustc Development. The only downside is that you need to annotate everything with lifetime parameters which can be annoying.

quinedot · April 18, 2022, 8:13pm

Thank you for writing up that reply, but FYI we don't typically revive old threads on this forum (e.g. OP hasn't been active for 3+ years).

(I'll close this thread.)

Topic		Replies	Views
Recursive Enum Types	5	18786	July 3, 2022
Best way to represent AST? help	13	5231	January 8, 2024
Tying knots in the type system	3	979	January 12, 2023
BenchmarksGame binarytrees	11	1542	January 12, 2023
Mutually Recursive Types help	3	2323	January 12, 2023

Is there a better way to represent an abstract syntax tree?

Related topics