Cache friendly tree expr

Suppose we have something like this:

enum Expr {
  F32(f32),
  I32(i32),
  Var(String),
  Add(Box<Expr>, Box<Expr>),
  Sub(Box<Expr, Box<Expr>)
}

then this structure can be all over the heap, where each node provides a tiny amount of information and sends us to another random location on the heap.

Is there a way to represent these "recursive expression trees" in a cache friendly structure?

(If you believe this is premature optimization, I'm open to that too, but please provide a detailed argument instead of just stating it.)

Yes. Make a vector of expressions and store indices rather than boxes.

1 Like

Is this a commonly used in practice? It seems to lose much of the "purely functional" feel and involve lots of plumbing.

What do you want to do with it a lot?
It can be optimized for lots of random modifications, evaluation, matching ...

A kind of bump allocator could work, as well. Do you need to delete individual expressions or only the whole tree at once? For the latter case, bumpalo seems like a good choice.

  1. type inference
  2. pattern matching / rewriting (like Mathematica's 'kernel' language, vpri's ometa)

Then I would try to

  • deduplicate nodes
  • use a global or thread-local arena (via u32 indices)
enum ExprData {
    Float(f64),
    Int(i64),
    Var(NameIdx),
    Add(ExprIdx, ExprIdx),
    Sub(ExprIdx, ExprIdx),
}

The goal being to keep it as small as possible.
The two indices of Add/Sub mean that you can use 64bit data without space penalty.

De-duplication means faster equality checks without having to walk both trees.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.