Getting rid of recursive AST evaluation in a custom language

Hi all! I'm a beginner in Rust and coming from Python.

TLDR - DSL that filters arrays of jsons recursively evaluates the ast on EACH iteration. How to avoid it?
Rust Playground link

Long story follows. Warning! A lot of letters!

Lately I've been toying with the idea of making a little interpreter for DSL that is able to filter out an array of jsons ( serde_json::Value ). The criteria is known only at runtime (say, accepted via a network request or supplied by the user via stdin).

I kinda did it, but it is wildly inefficient due to the way it is implemented. I was hoping that more experienced peers might help me out.

So, I have a few "primitives" in my "language".

Path - which represents a json pointer
It is defined like so:

enum Path {
    Path(String),
}

impl Path {
    pub fn get_path(&self) -> &str {
        match self {
            Self::Path(x) => x,
        }
    }
}

Ops - which essentially represents or your usual comparison operators ( ==, >=, etc.)
This is the definition

enum Ops {
    Eq,
    Ne,
    Gt,
    Lt,
    Gte,
    Lte,
}

Expr - this is an enum which represent my AST
Definition follows

enum Expr {
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Xor(Box<Expr>, Box<Expr>),
    Compare { lhs: Path, op: Ops, rhs: i64 },
    Exists { path: Path },
}

And, Or, Xor variants are pretty self explanatory,
Compare variant represents an operation where I take the value from my json which lies by pointer in lhs and compare it using Ops to some value.
Exists simply checks that value in Path exists in supplied json.

Compare and Exists are evaluated by using macroses, here they are

macro_rules! compare {
    ($json_identifier:ident, $path:expr, $operation:tt, $value_to_compare_with:expr) => {
        {
            let p_str = $path.get_path();
            let value_in_pointer_option =  $json_identifier.pointer(p_str);
            if let Some(inner) = value_in_pointer_option {
                inner.as_i64().unwrap() $operation *$value_to_compare_with
            } else {
                false
            }
        }
    };
}

macro_rules! exists {
    ($json_identifier:ident, $pointer_string:expr) => {
        $json_identifier.pointer($pointer_string).is_some()
    };
}

And finally the function that evaluates all of that is defined like that

fn ast_eval(ast: &Expr, v: &Value) -> bool {
    match ast {
        Expr::Compare { lhs, op, rhs } => match *op {
            Ops::Eq => compare!(v, lhs, ==, rhs),
            Ops::Gt => compare!(v, lhs, >, rhs),
            Ops::Lt => compare!(v, lhs, <, rhs),
            Ops::Gte => compare!(v, lhs, >=, rhs),
            Ops::Lte => compare!(v, lhs, <=, rhs),
            Ops::Ne => compare!(v, lhs, !=, rhs),
        },
        Expr::Exists { path } => exists!(v, path.get_path()),
        Expr::And(x, y) => ast_eval(x.as_ref(), v) & ast_eval(y.as_ref(), v),
        Expr::Or(x, y) => ast_eval(x.as_ref(), v) | ast_eval(y.as_ref(), v),
        Expr::Xor(x, y) => ast_eval(x.as_ref(), v) ^ ast_eval(y.as_ref(), v),
    }
}

Here is the problem. To find out if some json v satisfies condition in my ast, I have to recursively traverse it for EACH of v in the array.
I was thinking of making a macro out of ast_eval so it would "collect" all of the ifs that are generated by compare! and exists macroses, and use it in a .filter method as a closure or something along the lines. But, alas, I couldn't wrap my head around how I would approach it. Or maybe there is another way to "compile" the ast once and reuse it for each json. Thank you in advance for any advice or critique!

Recursively evaluating the AST seems pretty reasonable for something like this. If you aren't noticing performance issues when using the program, I don't think you necessarily need to do anything differently.

That being said, you've created a dynamic programming language and all of the usual optimization techniques for dynamic languages apply.

A simple optimization would be pre-evaluating any subexpression that don't depend on an input JSON value. So /key/inner > (5 * 2)[1] could be simplified to /key/inner > 10.

Most production interpreters use some sort of bytecode for actually running programs, which is compiled from the AST. Bytecode is generally easier to execute without a bunch of recursion, and also makes the interpreter easier to maintain and modify.

You could of course go all the way down the compilation route and compile into actual machine code too (often called "just in time compilation" or JIT) but that's probably waaaaay more complicated than what you need here.


  1. obviously your syntax is probably not identical to this ↩︎

Hi, thank you for the reply!

As for the pre-evaluating, it is being done, I've just left it out, since It wasn't the point.

Speaking performance - it is slow, very slow. Having to re-evaluate the AST again and again on each iteration is wildly inefficient and an obvious hot spot.

As any other developer, it bugs me when something that could be reused is not being reused :slight_smile:

I have tried to convert the ast_eval function to the recursive macro to "collect" all the conditions into one gigantic if, but failed.

As for bytecode idea, it seems promising, but a lot more learning is required :slight_smile: I'll get to it.

I wonder whether this is a use-case for memoization:

A Rust framework derived from code for implementing a programming language (Rust itself) with memoization is Salsa:

Although Salsa is

they're preparing to stabilize it:

Another approach to memoization (although I'm not sure memoization applies here) would be storing intermediate computations in a cache such as

1 Like

I wonder how much of a constant-factor optimization could be won by eliminating the pointer-chasing: move Expr's variants to Ops, iterate over a vector of Ops and values (Paths, i64s, bools), pop an op, pop as many arguments as it needs ... this is partly opposite to advice I gave before, but maybe it would speed up your program.

1 Like

Thank you very much for your reply!
Memoization was the first thing I've tried out. But I found it useless in this particular case, since the results computed for one JSON in the array are useless for the next one. There is no place to reuse computation results. Even more so, memoization actually made it noticeably slower, due to all the allocations and deallocations happening on each iteration. Thank you for telling me about Salsa, though!
I actually was able to find a way to "compile" my ast. Essentially move it from Expr enum into concrete structs, which call one another. But that required Boxes and dynamic dispatch. In the end, according to my benchmarks, I won exactly nothing =). At least it is not slower, eh?

Your idea with eliminating indirection seems very interesting, I'll give it a go.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.