Rust JSONLogic Expression Evaluation Slower Than JS – Looking for Code Review & Optimization Tips

I’ve implemented a JSONLogic expression evaluator in Rust (datalogic-rs, GitHub: json-logic/datalogic-rs), but I’m noticing that it’s significantly slower than its JavaScript counterpart (json-logic-engine).

Here are my benchmarks for evaluating a complex expression:

• Rust (datalogic-rs): 805ms

• JavaScript (json-logic-engine): 395ms

Both implementations support the same features and desugaring techniques. Initially, I experimented with stack-based iteration, but it involved too many stack operations, making it difficult to optimize further. Recursion ended up being the better-performing approach in my Rust implementation, but it’s still lagging behind JS.

I’ve been programming for 25 years but only picked up Rust a year ago—and I absolutely love it! That said, I’d really appreciate any insights from more experienced Rust devs on how I can improve performance.

Here’s what I’m looking for:

• Are there any common pitfalls in my implementation that could be slowing it down?

• Would a different approach (e.g., memoization, arena allocation, or better data structures) improve things? (A small memoization sketch follows this list.)

• Any Rust-specific optimizations that I might be missing?
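
For context, this is roughly what I mean by memoization: cache the expensive parse/desugar step per rule so that evaluating the same rule against many data payloads only pays that cost once. This is purely illustrative and assumes serde_json as a dependency; `ParsedRule` and `parse_rule` are hypothetical stand-ins, not the datalogic-rs API.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for whatever pre-processing the evaluator does;
// not the datalogic-rs API.
#[derive(Clone)]
struct ParsedRule(serde_json::Value);

fn parse_rule(rule_text: &str) -> ParsedRule {
    // imagine desugaring / validation happening here
    ParsedRule(serde_json::from_str(rule_text).expect("valid JSON rule"))
}

// Memoize parsing/desugaring keyed by the rule text, so repeated evaluations
// of the same rule skip that work entirely.
struct RuleCache {
    cache: HashMap<String, ParsedRule>,
}

impl RuleCache {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    fn get(&mut self, rule_text: &str) -> &ParsedRule {
        self.cache
            .entry(rule_text.to_owned())
            .or_insert_with(|| parse_rule(rule_text))
    }
}

fn main() {
    let mut cache = RuleCache::new();
    let rule = r#"{ "==": [ { "var": "a" }, 1 ] }"#;
    cache.get(rule); // parses and caches
    cache.get(rule); // cache hit, no re-parse
}
```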

Repo: GitHub - json-logic/datalogic-rs

Profiling with Instruments produced the attached report.

Would love to hear your thoughts and suggestions! :rocket:

From a quick look, I believe the main source of the JS version's performance is this line:

The implementation builds a JavaScript function string from the input that implements the defined logic, then evaluates it, so the JS runtime gets to JIT the implementation to native code where it matters.

Beating JS here from a cold start is a very tall order; this is essentially the best case for what JS engines are optimized for: JIT-compiling exactly what needs it, exactly as much as it needs, and no more.
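
For a rough idea of the shape of that trick, here is the cheapest pure-Rust analogue I can think of (no JIT involved): walk the JSON rule once up front and build a tree of closures, so each evaluation only calls closures instead of re-matching operator strings. This is just a sketch over a hypothetical two-operator subset ("var" and ">") using serde_json; it is not how either library is actually implemented.

```rust
use serde_json::Value;

// A "compiled" rule is just a boxed closure from data to a number.
type Compiled = Box<dyn Fn(&Value) -> f64>;

// Walk the rule once and build closures; hypothetical two-operator subset.
fn compile(rule: &Value) -> Compiled {
    if let Some(obj) = rule.as_object() {
        if let Some(path) = obj.get("var").and_then(Value::as_str) {
            let path = path.to_owned();
            return Box::new(move |data: &Value| {
                data.get(path.as_str()).and_then(Value::as_f64).unwrap_or(0.0)
            });
        }
        if let Some(args) = obj.get(">").and_then(Value::as_array) {
            let lhs = compile(&args[0]);
            let rhs = compile(&args[1]);
            // encode the boolean result as 1.0 / 0.0 to keep the sketch small
            return Box::new(move |data: &Value| {
                if lhs(data) > rhs(data) { 1.0 } else { 0.0 }
            });
        }
    }
    // anything else is treated as a numeric literal
    let n = rule.as_f64().unwrap_or(0.0);
    Box::new(move |_| n)
}

fn main() {
    let rule = serde_json::json!({ ">": [{ "var": "temp" }, 30.0] });
    let compiled = compile(&rule); // pay the dispatch/desugaring cost once
    for temp in [25.0, 35.0] {
        let data = serde_json::json!({ "temp": temp });
        println!("temp {temp} -> {}", compiled(&data) == 1.0);
    }
}
```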

You can probably win for either sufficiently small or known-ahead-of-time inputs by building native instructions yourself; even some very naive instructions can give orders-of-magnitude improvements, but of course this is not for the faint of heart.


A quick search finds dynasm (a Rust proc-macro assembler helper on lib.rs), which at least handles some of the scarier bits, it looks like?

Thanks for the details. I understand that JS does some magic when executing the functions, and my intention is not to beat JS here, but to see if there are any optimizations I can apply in Rust to make my implementation better.

Well, AMD uProf blue-screens me if I try to collect stack traces, but the regular perf analysis does show pretty much what you'd expect: most of the time is in the operator apply calls, depending on data shape. The normal low-hanging fruit of improving data locality doesn't seem to be too much of an issue according to the sampling I'm seeing on that repo's examples (though maybe that's just the X3D cache doing its thing?)

I think to see anything meaningful you're going to have to JIT at some level to get the CPU to see and predict what you're trying to do; at the moment you've got a pretty bad case of 90% of your instructions being either calls or branches, drowning out the actual logic.

You could look further into the runtimes available; for example, there seem to be some good WASM runtime options that will do the JIT for you, and WASM is a lot easier to emit than x86.
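
As a flavour of what that could look like, here's a minimal sketch assuming the wasmtime crate: hand-write the WAT for a trivial "a > b" rule, let wasmtime JIT it, and call it as a typed function. In a real evaluator you'd generate the WAT (or raw wasm bytes) from the JSONLogic rule instead of writing it by hand.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() {
    // A trivial "rule": returns 1 if the first argument is greater than the second.
    // In practice you would emit this text (or raw wasm) from the JSON rule.
    let wat = r#"
        (module
          (func (export "rule") (param i64 i64) (result i32)
            local.get 0
            local.get 1
            i64.gt_s))
    "#;

    let engine = Engine::default();
    // Module::new accepts WAT text when wasmtime's default `wat` feature is on;
    // this is where the compilation to native code happens.
    let module = Module::new(&engine, wat).unwrap();
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[]).unwrap();

    let rule = instance
        .get_typed_func::<(i64, i64), i32>(&mut store, "rule")
        .unwrap();

    assert_eq!(rule.call(&mut store, (10, 3)).unwrap(), 1);
    assert_eq!(rule.call(&mut store, (3, 10)).unwrap(), 0);
}
```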

Other than that, threading should be pretty trivial for the use case of running the same logic on a million inputs in parallel, but I'm sure you're already aware.
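
For completeness, a sketch of that with rayon and serde_json; `apply_rule` is just a stand-in for whatever the compiled rule's evaluation call is:

```rust
use rayon::prelude::*;
use serde_json::{json, Value};

// Stand-in for evaluating one (shared, read-only) compiled rule against one input.
fn apply_rule(data: &Value) -> bool {
    data.get("score").and_then(Value::as_i64).map_or(false, |s| s > 50)
}

fn main() {
    // A million inputs...
    let inputs: Vec<Value> = (0..1_000_000).map(|i| json!({ "score": i % 100 })).collect();

    // ...evaluated with the same logic in parallel; rayon splits the work
    // across a thread pool with essentially no extra code.
    let matches = inputs.par_iter().filter(|d| apply_rule(d)).count();
    println!("{matches} of {} inputs matched", inputs.len());
}
```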


Thanks for the effort. After a week of work, I was able to beat the JS benchmarks by 30%. Below is the list of changes I made to the expression evaluation library:

  1. Reimplemented the entire library using a bump-allocation arena (bumpalo; a minimal sketch of the idea follows this list)
  2. No memory allocation during logic evaluation
  3. Memory locality optimizations
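
To give an idea of what (1) and (2) look like in practice, here is a stripped-down sketch of the arena idea using bumpalo (with its `collections` feature enabled). The real expression tree in the branch is more involved; this just shows allocating the whole tree in one arena so evaluation does no heap allocation and the nodes sit close together in memory.

```rust
use bumpalo::Bump;
use bumpalo::collections::Vec as BumpVec;

// A tiny arena-backed expression tree: nodes borrow from the Bump, so building
// and dropping a whole rule is one cheap arena reset instead of many Box frees.
enum Expr<'a> {
    Num(f64),
    Var(&'a str),
    Add(BumpVec<'a, Expr<'a>>),
}

fn eval(e: &Expr, x: f64) -> f64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Var(_) => x, // toy lookup: every variable resolves to `x`
        Expr::Add(args) => args.iter().map(|a| eval(a, x)).sum(),
    }
}

fn main() {
    let arena = Bump::new();

    // Allocate the whole tree in the arena: no per-node heap allocations,
    // and the nodes end up adjacent in memory (better locality).
    let mut args = BumpVec::new_in(&arena);
    args.push(Expr::Num(1.0));
    args.push(Expr::Var(arena.alloc_str("score")));
    let rule = arena.alloc(Expr::Add(args));

    assert_eq!(eval(rule, 41.0), 42.0);
    // Dropping `arena` frees every node at once.
}
```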

Benchmark Statistics:
Datalogic-rs (mine): 275 ms
json-logic-engine: 392 ms
json-logic-js: 5645 ms

Overall, it was a great learning experience. You can see the progress in the GitHub branch: GitHub - codetiger/datalogic-rs at v3.0-dev


Interesting! I must need to get better at using my profiler; it didn't suggest locality being an issue (cache expiry counts), nor time in allocators!

My guess is I was simply benchmarking badly, though it's possible I was misreading the counters (there are a lot of them).


What profiler shows that level of info? Just curious. I'm using Instruments on macOS and focusing only on the Time Profiler.

AMD uProf, because I have an AMD CPU; Intel has VTune (?) for theirs. They're rather confusing, and apparently sensitive to OS version (I think my update to 24H2 caused my blue-screening).
