Recommended inline parser crate with AST support as of 2019

Hi.

I'm looking to begin a project where I need to implement a query language. It's essentially a type of database, so the program needs to accept a string input, parse it to a Abstract Syntax Tree (AST) in object form which I can then evaluate against some underlying data structures and produce a result.

I've been using boost's spirit QI before in C++, and while I liked it being an inline parser and liked the feel of it I don't think I've ever wrestled so much with obscure errors from a single library before in my life.

Searching the forums and online it seems most parser crates for Rust seem to be for different purposes, e.g. just parsing a file or a stream, and not producing a full AST of the result which can be passed on to the query handler. Might just be my lacking google fu of course.

Thanks in advance for any recommendations!

Best Regards
Dan True

I think most promissing for your casse is the nom crate. While initially developed to parse byte formats, it has complete support for str and with the new version 5 it accessability increased drastically.
What you will have to do is to write your own functions that take input (&str) and return the objects of your AST. When you are experienced in such parser builing I think you know what to do.

Some minor advice: I built such a parser myself and the most ergonomic way to handle an AST and its different objects/classes is, in my opinion, by writting for each class an own struct/enum and implement a single AstClass trait for all of them. Then write an enum Ast with "new-type-variants" for each of your AstClasses, e.g.:

trait AstClass {
  fn execute(&self) {
    println!("I executed");
  }
}

struct Expression;
impl AstClass for Expression {}

struct Declaration;
impl AstClass for Declaration {}

enum Ast {
  Expression(Expression),
  Declaration(Declaration),
}

impl AstClass for Ast {
  fn execute(&self) {
    match self {
      Ast::Expression(inner) => inner.execute(),
      Ast::Declaration(inner) => inner.execute(),
    }
  }
}
2 Likes

Thanks. I've heard of nom, but it seemed very focussed on files. Will give it a go.

Yeah, I get what you mean. Except there'll be a lot of expressions within expressions, and that will become a mess if all my AST-constructors can only take &str without having to make my AST-constructors essentially become parser combinators.

For instance, if I want to parse: "A & B" into an ConjunctionExpression AST object. The function which constructs the ConjunctionExpression would then have to call the parser for both sub expressions (because both A and B above will often not be atomic strings, but rather complex expressions themselves, e.g. disjunctions).

This sounds like I'll have parsing calls scattered throught my AST-construction functions. Doable, but not very pretty. Or do I misunderstansd you or how nom works?

Best Regards
Dan True

Yes, your right. For me the capability of nom doing parsing and (context aware) validity checks within one step, outputting directly the required types, is what makes me love it. However, as you correctly expected this can become messy by times and you have to be aware of recursion and which parser to call first.

If you more like the "traditional way" where you write some syntax and a program generates all the code for, e.g. yacc and bison, you maybe find pest better suited for you. Pest applies syntax rules to its input, afterwards you can check which rules applied and eventually build the AST from it. However, these are two stepps why pest is usually slower than nom (there was some dispute between the two crates about this topic, sadly I was not able to find the post of nom's author anymore).

1 Like

I can see that noms landing page specifically mentions constructing ASTs for programming languages and gives some examples. I'll have a detailed look at those :slight_smile:

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.