Why would someone use Rowan?

I'm writing a small calculator app and, despite my best efforts, I just can't see why someone would choose Rowan over their own internal AST representation? It seems to me that it just makes code more verbose and confusing.

For something small: you probably wouldn't.

For an IDE-strength parser, having a uniform lossless syntax tree is extremely valuable. The keyword you want to search for is "lossless syntax tree".

Note also that generating the typed view can be automated; you can look at rust-analyzer's use of ungrammar for this.

2 Likes

Some useful references:

https://www.oilshell.org/blog/2017/02/11.html

5 Likes

Dumb question: I'm looking at rowan - Rust . What is a "lossless syntax tree" (aren't all syntax trees lossless) ? What is a "lossy syntax tree" in this context ?

Back to the original question: I think the standard way to handle a calculator app would be:

  • parse to postfix notation (use two stacks) OR

  • pratt/precedence parsing OR

  • packrat parsing

From Syntax in rust-analyzer:

  • Syntax trees are lossless, or full fidelity. All comments and whitespace get preserved.

Many ASTs omit whitespace and other information not relevant to semantics, which would be "lossy" under this definition.

3 Likes

I see, so: "lossless" = you can re-derive byte-equiv foo.rs from the ASTtree ?

Yes, I believe that is the intent.

Exactly! This allows you to parse a lossless syntax tree, perform a refactoring and then write out a file with the refactoring applied where only the parts that are actually affected by the refactoring are changed and thus without any spurious whitespace or comment changes.

5 Likes

This sounds like the requirement is stronger than "lossless", but "lipschitz continuous and local" for source code in that: (1) small change in AST -> small change in output source, and (2) local change in AST -> local change in output source.

Do you have any example of calculator using rowan?

There is examples/math.rs, distributed with rowan, not sure if it qualifies as a full calculator but close enough.

For the curious, I helped design rowan's initial tree representation (pseudo-in-place modification support was done after me) and have a highly experimental fork/reimplementation/alternative called sorbus on the backburner. Somewhere in the distant future I hope to take the roslyn/rowan/sorbus LST infrastructure plus matklad's thought on parser generators to chain together existing and some novel work to provide the "boilerplate" infrastructure for working with a uniformly typed LST in an off-the-shelf declarative fashion.

That's a ways off, though, because most individual projects would be better off for the time being using direct-to-AST parsing (for simplicity, ignoring style-preserving modification as out of scope), tree-sitter (for a parser generator with many of the desirable properties of a homogeneous error-tolerant syntax tree), or an adhoc specialized code generator using ungrammar and rowan like r-a uses.

Rationalizing building out mpg means having more than one consumer that aren't satisfiable with tree-sitter. Because it's a big time investment to make the specialized support code generation that r-a is using for rowan more general-purpose.

(If you're interested in collaborating on mpg design/implementation/using, dm me here or elsewhere; I'll set up a repo to track it if I get some outside interest. The key representational experiment I want to try for a new version of sorbus is entirely flattening the tree into a linear buffer. Rowan's got an open experiment issue for doing this for "small enough" branches.)

And on terminology: it's hard, but the terms of art I see in use are

  • Trivia โ€” tokens which exist in the lexical syntax but which have no impact on the AST (e.g. whitespace, comments)
  • Lossless Syntax Tree (LST) โ€” a syntax tree which captures the lexical syntax with enough fidelity to serialize back to the exact input syntax, including trivia and invalid syntax
  • Lexical Syntax Tree (LST) โ€” not a properly lossless LST, but instead just a CST using LST representation techniques (avoid using this term)
  • Concrete Syntax Tree (CST) โ€” also Parse Tree โ€” a syntax tree which represents the lexical syntax but which contains lexical specifics rather than just abstract semantic information (nb: this is what syn provides); typically only really used to refer to a direct tree representation of a purely declarative parser's parse tree
  • Abstract Syntax Tree (AST) โ€” a syntax tree referencing just the fully abstract syntax tree, though the line between AST and CST is blurry; the main difference is more in how convenient they are to work with :slight_smile:
2 Likes