Declarative pattern matching on trees

Please ignore utf for this question.

We can say that a regex operates on a LIST of U8's.
I am looking for XYZ that operates on a TREE of NODEs.

In the regex case, we can describe a PATTERN that matches certain LISTs of U8s, and we can do: capture, replace.

I am looking for an XYZ that can do match/capture/replace on SUBTREEs of NODEs.

Here's the problem: suppose we are working on a small lisp interpreter. We can have some representation:

pub enum Sexp {
  Int(...),
  String(...),
  Symbol(...),
  Apply(Vec<Sexp>),
}

Then we might ask questions of the form: Does this Sexp match (if $tst $then_body)
or (if $tst $then_body $else_body) or (defn $symbol [ ..$args.. ] $body)

And I want to be able to express these 3 (if_then, if_then_else, defn) as "patterns" in some XYZ language.

Is there any such XYZ ?

Basically I want to do pattern matching over Sexps in a DECLARATIVE way (by describing the pattern) rather than manually writing procedural matches. (Much like regex allows us to specfy classes of list of u8s in a DECLARATIVE way rather than a PROCEDURAL way).

Edit: In mathematics / CS theory, what would Rust's macro_rules be referred to? (Yes, I know it's declarative macros; but what type of mathematical object / cs theory object / algorithmic object does it match up with ?)

You may find this interesting: The Stanford Natural Language Processing Group

I don't know of any rust package that implements something like this, though.

2 Likes

I am curious what you were searching for at the time / what problem you were trying to solve at the time that led you to finding this set of tools.

Term rewriting systems

6 Likes

As a first approximation, a declarative macro is a function M: L -> R where L and R are formal languages – in particular context-free languages over a shared alphabet Σ. In this particular case, Σ is the set of valid Rust tokens, R is the language specified by Rust's grammar (ie. the language accepted by Rust's parser), and L is whatever set of strings (sequences of tokens) the macro in question can accept.

More accurately, declarative macros in Rust can neither accept arbitrary context-free languages over Σ nor output arbitrary Rust ASTs due to identifier hygiene and the requirement that all brackets match. On the other hand they are able to accept many languages that are not context-free because matchers like ty involve semantic in addition to syntactic information. Thus, declarative macros are also able to accept (a subset) of context-sensitive languages.

Of course, M cannot be an arbitrary computable function (that's what procedural macros are for) given that they must be defined as a list of term-rewriting rules like @jeanas mentioned.

1 Like