Simple grammar peg/pest can parse but lalrpop can not?

What is a simple grammar that peg/pest can parse but lalrpop can not ?

As I understand it, the difference is that a PEG grammar can be ambiguous, and resolves its ambiguity by having a precedence of which rules will be applied. In comparison, LR grammars (LALR ⊂ LR) must be completely unambiguous or the algorithm will fail to generate parse tables for it. Based on that, I'd guess a grammar like this would meet your criteria:

S →a a | a | S a

The reasoning would be that on input aa, a PEG parser would use the first production, but an LR parser would catch the ambiguity in the compilation phase and fail.

Disclaimer: I'm confident about the part of my answer relating to LR/LALR grammars, but I'm less sure about PEG grammars. I've also never used any of the libraries in question, so there may be additional details beyond the theory that I'm not aware of.

1 Like

In my limited understanding:

S = { a, aa, aaa, aaaa, ... } = a+ right? a+ can be recognized by a regex, so it can definitely be recognized by lalrpop.

I think you're getting confused between languages and grammars. The set {a, aa, aaa...} is a regular language, so it could be parsed by anything that's at least as powerful as a finite state machine. There are many ways of writing a grammar for that language, and the grammar I used as an example does it in an ambiguous way. An unambiguous grammar for that language could be written very easily, and then lalrpop would accept it, but it'll be rejected if you try to use the exact grammar I gave.

Maybe you meant to ask for a language that peg can parse but lalrpop can't?

3 Likes

I just looked at the title of my post. I asked for a "grammar" lalrpop can not handle. I did NOT ask for a "langauge" lalrpop can not handle, so, technically, you're right.

There are languages that can be described by a PEG grammar which are impossible to express in BNF. The classic example is { anbncn | n ∈ ℕ }. As far as I know, the converse is still an open reserarch question: There are no known languages that can be expressed in BNF but not PEG, but neither is there a proof that one can't exist.

That's a fair description of how PEG works in practice, but it's a little misleading: PEG theory is built from a different foundation than traditional parser theory, and the concept of grammar ambiguity doesn't map onto it cleanly. When considering a language L, there are two main questions to consider:

  1. How can you generate all of the strings S that are members of L?
  2. Given a string S, how can you decide whether or not it is a member of L?

A BNF grammar defines a language in terms of the first question: It specifies a set of production rules, and any string you create by repeatedly applying those rules is, by definition, a member of the language. A parser answers the second question by trying to find a sequence of rule applications that produces S. The grammar is ambiguous if there are any strings in L that can be produced in multiple ways from the given rules.

PEG instead defines a language in terms of the second question directly: It describes a procedure to determine whether or not S is in the language L. As there is no search involved here that might produce multiple answers, the traditional idea of ambiguity doesn't really apply. The cost of this approach is that it's not easy to generate arbitrary strings that will be accepted by a given PEG grammar— If there's any interesting concept of ambiguity for PEG grammars, it will show up in algorithms that try to answer question (1) instead of question (2).

6 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.