Simple language peg/pest can parse that lalrpop can not?

This is a followup to Simple grammar peg/pest can parse but lalrpop can not?

Here, I am looking for a Language peg/pest can recognize but lalrpop can not.

Note, this is different from a grammar that lalrpop can not handle.

In particular, a correct solution needs to define a language L where:

  1. there is a peg/pest grammar that recognizes L
  2. all grammars for lalrpop fails to recognize L

(i.e. providing a grammar that recognizes L but lalrpop can't handle is not sufficient; as there might be a different grammar that recognizes L that lalrpop does handle).

It is also possible to build LL parsers and LR parsers from parsing expression grammars, with better worst-case performance than a recursive descent parser, but the unlimited lookahead capability of the grammar formalism is then lost. Therefore, not all languages that can be expressed using parsing expression grammars can be parsed by LL or LR parsers.

~ wikipedia

The way I interpret that is that the only way for a language to satisfy your criteria is if its PEG grammar uses predicates, so let's play with that and see where it gets us. Let L1 and L2 be CF languages. A PEG grammar that has the rule S -> &(L1 eof) L2 eof is equivalent to (L1 ∩ L2) eof, which is not necessarily CF because the set of CF languages is not closed under intersection. LALR⊆LR⊆CF, so that means lalrpop wouldn't be able to parse it.

The language you're looking for is {a^n b^n c^n eof | n∈N}. The PEG grammar probably looks something like this:

S -> &(X c* eof) a* Y eof
X -> a X b | epsilon
Y -> b Y c | epsilon

The big long explanation is as much there for me as for you, I had to learn a fair bit about PEGs to figure it out. Your previous question was easier :slight_smile:

4 Likes

Is there a typo?

Quoting: Pumping lemma for context-free languages - Wikipedia

For example, the language {\displaystyle L={a^{n}b^{n}c^{n}|n>0}}{isplaystyle L=a^{n}b^{n}c^{n}|n>0} can be shown to be non-context-free by using the pumping lemma in a proof by contradiction.

No typo, that's actually the point. PEG grammars can accept non-context-free languages, because context-free lookaheads that don't consume input can be used to implement language intersections.

1 Like

You're right. I did not realize (until now) that PEG can recognize non-context-free languages.

This paper lists some other examples, such as

  • Unary languages consisting of all words whose length is a power of 2
  • Unary languages consisting of all words whose length is a power of any fixed c > 0
  • Languages consisting of palindromes whose length is a power of 2

However it's also conjectured that

  • Languages consisting of palindromes of even length are not in PEG

(That's a context-free language, but it's not recognizable by LALR.)

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.