Rustemo or LALRPOP?

I'm wondering whether I could use a parser generator for my language, specifically one for Rust.

Example snippets

<person></person>
/(?:)/
var default: Number = 10 /* ReservedWord as identifier */
object.case /* ReservedWord as identifier */

Example productions

Syntax

Destructuring
 Identifier [if keywords are enabled]
 IdentifierName [if keywords are disabled]
 ArrayDestructuring
 ObjectDestructuring
 Destructuring NonNull [lookahead ∉ NonNull]

The tokenizer scans one of the following input goal symbols depending on the syntactic context:

  • InputElementDiv
  • InputElementRegExp
  • InputElementXMLTag
  • InputElementXMLContent

Syntax

InputElementDiv
 WhiteSpace
 LineTerminator
 Comment
 Identifier
 ReservedWord
 Punctuator
 NumericLiteral
 StringLiteral
InputElementRegExp
 WhiteSpace
 LineTerminator
 Comment
 Identifier
 ReservedWord
 Punctuator
 /
 /=
 NumericLiteral
 StringLiteral
 RegularExpressionLiteral
 XMLMarkup
InputElementXMLTag
 XMLName
 XMLTagPunctuator
 XMLAttributeValue
 XMLWhitespace
 {
InputElementXMLContent
 XMLMarkup
 XMLText
 {
 < [lookahead ∉ { ?, !, / }]
 </
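To make the goal-dependent scanning concrete, here is a minimal sketch of what I mean, written as a hand-rolled tokenizer function rather than with Rustemo or LALRPOP. It assumes, as in ECMAScript-style grammars, that a leading `/` is a division punctuator under InputElementDiv and starts a RegularExpressionLiteral under InputElementRegExp. The names `Goal`, `Token` and `scan_slash` are hypothetical, and the regex scanning is deliberately simplified (no character classes, no flags).

```rust
/// Two of the four lexical input goals; the XML goals are omitted here.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Goal {
    InputElementDiv,
    InputElementRegExp,
}

/// Hypothetical token type covering only what starts with `/`.
#[derive(Debug, PartialEq)]
pub enum Token {
    Div,            // `/`
    DivAssign,      // `/=`
    RegExp(String), // the whole literal, e.g. `/(?:)/`
}

/// Scans one token at the start of `input`, which must begin with `/`.
/// Returns the token and the number of bytes consumed, or `None` if the
/// input does not start with `/` or the regex literal is unterminated.
pub fn scan_slash(input: &str, goal: Goal) -> Option<(Token, usize)> {
    if !input.starts_with('/') {
        return None;
    }
    match goal {
        Goal::InputElementDiv => {
            if input.starts_with("/=") {
                Some((Token::DivAssign, 2))
            } else {
                Some((Token::Div, 1))
            }
        }
        Goal::InputElementRegExp => {
            // Simplified: look for the next unescaped `/` on the same line.
            let mut escaped = false;
            for (i, c) in input.char_indices().skip(1) {
                match c {
                    '\\' if !escaped => escaped = true,
                    '/' if !escaped => {
                        let end = i + 1;
                        return Some((Token::RegExp(input[..end].to_string()), end));
                    }
                    '\n' => return None,
                    _ => escaped = false,
                }
            }
            None
        }
    }
}
```

With this, `scan_slash("/(?:)/", Goal::InputElementRegExp)` yields a single RegExp token, while the same input under `Goal::InputElementDiv` yields only a division punctuator. The open question is who passes `goal` in: in the grammar it is chosen by the syntactic context, so the parser would have to tell the tokenizer which goal to use.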

Syntax

PropertyIdentifier
 Identifier [when keywords are enabled]
 IdentifierName [when keywords are disabled]
 *
Qualifier
 PropertyIdentifier
SimpleQualifiedIdentifier
 PropertyIdentifier
 Qualifier :: PropertyIdentifier
 Qualifier :: Brackets
ExpressionQualifiedIdentifier
 ParenExpression :: PropertyIdentifier
 ParenExpression :: Brackets
NonAttributeQualifiedIdentifier
 SimpleQualifiedIdentifier
 ExpressionQualifiedIdentifier
QualifiedIdentifier
 @ Brackets
 @ NonAttributeQualifiedIdentifier
 NonAttributeQualifiedIdentifier

Syntax

PrimaryExpression
 QualifiedIdentifier

Syntax

PropertyOperator
 . QualifiedIdentifier
 Brackets
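The "keywords enabled / keywords disabled" conditions above can be sketched the same way. This is only an illustration under my own assumptions, not code for either generator: `Word`, `RESERVED`, `classify` and `scan_path` are hypothetical names, the reserved-word list is cut down to three entries, and the only context handled is "right after a `.`", as in the `object.case` snippet.

```rust
/// Hypothetical classification of a scanned word.
#[derive(Debug, PartialEq)]
pub enum Word<'a> {
    Identifier(&'a str),
    ReservedWord(&'a str),
}

/// Deliberately short, illustrative reserved-word list.
const RESERVED: &[&str] = &["case", "default", "var"];

/// With keywords enabled, reserved words are reported as such; with
/// keywords disabled, every word is an identifier (the IdentifierName case).
pub fn classify(word: &str, keywords_enabled: bool) -> Word<'_> {
    if keywords_enabled && RESERVED.iter().any(|r| *r == word) {
        Word::ReservedWord(word)
    } else {
        Word::Identifier(word)
    }
}

/// Classifies the words of a dotted path such as `object.case`.
/// Keywords are enabled for the first word and disabled after each `.`.
pub fn scan_path(input: &str) -> Vec<Word<'_>> {
    input
        .split('.')
        .enumerate()
        .map(|(i, part)| classify(part, i == 0))
        .collect()
}
```

Here `scan_path("object.case")` classifies both `object` and `case` as identifiers, whereas `classify("case", true)` reports a reserved word. A case like `var default` would need the flag to come from the parser instead of from the previous character, which is exactly the part I don't know how to express.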

Full grammar

Consult the following links:

Question

Regarding two of the popular Rust parser generators, Rustemo and LALRPOP:

  • How would I choose the lexical input goal for a rule?
  • Is it fine to use an IdentifierName nonterminal freely to allow reserved words as identifiers in certain contexts? It happens that qualified identifiers are used both in lexical references and in the dot operator.

It's hard to give you helpful advice based on this post as it stands, because we would have to understand the entire (extensive!) syntax of the language that you're trying to parse—and even if someone were willing to work that hard to help you out, I don't think the links you've provided have enough information for them to do it successfully.

The kind of post that's most likely to get a prompt, useful response on URLO gets to the point quickly and includes as much as possible of the necessary non-shared context, so that we don't have to follow any links (play.rust-lang.org and docs.rs excepted) or digest a bunch of text to get up to speed. Here's a template that you can use to reframe this specific question in a more helpful way:

I'm looking into using either Rustemo or LALRPOP to parse a complex grammar for a programming language. [Here you could link to the full grammar, in case someone wants to dig deeper on your post.] I'm not sure how to parse some of the specific productions/nonterminals in the grammar (InputElementDiv, InputElementRegExp, InputElementXMLTag, InputElementXMLContent). Here is a reduced grammar with only a few productions that should help understand the problem.

[reduced grammar that includes the minimum necessary to understand the problem]

For this grammar, I'm not sure how to parse [specific productions] because [description of the problem]. I have already tried [some technique], but it didn't work because [reasons]. How can I solve this?

Feel free to edit your OP, or if you make a new one I'll close and unlist this thread. Good luck!

somewhat off topic

Implementing a featureful dynamic programming language is already an ambitious project; implementing one that other people will use for real applications is close to impossible. See Don't write a programming language. It's fine to have ambitious projects, but don't set yourself up to be crushed when they turn out the way ambitious projects usually do. And if there's some other problem, like building a game, that you're trying to advance with this, consider that implementing an entire programming language is incredibly unlikely to be on the path of least resistance leading to that goal. It's definitely harder than learning a different game engine or framework.


Right, I've updated the topic to be a bit more direct.

I'm building the language because unfortunately ActionScript 3 is not moving forward and I don't feel like using Haxe. On the other hand, Rust has no mature application development frameworks ready, so...

Of course, I'm aware that building a framework such as Flex may not be easy! But Flex is something I'm thinking of building on top of the final compiler...

That's odd, I and many others have been developing applications in Rust for many years now. So far everything I have ever needed I have found.

I'm curious: What exactly is your definition of "Application Framework"? Are we talking about desktop, mobile or browser based applications? All three? Why does such an "Application Framework" need yet another language in it rather than Rust itself? Why not create the Flex framework thing (Whatever that is) for use from Rust itself?

I'm referring to the Display List capabilities, which meet some criteria that matter to me specifically... I ran into issues when trying Pixi.js.

If you're really interested in my issue, see GitHub - hydroper/littlecarexperience-draft1

(I'm going to disengage after this post)

Your updated OP is better than the original, but it has a ways to go.

  • You're still expecting people to read the whole grammar for your language to understand what InputElementDiv etc. mean. Try to write your post assuming that nobody is willing to click through to hydroper-jet.github.io. For example, you could present an example short snippet of your language that illustrates this:

    • IdentifierName is expected where possible in many contexts instead of Identifier to allow reserved words as identifiers.

    along with the sequence of tokens that you want it to be parsed into, and ask how to achieve that with LALRPOP and/or Rustemo. Likewise for this:

    I'm wondering how I could indicate for a grammar nonterminal to use InputElementXMLContent as the lexical input goal for example?

    Assume that the reader has no idea what InputElementXMLContent is and give them enough information to help anyway, if they know LALRPOP/Rustemo.

  • In general, leave out the stuff that's extraneous to your actual question. Your post has references to AIR, as3_parser, and E4X, and these just distract the reader of your post who wants to help, knows enough about Rust/parsing/etc. to do so, but doesn't care about the details of your language design.

  • Best of all would be to start writing your parser in LALRPOP/Rustemo, keep going until you hit the problem(s) described in your post, then trim away everything that isn't related to the problem and add that non-working code to your post (inline if it's short, play.rust-lang.org link if it's runnable there, otherwise GitHub repo or gist) along with a failing test or just an example input that doesn't work. This is a great approach because making this kind of reproducer will force you to put all the relevant information about your problem into the form of Rust code—which everyone who helps out on this forum understands.


Done-ish.
