I have an AS3 tokenizer written in Rust and am considering LALRPOP for parsing the AS3 language, along with a number of new compatible features I plan to add.
I've implemented a parser for a similar language several times by hand, in the style of jQuery's Esprima and luaparse. Because destructuring patterns and arrow functions overlap in syntax, hand-writing the parser is very bug-prone, so I would rather use a parser generator.
My questions:
XML input goals
Since AS3 inherits the E4X standard (ECMAScript for XML), it has additional lexical input goals:
- InputElementXMLTag
- InputElementXMLContent
I already have separate scanning methods: scan_ie_div, scan_ie_xml_tag and scan_ie_xml_content, as well as one for regular expressions that is used occasionally.
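A minimal sketch, assuming LALRPOP's external-lexer interface: the generated parser consumes an iterator whose items are Result<(Location, Token, Location), Error>, so the existing scanners could sit behind a single Iterator::next that dispatches on the current goal. The Goal, Tok and LexError types, the shared Rc<Cell<Goal>>, and the toy bodies of scan_ie_div and scan_ie_xml_content are hypothetical stand-ins, and only two goals are shown. One caveat worth checking: LALRPOP runs actions on reduction, and an LR(1) parser may already have fetched the lookahead token by then, so a goal switch made from a grammar action can arrive one token late. Deciding the goal inside the lexer (for example from the previous token) avoids that feedback loop.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical subset of the lexical goals; XmlTag and RegExp would be further variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Goal {
    Div,
    XmlContent,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Tok {
    Ident(String),
    Punct(char),
    XmlText(String),
}

#[derive(Debug, PartialEq)]
pub struct LexError {
    pub offset: usize,
}

/// Item shape of an external LALRPOP lexer: Ok((start, token, end)) or Err(error).
pub type Spanned<T, L, E> = Result<(L, T, L), E>;

pub struct Lexer<'a> {
    src: &'a str,
    pos: usize,
    /// Shared with whatever decides the current goal (an assumption of this sketch).
    goal: Rc<Cell<Goal>>,
}

impl<'a> Lexer<'a> {
    pub fn new(src: &'a str, goal: Rc<Cell<Goal>>) -> Self {
        Lexer { src, pos: 0, goal }
    }

    /// Toy stand-in for scan_ie_div: skips whitespace, then reads an identifier or one punctuator.
    fn scan_ie_div(&mut self) -> Option<Spanned<Tok, usize, LexError>> {
        let rest = &self.src[self.pos..];
        let trimmed = rest.trim_start();
        self.pos += rest.len() - trimmed.len();
        let start = self.pos;
        let c = trimmed.chars().next()?;
        if c.is_alphabetic() || c == '_' {
            let len = trimmed
                .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
                .unwrap_or(trimmed.len());
            self.pos += len;
            Some(Ok((start, Tok::Ident(trimmed[..len].to_string()), self.pos)))
        } else if c.is_ascii_punctuation() {
            self.pos += 1;
            Some(Ok((start, Tok::Punct(c), self.pos)))
        } else {
            // Skip the offending character so repeated calls make progress.
            self.pos += c.len_utf8();
            Some(Err(LexError { offset: start }))
        }
    }

    /// Toy stand-in for scan_ie_xml_content: text up to the next '<' is one token.
    fn scan_ie_xml_content(&mut self) -> Option<Spanned<Tok, usize, LexError>> {
        let rest = &self.src[self.pos..];
        if rest.is_empty() {
            return None;
        }
        let start = self.pos;
        if rest.starts_with('<') {
            self.pos += 1;
            return Some(Ok((start, Tok::Punct('<'), self.pos)));
        }
        let len = rest.find('<').unwrap_or(rest.len());
        self.pos += len;
        Some(Ok((start, Tok::XmlText(rest[..len].to_string()), self.pos)))
    }
}

impl<'a> Iterator for Lexer<'a> {
    type Item = Spanned<Tok, usize, LexError>;

    fn next(&mut self) -> Option<Self::Item> {
        // The current goal selects the scanning method, like the InputElement* goals.
        match self.goal.get() {
            Goal::Div => self.scan_ie_div(),
            Goal::XmlContent => self.scan_ie_xml_content(),
        }
    }
}
```

For example, with the goal set to Div the input x > hello world< yields Ident("x") and Punct('>'); after switching the shared cell to XmlContent, the remainder comes back as one XmlText span followed by '<'.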
For compliance with the standard, I want the XMLWhitespace nonterminal to be treated as a token.
Will LALRPOP handle this whitespace token correctly, or will it skip it?
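With an external lexer, LALRPOP only sees the tokens the iterator yields; the whitespace skipping usually associated with LALRPOP belongs to its built-in lexer. So treating XMLWhitespace as a token is the lexer's decision: emit it in the XMLTag goal and declare it as a terminal in the grammar's extern block. A minimal sketch, assuming a hypothetical XmlTok token set and scan_xml_tag function, double-quoted attribute values, and a simplified name syntax:

```rust
/// Hypothetical token set for the XMLTag input goal.
#[derive(Clone, Debug, PartialEq)]
pub enum XmlTok {
    LessThan,
    GreaterThan,
    Slash,
    Equals,
    Name(String),
    AttributeValue(String),
    /// XMLWhitespace, emitted as a real token instead of being skipped.
    Whitespace,
}

/// E4X XMLWhitespaceCharacter: space, tab, carriage return, line feed.
fn is_xml_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n')
}

/// Simplified name characters (the real XMLName production is broader).
fn is_name_char(c: char) -> bool {
    c.is_alphanumeric() || matches!(c, '_' | ':' | '-' | '.')
}

/// Toy XMLTag-goal scanner producing (start, token, end) triples.
/// Assumes double-quoted attribute values; any other character starts a name.
pub fn scan_xml_tag(src: &str) -> Vec<(usize, XmlTok, usize)> {
    let mut out = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let tok = match c {
            '<' => XmlTok::LessThan,
            '>' => XmlTok::GreaterThan,
            '/' => XmlTok::Slash,
            '=' => XmlTok::Equals,
            '"' => {
                // Collect characters up to the closing quote (or end of input).
                let mut value = String::new();
                for (_, ch) in chars.by_ref() {
                    if ch == '"' {
                        break;
                    }
                    value.push(ch);
                }
                XmlTok::AttributeValue(value)
            }
            c if is_xml_whitespace(c) => {
                // Fold a whole run of whitespace into a single token.
                while chars.next_if(|&(_, ch)| is_xml_whitespace(ch)).is_some() {}
                XmlTok::Whitespace
            }
            _ => {
                let mut name = c.to_string();
                while let Some((_, ch)) = chars.next_if(|&(_, ch)| is_name_char(ch)) {
                    name.push(ch);
                }
                XmlTok::Name(name)
            }
        };
        // The token ends where the next unread character begins.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        out.push((start, tok, end));
    }
    out
}
```

For the input <a  b="c"> this yields seven tokens, including a single Whitespace token spanning bytes 2..4, so a grammar rule can require XMLWhitespace between a tag name and an attribute.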
Diagnostics and line numbers
- Should I keep mapping line numbers to line offsets in a Vec<usize>? Or does LALRPOP already track line numbers when printing diagnostics?
- I also built my source locations (Location) so that they additionally contain line numbers. Is that necessary when using LALRPOP?
- I built a Diagnostic structure, instances of which are added to a vector for each respective Source. Is this compatible with LALRPOP?
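LALRPOP does not compute line numbers itself: locations are opaque values of whatever Location type the extern block declares (commonly a byte offset, usize), and its ParseError values simply carry those locations back. So a Vec<usize> of line starts remains useful, while the Location passed through the parser can stay a plain offset, with line and column derived only when a diagnostic is printed. The sketch below assumes that design; LineIndex, Diagnostic, Severity and render are hypothetical names, and only \n is treated as a line terminator (ECMAScript also treats CR, U+2028 and U+2029 as line terminators).

```rust
/// Maps byte offsets to 1-based (line, column) pairs on demand.
pub struct LineIndex {
    /// Byte offset at which each line starts; line_starts[0] == 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    pub fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        line_starts.extend(src.match_indices('\n').map(|(i, _)| i + 1));
        LineIndex { line_starts }
    }

    /// Returns (line, column), both 1-based; the column counts bytes.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        // The number of line starts <= offset is the 1-based line number.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let col = offset - self.line_starts[line - 1] + 1;
        (line, col)
    }
}

#[derive(Debug, PartialEq)]
pub enum Severity {
    Error,
    Warning,
}

/// Hypothetical diagnostic record that stores only a byte span.
#[derive(Debug)]
pub struct Diagnostic {
    pub severity: Severity,
    pub span: (usize, usize),
    pub message: String,
}

/// Formats a diagnostic as "file:line:col: Severity: message".
pub fn render(d: &Diagnostic, file: &str, index: &LineIndex) -> String {
    let (line, col) = index.line_col(d.span.0);
    format!("{}:{}:{}: {:?}: {}", file, line, col, d.severity, d.message)
}
```

A Vec<Diagnostic> per source, as described in the question, fits this model: the parser never needs to know about it, and errors reported by LALRPOP (including ones collected through its error-recovery support) can be converted into Diagnostic values after parsing.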