Lalrpop regex precedence problem


#1

I’m trying to build a simple lalrpop parser for a simple expr language that skips the contents of comments (if it sees “/*” it matches everything until “*/”).

I’m probably doing something stupid, but when I try to use lalrpop’s regex precedence method, I’m getting weird results. Simple example:

use std::str::FromStr;
grammar;

pub Term = { Num, "(" <Term> ")" };
Num: i32 = r"[0-9]+" => i32::from_str(<>).unwrap();

match { 
	"(",
	")",
	r"[0-9]+",
} else {
	r"[a0-9].+" 
}

Here there are two (very) overlapping regexes, which I am trying to prioritize. However, even though the lower-ranked regex r"[a0-9].+" is never used in the grammar, it seems to derail matches that succeed when I don’t have it:

	println!("result={:?}", TermParser::new().parse("(22)").unwrap());
	assert!(TermParser::new().parse("(22)").is_ok());

gives a panic with the output:
result=Err(UnrecognizedToken { token: Some((1, Token(0, "22)"), 4)), expected: ["\"(\"", "r#\"[0-9]+\"#"] })

Any insight into the right way to do this? Or if there’s a better approach?

Thanks!


#2

Hey

You can also join our gitter channel and post your question there for a faster way of getting an answer.


#3

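In lalrpop, lexing and parsing are separate phases: the input is first lexed using every regex from the grammar and the match section, and only then parsed. So for lexing it doesn’t matter whether the grammar actually uses a regex. Your error output shows exactly that: the lexer produced the single token "22)", which only r"[a0-9].+" can match, and the parser then has no rule that accepts it.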
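To make that concrete, here is a rough sketch of how a longest-match lexer behaves on your input. It is a simplified model, not lalrpop's actual implementation: the helper functions (literal, digits, catch_all, lex) are hypothetical hand-written stand-ins for the regexes, and it assumes the lexer picks the longest match at each position and only falls back on match-block priority when two rules match the same length.

```rust
// Simplified model of a longest-match lexer. NOT lalrpop's real code;
// every function here is a hypothetical stand-in for illustration.
// Each rule returns how many bytes it matches at the start of `s` (0 = no match).

/// Matches the exact literal `lit`.
fn literal(lit: &str, s: &str) -> usize {
    if s.starts_with(lit) { lit.len() } else { 0 }
}

/// Stand-in for r"[0-9]+".
fn digits(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

/// Stand-in for r"[a0-9].+": an `a` or a digit, then one or more
/// non-newline characters, matched greedily.
fn catch_all(s: &str) -> usize {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == 'a' || c.is_ascii_digit() => {
            let rest: usize = chars
                .take_while(|&c| c != '\n')
                .map(char::len_utf8)
                .sum();
            if rest == 0 { 0 } else { c.len_utf8() + rest }
        }
        _ => 0,
    }
}

/// Returns (rule, matched text) pairs. Rules are tried highest priority first.
fn lex(input: &str) -> Vec<(&'static str, &str)> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let candidates = [
            ("(", literal("(", rest)),
            (")", literal(")", rest)),
            ("[0-9]+", digits(rest)),
            ("[a0-9].+", catch_all(rest)),
        ];
        // Longest match wins; on equal length the earlier rule is kept,
        // so priority only acts as a tie-breaker.
        let mut best = ("", 0);
        for cand in candidates {
            if cand.1 > best.1 {
                best = cand;
            }
        }
        if best.1 == 0 {
            break; // nothing matches; a real lexer would report an error here
        }
        tokens.push((best.0, &rest[..best.1]));
        pos += best.1;
    }
    tokens
}
```

With this model, lex("(22)") yields "(" followed by a single "22)" token from the catch-all rule, which lines up with the Token(0, "22)") in your error. On the input "22" alone both regexes match two characters, and only then does the priority order pick r"[0-9]+".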