Presenting gramatica, a bison-style compiler compiler for Rust


#1

Hello rustaceans!

I have made a new crate called gramatica ( https://crates.io/crates/gramatica ). Hopefully, this time is an useful one.

It has been done with the spirit of bison, but with a much more comfortable syntax, which is integrated naturally into the Rust language.

As example of use, please look as this grammar of a simple calculator:

extern crate gramatica;
use std::cmp::Ordering;
use std::io::BufRead;
use gramatica::{Associativity,EarleyKind,State,Parser,ParsingTablesTrait,AmbiguityInfo};

re_terminal!(Num(f64),"[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");
re_terminal!(Plus,"\\+");
re_terminal!(Minus,"-");
re_terminal!(Star,"\\*");
re_terminal!(Slash,"/");
re_terminal!(Caret,"\\^");
re_terminal!(LPar,"\\(");
re_terminal!(RPar,"\\)");
re_terminal!(NewLine,"\\n");
re_terminal!(_,"\\s+");//Otherwise skip spaces

nonterminal Input(())
{
	() => (),
	(Input,Line) => (),
}

nonterminal Line(())
{
	(NewLine) => (),
	(Expression(value), NewLine) =>
	{
		println!("{}",value);
	},
}

nonterminal Expression(f64)
{
	(Num(value)) => value,
	#[priority(addition)]
	#[associativity(left)]
	(Expression(l),Plus,Expression(r)) => l+r,
	#[priority(addition)]
	#[associativity(left)]
	(Expression(l),Minus,Expression(r)) => l-r,
	#[priority(multiplication)]
	#[associativity(left)]
	(Expression(l),Star,Expression(r)) => l*r,
	#[priority(multiplication)]
	#[associativity(left)]
	(Expression(l),Slash,Expression(r)) => l/r,
	#[priority(addition)]
	#[associativity(left)]
	(Minus,Expression(value)) => -value,
	#[priority(exponentiation)]
	#[associativity(right)]
	(Expression(l),Caret,Expression(r)) => l.powf(r),
	(LPar,Expression(value),RPar) => value,
}

ordering!(exponentiation,multiplication,addition);

fn main()
{
	let stdin=std::io::stdin();
	for rline in stdin.lock().lines()
	{
		let line=rline.unwrap()+"\n";
		println!("line={}",line);
		match Parser::<Token,ParsingTables>::parse(&line,None)
		{
			Err(x) => println!("error parsing: {:?}",x),
			Ok(x) => println!("parsed correctly: {:?}",x),
		};
	}
}

Then use the binary gramatica_compiler to convert the grammar file into a proper Rust file containing ‘tables’ for the parsing.

The parser implements Earley’s algorithm. Thus, it is able to parse any context-free language. Or even further, doing some tokenization tricks. For example, a typical problem is interpreting >> as arithmetical shift or as closing two generics. I have worked that as being it parsed as three tokens: >, NoSpace, and another >. Then some rules include a MaybeSpace token and other ones require the NoSpace.

As another consequence of using Earley’s algorithm, if the grammar is unambiguous then there are not conflicts, contrasting thus with LALR and similar parsers. With ambiguous grammars one must annotate in the conflicting rules how to proceed. One problem is that checking ambiguity of context-free grammars is undecidable. Hence the problem can only be detected at runtime.

One can embed Rust code in the grammar. For this I have needed to implement Rust’s grammar, which I have found at https://github.com/rust-lang/rust/blob/master/src/grammar/parser-lalr.y . Is that grammar up to date or I am missing new features?

Also note that even if I am parsing all of Rust’s syntax, I am not processing yet all of it. Thus, advanced notation in the source grammar may disappear (or appear as fixme) in the generated Rust file. Nevertheless, a lot of syntax is implemented, since it was required to complete its bootstrapping. Additionally, the code still needs to include annotations for disambiguations. While for the bootstrapping I have just added parentheses, it is a little annoying. I will try to add these annotations soon.

If someone is interested in using gramatica and find it missing some feature, please say so.

You may be wondering why I have done this. Well, I have a program synthesizer written in Python that from a specification in a language similar to a reduced Python+Coq is sometimes able to write code that satisfies that specification in a language similar to reduced Python. After learning Rust I have decided that those three instances of Python should be replaced by Rust. Therefore, I need to be able to extend Rust’s grammar to allow specifications with propositions and other infinite types.