I'm currently writing a parser for Rust using Menhir and Coq, and I noticed that the shift/reduce conflict involving inner and outer attributes is impossible to resolve. Here is an example:
#[allow(unused_variables)]
mod my_module {
    #![allow(dead_code)]
    fn hidden_function() {}
}
For a grammar to be LR(1), the parser must be able to look at just the next token and decide whether to shift (move forward to the next token) or reduce (turn the tokens in its scope into a node for the AST).
For example, if the parser encounters the unsafe keyword, it knows to shift, as it needs more tokens before it can build a node in the AST. However, if it encounters a semicolon, it knows to reduce, as the semicolon indicates that there are no more tokens to shift before creating the node.
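To illustrate the shift/reduce decision, here is a toy sketch in Rust. It is not how Menhir works internally (a real LR automaton is table-driven), and the names `Token`, `Action`, `decide`, and `parse` are hypothetical, made up for this example. As a simplification, the semicolon is consumed as part of the reduce step.

```rust
/// Hypothetical token type for a toy language of `;`-terminated statements.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Unsafe,
    Ident,
    Semi,
}

#[derive(Debug, PartialEq)]
enum Action {
    Shift,
    Reduce,
}

/// Decide what to do using only the single next token (one token of lookahead).
fn decide(lookahead: Token) -> Action {
    match lookahead {
        // A semicolon closes the statement, so the stack can be turned into a node.
        Token::Semi => Action::Reduce,
        // Anything else means the statement is still incomplete.
        Token::Unsafe | Token::Ident => Action::Shift,
    }
}

/// Runs the toy loop and returns, for each reduced node, how many tokens it contained.
fn parse(tokens: &[Token]) -> Vec<usize> {
    let mut stack: Vec<Token> = Vec::new();
    let mut nodes = Vec::new();
    for &tok in tokens {
        match decide(tok) {
            Action::Shift => stack.push(tok),
            Action::Reduce => {
                // Simplification: the semicolon itself is dropped here.
                nodes.push(stack.len());
                stack.clear();
            }
        }
    }
    nodes
}

fn main() {
    let input = [Token::Unsafe, Token::Ident, Token::Semi, Token::Ident, Token::Semi];
    println!("{:?}", parse(&input));
}
```

The point is only that `decide` takes a single token: every choice is made from one token of lookahead.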
We can see that the top-level module has an outer attribute, and because it isn't nested in another item, the parser does not expect an inner attribute yet. That means it is obvious to the parser that a # must begin an outer attribute. However, once we get into nested items, this becomes impossible. If the parser encounters a #, it cannot immediately tell whether it is reading an outer attribute or an inner attribute. It must check whether the next token is a ! before it can decide. This makes the grammar LR(2), as the parser must read 2 tokens ahead to know whether to shift or reduce.
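The two-token requirement can be sketched as a small hand-written classifier in Rust. This is only an illustration of the lookahead argument, not a model of the Menhir automaton; `Token`, `AttrKind`, and `classify_attribute` are hypothetical names.

```rust
/// Hypothetical token type, covering only what this example needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Pound,       // `#`
    Bang,        // `!`
    OpenBracket, // `[`
    Other,
}

#[derive(Debug, PartialEq)]
enum AttrKind {
    Inner,
    Outer,
}

/// Classifies the attribute starting at `tokens[0]`.
/// Also returns how many tokens had to be inspected to make the decision.
fn classify_attribute(tokens: &[Token]) -> Option<(AttrKind, usize)> {
    match tokens {
        // `#` followed by `!` starts an inner attribute.
        [Token::Pound, Token::Bang, ..] => Some((AttrKind::Inner, 2)),
        // `#` followed by `[` starts an outer attribute.
        [Token::Pound, Token::OpenBracket, ..] => Some((AttrKind::Outer, 2)),
        // `#` alone is not enough to decide, and anything else is not an attribute.
        _ => None,
    }
}

fn main() {
    let inner = [Token::Pound, Token::Bang, Token::OpenBracket];
    let outer = [Token::Pound, Token::OpenBracket, Token::Other];
    println!("{:?}", classify_attribute(&inner));
    println!("{:?}", classify_attribute(&outer));
    println!("{:?}", classify_attribute(&[Token::Pound]));
}
```

Both successful arms have to match on the second token, which is the informal version of the LR(2) claim.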
This minor issue could be solved by making inner and outer attributes start with different tokens.
In the example below, the inner attribute starts with an exclamation point (!), which makes the grammar LR(1), as the parser no longer needs to look past the first token to know which rule to apply.
#[allow(unused_variables)]
mod my_module {
    !#[allow(dead_code)]
    fn hidden_function() {}
}
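With this hypothetical `!#[...]` syntax, the decision can be sketched in Rust as a function of the first token alone. The names `Token`, `AttrKind`, and `classify_by_first_token` are invented for illustration, and the sketch assumes that, at the position where attributes are expected, no other construct begins with `!`.

```rust
/// Hypothetical token type, covering only what this example needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Pound, // `#`
    Bang,  // `!`
    Other,
}

#[derive(Debug, PartialEq)]
enum AttrKind {
    Inner,
    Outer,
}

/// Classifies an attribute under the proposed syntax by looking at one token only.
/// Assumption: nothing else at this position can start with `!`.
fn classify_by_first_token(first: Token) -> Option<AttrKind> {
    match first {
        // Proposed syntax: `!#[...]` is an inner attribute.
        Token::Bang => Some(AttrKind::Inner),
        // `#[...]` remains an outer attribute.
        Token::Pound => Some(AttrKind::Outer),
        Token::Other => None,
    }
}

fn main() {
    println!("{:?}", classify_by_first_token(Token::Bang));
    println!("{:?}", classify_by_first_token(Token::Pound));
    println!("{:?}", classify_by_first_token(Token::Other));
}
```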
Is my conclusion correct? What do you think? I don't believe this needs to be changed, as I can easily write a formal proof that this LR(2) construct does not create undefined behavior.