Reference documentation of Block Expressions

For a school project I am writing a Rust compiler/interpreter. I have some questions about the grammar for block expressions listed in The Rust Reference.

The grammar states that a block expression consists of optional Statements. These Statements are zero or more Statement items followed by an optional ExpressionWithoutBlock.

But a bit lower it says: "The syntax for a block is {, then any inner attributes, then any number of statements, then an optional expression, called the final operand, and finally a }."

This piece of text seems more correct than the grammar, since the following code is accepted when there is no semicolon after the inner block:

fn check_value() -> i32 {
    {
        4
    }
}

But when I add a semicolon after the inner block, the compiler reports a type error saying that the function body has no tail expression:

fn check_value() -> i32 {
    {
        4
    };
}

So by experimenting it seems that the ExpressionWithoutBlock really should be just an Expression in the grammar.

Since I am not an expert in Rust at all, I wanted to ask here whether I am interpreting this correctly.

In your first example, the {4} expression is the final operand of the function. In your second example, the {4} expression is just a statement, and the function has no final operand.

To be clear, both examples are valid according to the grammar. The second one just doesn't type check.
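As a small illustration of that difference, here is a sketch with two hypothetical functions (the names are made up for this example). The second one keeps the semicolon after the inner block and adds an explicit final operand, which is one way to make it type check:

```rust
// The inner block is the final operand of the function body,
// so its value (4) becomes the return value.
fn block_as_tail() -> i32 {
    {
        4
    }
}

// With the semicolon, the inner block is only a statement and its
// value is discarded. The function needs its own final operand.
fn block_as_statement() -> i32 {
    {
        4
    };
    5
}
```

Calling `block_as_tail()` yields 4, while `block_as_statement()` yields 5, because the 4 is thrown away by the semicolon.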

Looks like you rediscovered an inaccuracy… well, or at least confusing presentation[1]… in the grammar, as reported in

Edit: On second thought, I suppose the way it’s presented in the grammar in the reference is correct for parsing as in recognizing all syntactically correct Rust programs, and makes parsing easier because it avoids ambiguity. On the other hand, it leaves you with a syntax tree that does not properly correspond with the optimal way of looking at the block’s content, though that could be fixed in a subsequent transformation. I don’t know how rustc itself does that, or whether it parses in a different structure to begin with.

The necessary transformation would involve identifying a trailing return expression, if it exists (and then either marking it somehow, separating it out, or repeating this identification in every pass that needs to handle it differently). So: that return expression can be the trailing ExpressionWithoutBlock or, if no such ExpressionWithoutBlock exists, the last of the statements in the Statement+ part, but only if that last statement is itself an ExpressionStatement in the semicolon-less ExpressionWithBlock variant.
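Such a post-parse transformation could look roughly like the sketch below. This is only an illustration on a toy syntax tree: the types `Expr`, `Stmt`, `ParsedBlock`, `NormalizedBlock` and the function `normalize` are all invented for this example and say nothing about how rustc does it.

```rust
// Toy syntax tree; all names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),                // stands in for any ExpressionWithoutBlock
    Block(Box<ParsedBlock>), // stands in for any ExpressionWithBlock
}

#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    ExprWithSemi(Expr),    // `expr;`
    BlockLikeNoSemi(Expr), // ExpressionWithBlock with no semicolon
}

// Shape of a block as the Reference grammar yields it.
#[derive(Debug, Clone, PartialEq)]
struct ParsedBlock {
    stmts: Vec<Stmt>,
    trailing: Option<Expr>, // trailing ExpressionWithoutBlock, if any
}

// Shape after the transformation: the final operand is separated out.
#[derive(Debug, Clone, PartialEq)]
struct NormalizedBlock {
    stmts: Vec<Stmt>,
    tail: Option<Expr>,
}

fn normalize(mut block: ParsedBlock) -> NormalizedBlock {
    // Case 1: the grammar already gave us a trailing expression.
    if let Some(expr) = block.trailing.take() {
        return NormalizedBlock { stmts: block.stmts, tail: Some(expr) };
    }
    // Case 2: the last statement is a semicolon-less block-like expression.
    match block.stmts.pop() {
        Some(Stmt::BlockLikeNoSemi(expr)) => {
            NormalizedBlock { stmts: block.stmts, tail: Some(expr) }
        }
        // Any other last statement is put back; there is no final operand.
        Some(other) => {
            block.stmts.push(other);
            NormalizedBlock { stmts: block.stmts, tail: None }
        }
        None => NormalizedBlock { stmts: block.stmts, tail: None },
    }
}
```

With this sketch, the body `{ { 4 } }` normalizes to a block whose tail is the inner block, while `{ { 4 }; }` normalizes to one statement and no tail, which matches the two examples in the question.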

Identifying this return expression is important in two ways:

  • on one hand, it's important to … well … to know what value to return, and to match up the types accordingly, even if the return expression is hidden in that Statement+ section
  • on the other hand, for this return expression at the end of the Statement+ part, the otherwise applicable rule, that a semicolon-less ExpressionWithBlock used as an ExpressionStatement must have type (), is lifted.
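Both points can be seen in one small function. This is a sketch with a made-up function name, `pick`, using `if` as the ExpressionWithBlock:

```rust
fn pick(flag: bool) -> i32 {
    let mut calls = 0;

    // Not in last position and no semicolon: this `if` is an
    // ExpressionStatement, so its block must have type ().
    if flag {
        calls += 1;
    }

    // Writing `if flag { 1 } else { 2 }` at this point without a
    // semicolon would be rejected: the statement would have to be of
    // type (), but it has an integer type.

    // In last position the rule is lifted: the same kind of expression
    // is the return expression and may have type i32.
    if flag { calls } else { 2 }
}
```

Here `pick(true)` evaluates to 1 and `pick(false)` to 2.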

  1. It looks like the comment on the issue explains how this could be understood in a way that would make it not inaccurate ↩︎