Guidelines for Using Semicolons & Commas

I would be thankful for general guidelines governing use of semicolons & commas.

While looking at the sample code for the "Guess The Number" example in "The Rust Programming Language", I noticed that one of the closing braces is followed by a semicolon while a few others are not.

Similarly, two of the statements are followed by a comma while all the others have a semicolon.

Can you show which lines you're referring to? The only places I can think of where a semicolon is allowed but not required are after a match arm, as well as after return, break, etc.

The thing that catches me out every day is that an expression at the end of a function without a semicolon is the same as writing 'return' whatever ';'. The same goes for match results, as noted above.

I don't think any guideline is required. If you get it wrong, the compiler will complain, tell you why, and likely tell you what to do about it.

You don't need semicolons in match statements for the same reason that you don't need them at the end of a function when you want to return something. The comma is just for separating multiple cases in the match.

match guess.cmp(&secret_number) {
    Ordering::Less => println!("Too small!"),
    Ordering::Greater => println!("Too big!"),
    Ordering::Equal => {
        println!("You win!");
        break;
    }
}

Note that this match actually produces a value. The value it produces is whatever the case it chose returns, so it's one of:

  1. println!("Too small!")
  2. println!("Too big!")
  3. { println!("You win!"); break; }

You may think these don't return anything, and in a sense they don't. However, Rust has a type called unit, written (), which is used as the value of anything that doesn't produce a meaningful value. In this case each println! evaluates to () because println! simply has nothing to return, and the block evaluates to () because the last statement in the block ends with a semicolon.
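A minimal sketch of the unit type in action. The function names here (`five`, `unit_from_println`, `block_with_semicolon`) are made up for illustration and are not from the book's listing:

```rust
// Hypothetical helper that produces a value we can throw away.
fn five() -> i32 {
    5
}

// `println!` evaluates to `()`, so it can be the tail expression of a
// function that returns nothing (i.e. returns `()`).
fn unit_from_println() {
    println!("Too small!")
}

// A block whose last statement ends in a semicolon evaluates to `()`.
fn block_with_semicolon() {
    let value: () = {
        five(); // the semicolon discards the 5
    };
    value
}

fn main() {
    assert_eq!(unit_from_println(), ());
    assert_eq!(block_with_semicolon(), ());
}
```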

In fact, if you replace the break with the value five, like this:

match guess.cmp(&secret_number) {
    Ordering::Less => println!("Too small!"),
    Ordering::Greater => println!("Too big!"),
    Ordering::Equal => {
        println!("You win!");
        5
    }
}

you get this error

error[E0308]: mismatched types
  --> src/main.rs:12:13
   |
7  | /     match guess.cmp(&secret_number) {
8  | |         Ordering::Less => println!("Too small!"),
9  | |         Ordering::Greater => println!("Too big!"),
10 | |         Ordering::Equal => {
11 | |             println!("You win!");
12 | |             5
   | |             ^ expected (), found integer
13 | |         }
14 | |     }
   | |     -- help: consider using a semicolon here
   | |_____|
   |       expected this to be `()`
   |
   = note: expected type `()`
              found type `{integer}`

The error says it expected the last match arm to return the unit type (), because the other arms return that type and every arm in a match must return the same type.
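For contrast, here is a small sketch where every arm agrees on a non-unit type, so the match as a whole has that type. The `describe` function is a hypothetical variation on the book's code, not part of it:

```rust
use std::cmp::Ordering;

// Hypothetical helper: every arm evaluates to a `&'static str`,
// so the whole match expression has type `&'static str`.
fn describe(guess: u32, secret_number: u32) -> &'static str {
    match guess.cmp(&secret_number) {
        Ordering::Less => "Too small!",
        Ordering::Greater => "Too big!",
        Ordering::Equal => {
            // A block arm: its value is its tail expression (no semicolon).
            "You win!"
        }
    }
}

fn main() {
    println!("{}", describe(3, 7));
    println!("{}", describe(9, 7));
    println!("{}", describe(7, 7));
}
```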

If you take a look at this other match:

let guess: u32 = match guess.trim().parse() {
    Ok(num) => num,
    Err(_) => continue,
};

You will notice that it has a semicolon at the end. This is because the match is used as the value for the guess variable, and the semicolon is there to end the let guess = statement, not because of the match. match, if and loop expressions are a bit of a special case: the semicolon can be skipped when they are not used as a value.

As for what the match does, the first arm returns num which has the type u32 and the other arm returns continue. The continue expression actually returns a special type called never, which is written using an exclamation mark. This type means that the expression never returns, and the compiler implicitly converts the never type to any other type, which is valid as the never type actually never happens (thus the name).

You can even make functions that return the never type. One example is std::process::abort which never returns as it aborts the process.
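A rough sketch of both uses of the never type. The names `give_up`, `parse_or_give_up` and `first_number` are invented for this example:

```rust
// Hypothetical function that never returns: its return type is `!`.
fn give_up(reason: &str) -> ! {
    panic!("giving up: {}", reason)
}

// The `Err` arm has type `!`, which coerces to `u32` to match the `Ok` arm.
fn parse_or_give_up(input: &str) -> u32 {
    match input.trim().parse() {
        Ok(num) => num,
        Err(_) => give_up("not a number"),
    }
}

// `continue` also has type `!`, so this match still has type `u32`.
fn first_number(inputs: &[&str]) -> Option<u32> {
    for input in inputs {
        let number: u32 = match input.trim().parse() {
            Ok(num) => num,
            Err(_) => continue, // skip anything that does not parse
        };
        return Some(number);
    }
    None
}

fn main() {
    println!("{:?}", first_number(&["abc", " 42 ", "7"]));
    println!("{}", parse_or_give_up("5"));
}
```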

I was referring to Listing 2-5 at the bottom of the following web page:
Sample Code for "Guess The Number"

Thanks alice - for your detailed clarification.

Much appreciated.

Key insight: if there was a semicolon, only then would it be a statement - see Expression Statement - quote:

An expression statement is one that evaluates an expression and ignores its result.

i.e. the semicolon signals to throw away the value the expression evaluates to (which often is () (i.e. unit)). So omitting the semicolon indicates that the function or block should evaluate to the value resulting from the final expression.

fn main() {
    let result = {
        let a = 1;                          // end of a "let statement"
        let b = 2;                          // end of a "let statement"
        let result = if a < b {             // if EXPRESSION - not a STATEMENT
            "less than"                     // NO semicolon - value if EXPRESSION evaluates to
        } else {                            //
            "greater than or equal to"      // NO semicolon - value if EXPRESSION evaluates to
        };                                  // end of a "let statement"
        String::from(result)                // NO semicolon - value block evaluates to
    };                                      // end of a "let statement"

    println!("`a` is {} `b`", result)       // NO semicolon OK - evaluates to () (unit)
}                                           // which matches the return type of the function

Statements and Expressions:

Rust is primarily an expression language ...

In contrast, statements in Rust serve mostly to contain and explicitly sequence expression evaluation.

Coming from imperative, primarily statement based languages Rust's "punctuation" could be confusing.

What helped me to understand Erlang's ant turd tokens was to switch from a statement mindset to an expression (value) mindset (see also The Value of Values).

Now Rust does have statements but the majority of logic is expressed in terms of expressions so it can be helpful to adopt an expression mindset.

Once you can reliably identify let statements and expression statements it becomes a lot easier to tell where the semicolons go.

And once you adopt an expression mindset you are more likely to write

fn max(a: i32, b: i32) -> i32 {
    if a > b {
        a
    } else {
        b
    }
}

than


fn max(a: i32, b: i32) -> i32 {
    if a > b {
        return a;
    }
    return b;
}

Ah yes, thanks for that reminder.

It might help me stop tenaciously trying to write C in Rust!

I do have a little issue with things like:

    let result = if a < b {...

To my mind the 'if' is the main thing going on here, dictating flow of control. As such I like to see it up front. Still, I'll likely adjust to the idea with time.

Not sure I want anything to do with anything called 'ant turd tokens'.

In my view "flow of control" is still indicative of a "statement mindset", i.e. flowchart thinking.

Given that if is an expression, "flow of values" seems more appropriate to me - i.e. if returns a value by selecting one of many possible expressions and evaluating it.

For

if say_hello {
    println!("Hello");
}
  1. The block simply evaluates to () (unit) when executed.
  2. My mental model assumes an implicit else returning a () value
  3. In either case the returned () value is simply ignored.
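That mental model can be sketched as follows. Both functions are hypothetical and only meant to show that an if without an else has type (), just as if an else producing () had been written out:

```rust
// Hypothetical helper: an `if` without `else` has type `()`.
fn greet(say_hello: bool) {
    let outcome: () = if say_hello {
        println!("Hello");
    };
    outcome
}

// The same thing with the "implicit else" from the mental model spelled out.
fn greet_explicit(say_hello: bool) {
    let outcome: () = if say_hello {
        println!("Hello");
    } else {
        ()
    };
    outcome
}

fn main() {
    assert_eq!(greet(true), greet_explicit(true));
    assert_eq!(greet(false), greet_explicit(false));
}
```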

Not sure I want anything to do with anything called 'ant turd tokens'.

In Erlang:

  • commas sequence expressions
  • semicolons sequence function clauses or conditional branches
  • periods terminate function definitions

It's only unfamiliar if you are used to dealing with statement delimiters.

Similarly, in Rust, expression statements sequence expressions, with some let statements thrown in. Conceptually:

{
    expression; // expression statement
    expression; // expression statement
    expression  // just an expression
}

The block executes the three expressions in sequence and then evaluates to the value of the last one.
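A concrete version of that conceptual block, as a small sketch (the function name `block_value` is made up for illustration):

```rust
// Hypothetical helper showing a block used as a value.
fn block_value() -> i32 {
    let mut total = 0;
    let last = {
        total += 1;  // expression statement
        total += 10; // expression statement
        total * 2    // just an expression: this is the block's value
    };
    last
}

fn main() {
    println!("{}", block_value()); // prints 22
}
```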

You make it sound like 'flowchart thinking' is a bad thing.

Every algorithm I have ever seen described anywhere, in maths or CS, has been described in a flowchart way, albeit as structured flowcharts as typified by ALGOL.

One way to the exclusion of all others is usually a bad thing.

When you have some time: The Value of Values

"PLace-Oriented Programming" has a role in Rust because it is an important optimization for constrained environments. But compared to C/C++ I sense a definitive push towards "value-oriented programming" in the design of Rust.

I think about it as the equivalent of C's

    result = a < b ? a : b;

... just even better and more powerful.

The ternary operator is an expression, unlike C's if, which is a statement.

In Rust, if is always an expression, just like C's ternary operator.
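Roughly, the Rust counterpart of that C line could look like this (the `smaller` function is a made-up wrapper so the snippet can run on its own):

```rust
// Hypothetical helper: Rust's `if` expression in the role of C's `a < b ? a : b`.
fn smaller(a: i32, b: i32) -> i32 {
    let result = if a < b { a } else { b };
    result
}

fn main() {
    println!("{}", smaller(1, 2)); // prints 1
    println!("{}", smaller(5, 3)); // prints 3
}
```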

Fun fact: unfortunately, "always an expression" is a bit too simplistic. (Disclaimer: I am part of wg-grammar and this knowledge comes from there but is purely my opinion and not necessarily that of the group.)

if a {
    println!("a")
}
if b {
    println!("b")
}

Rust actually has a semicolon omission rule: if the expression ends with a {} block (e.g. if, match) and is typed at () (or !), it does not need to be terminated by a semicolon to be a statement (and as such followed by further statements/expression).

Similarly, the omission of commas after blocks in match statements is the same omission rule (currently, IIRC), rather than the block being specially handled as something other than a block expression.

This is painful to me as part of the team trying to formalize Rust's grammar. Whether an "if statement" is required to be terminated by a semicolon is determined by type checking.
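A small sketch of the omission rule as described above. The function `report` is hypothetical; the first two ifs are typed at () and need no semicolon, while the third is used for its value:

```rust
// Hypothetical helper illustrating where the semicolon may be omitted.
fn report(a: bool, b: bool) -> u32 {
    let mut count = 0;

    // Both `if`s have type `()`, so they act as statements without a `;`.
    if a {
        count += 1
    }
    if b {
        count += 1
    }

    // This `if` produces an integer; the `;` here ends the `let` statement.
    // Written as a bare statement with no `;` and more code after it,
    // the compiler would report mismatched types (expected `()`).
    let bonus = if a && b { 10 } else { 0 };

    count + bonus
}

fn main() {
    println!("{}", report(true, true));
}
```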


As for guidelines on semicolons/commas when they're optional: use rustfmt and just do what it suggests. That's so much easier than debating over it.

Even the naming, semicolon omission rule, suggests it's simply a concession to "developer experience", much like JavaScript's ASI or even Rust's own deref coercion, so it wouldn't be the first time that DX messed with consistency and made the grammar more complicated.

Similarly, the omission of commas after blocks in match statements is the same omission rule

The wording in the reference seems to suggest that the comma is optional after a BlockExpression - i.e. that the closing brace of the block expression can complete the current MatchArm => BlockExpression construct within MatchArms.

Semicolons on the other hand aren't part of the match expression proper but only appear within subordinate constructs.
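To make the comma behaviour concrete, here is a small sketch with a made-up `classify` function. It only shows what the compiler accepts, not why the grammar is defined that way:

```rust
// Hypothetical helper showing where commas are needed between match arms.
fn classify(n: i32) -> &'static str {
    match n {
        // Not a block, so the comma is needed to separate it from the next arm.
        0 => "zero",
        // A block arm: the closing brace is enough, no comma required.
        n if n < 0 => {
            "negative"
        }
        // A comma after a block arm is still accepted.
        _ => {
            "positive"
        },
    }
}

fn main() {
    println!("{}", classify(-3));
}
```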

use rustfmt and just do what it suggests.

Fair enough.

I'm just of the opinion that it can be helpful to make people coming from statement-oriented languages aware that Rust is much more expression-oriented than what they may have been used to in the past.

Letting them stick to their familiar statement-oriented mental model may just set them up for some sort of cognitive dissonance in the future.

The missing semicolon for a return value comes out of the ML family of languages. I heard it described in a book on OCaml that the semicolon is actually a binary operator that evaluates the left operand, discards the result, then evaluates the right operand and returns its result. If the right operand is missing, it is treated as ().

The description doesn't work perfectly with Rust, but it's a useful alternative way to visualize it.
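As a loose illustration of that way of visualizing it, one could model the semicolon as an ordinary function. Everything here (`seq`, `one`, `two`, `with_block`, `with_seq`) is hypothetical and is not how Rust actually defines the semicolon:

```rust
fn one() -> i32 {
    1
}

fn two() -> i32 {
    2
}

// Hypothetical "semicolon operator": both operands are evaluated,
// the left result is discarded and the right one is returned.
fn seq<L, R>(_left: L, right: R) -> R {
    right
}

// The ordinary Rust spelling: `one();` is discarded, `two()` is the value.
fn with_block() -> i32 {
    one();
    two()
}

// The same thing written with the hypothetical operator.
fn with_seq() -> i32 {
    seq(one(), two())
}

fn main() {
    assert_eq!(with_block(), with_seq());
    // A "missing" right operand is modelled by passing `()` explicitly.
    assert_eq!(seq(one(), ()), ());
}
```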

Thanks for the heads up on that Rich Hickey presentation. I watched the video here: https://www.infoq.com/presentations/Value-Values/

I'm all for 'value-oriented programming' or whatever higher level abstractions. It's just that in my experience so far they can easily make the code unreadable and degrade performance. So I'm not feeling the compulsion to get into it in a big way.

That is a lot of the attraction to Rust for me. Since it is a systems programming language, one can still write C in Rust, almost. But there are lots of other goodies available as well.