On using semi-colons

Why does this snippet run with or without a semicolon below the break line?
It's an expression. If I put a semicolon there, it still compiles.

fn main() {
    let mut counter = 0;

    let result = loop {
        counter += 1;

        if counter == 10 {
            break counter * 2;
        }
    };

    println!("The result is {result}");
}

I assume you're talking about adding/removing a semicolon to the end of the if block. Well, we can run both variants through a nightly compiler using this:

rustc +nightly -Z unpretty=ast-tree --edition 2021 FILE.rs > FILE.ast

This gives us the compiler's internal representation of the source code. Diffing the two versions reveals exactly one meaningful difference between the one with no semicolon, and the one with a semicolon:

--- if-parse-1.rs.ast	Sat Jul  8 13:42:38 2023
+++ if-parse-2.rs.ast	Sat Jul  8 13:43:33 2023
     },
     Stmt {
         id: NodeId(4294967040),
-        kind: Expr(
+        kind: Semi(
             Expr {
                 id: NodeId(4294967040),

In the first case, the if counter == 10 { ... } block is treated as an expression, and is thus the trailing "result" expression of the body of the loop { ... } block. The body of a loop must result in () (i.e. no value of consequence), which is fine because that's what the if results in: it either jumps out, or (in the case of the implicit else branch) has no value.

When you put the semicolon in, it turns into a statement, which is also fine, because loops don't need a trailing expression.
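
If you'd rather check this without a nightly compiler, here is a minimal sketch of the two variants side by side. The function names are made up for illustration; the bodies mirror the snippet from the question. Both should give the same value, since only the break decides what the loop evaluates to.

```rust
// Hypothetical helper names; the loop bodies mirror the original snippet.
fn without_semicolon() -> i32 {
    let mut counter = 0;
    loop {
        counter += 1;
        // No semicolon: the `if` is the tail expression of the loop body, of type `()`.
        if counter == 10 {
            break counter * 2;
        }
    }
}

fn with_semicolon() -> i32 {
    let mut counter = 0;
    loop {
        counter += 1;
        // Semicolon: the `if` becomes a statement, and the body has no tail expression.
        if counter == 10 {
            break counter * 2;
        };
    }
}

fn main() {
    assert_eq!(without_semicolon(), with_semicolon());
    println!("both variants give {}", without_semicolon()); // prints "both variants give 20"
}
```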

(I've never done this before; it was fun to figure out how to conclusively prove this was what was going on!)

Addendum: If you're not familiar with this: all blocks (I think) in Rust that are a sequence of statements can optionally end with an expression. This expression is what the block evaluates to. This is why you can do things like let x = if a { b } else { c };. Loops also support this for consistency, although because they're loops, the final expression has to be of type ().
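
A small sketch of that, with made-up function names: a plain block and an if/else both hand back their final expression, while a block whose last line ends in a semicolon evaluates to ().

```rust
// Hypothetical names, for illustration only.
fn block_value() -> i32 {
    // The block ends with an expression, so the block evaluates to it.
    let x = {
        let a = 2;
        a * 3
    };
    x + 0
}

fn pick(a: bool) -> &'static str {
    // `if`/`else` is an expression; both branches have the same type.
    if a { "b" } else { "c" }
}

fn main() {
    // Every line in this block ends with a semicolon, so there is no tail
    // expression and the block evaluates to `()`.
    let unit: () = {
        let a = 2;
        let _ = a * 3;
    };
    assert_eq!(unit, ());
    assert_eq!(block_value(), 6);
    assert_eq!(pick(true), "b");
}
```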

You can generally add a semicolon anywhere in a block; this creates an empty statement that does nothing.

In this case, it's not clear what you mean, but the last expression in a block can be written with or without a semicolon. The difference is that without the semicolon the block has the value of that expression.

The type of a break expression, though, is !, meaning that it doesn't have any value whatsoever, as it jumps execution elsewhere, so that doesn't apply whether you use a semicolon or not.
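
As a rough illustration of what the ! type buys you (the function name and data here are invented): a break can sit in a position where an i32 is expected, because ! coerces to any type.

```rust
// Hypothetical example: sums the values up to the first `None`.
fn sum_until_none(items: &[Option<i32>]) -> i32 {
    let mut total = 0;
    for item in items {
        // The `None` arm is a `break`, whose type `!` coerces to `i32`,
        // so both arms agree and `value` is an `i32`.
        let value: i32 = match item {
            Some(v) => *v,
            None => break,
        };
        total += value;
    }
    total
}

fn main() {
    let items = [Some(1), Some(2), None, Some(4)];
    println!("{}", sum_until_none(&items)); // prints "3"
}
```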

I only have what the book says in the early chapters: if it needs to return a value, it must be an expression; otherwise, a statement.

So I can keep using ; to end blocks, just like in C++.

"Block" does not mean the same thing as in C++. In Rust, branches, loops, and pattern matches are all expressions, and they all have types. A block that ends in a statement can be used as an expression of the () type.

In the following snippet, when expr1 and expr2 have the same type T, the whole if expression has the type T; if expr1 and expr2 have different types, the if expression is rejected by the type checker.

if condition { expr1 } else { expr2 }

When the else clause is omitted, it is equivalent to a () expression, so the "then" clause must also have the () type. When you add a semicolon to any expression, it is turned into a statement, and a block that ends in a statement has the () type. So:

let _: i32 = if true { 1 } else { 0 }; // ok, both clauses agree
let _: i32 = if true { 1 } else { () }; // error, integer and unit
let _: i32 = if true { 1 }; // error, equivalent to integer and unit
let _: i32 = if true { 1; } else { 0; }; // error, both clauses agree, but statements are unit, cannot bind to i32

let () = if true { () } else { () }; // ok, explicit unit expression
let () = if true { 1; } else { 0; }; // ok, statement is unit
let () = if true { 1; } else {}; // ok, empty clause equivalent to unit
let () = if true { 1; }; // ok, the omitted else is unit as well
let () = if true {}; // ok, empty clause and omitted else are both unit

Your original example works because loop, by itself, has the "never" type, unless there's a break expression inside the loop. The final expression of the loop body doesn't affect the type of the whole loop expression; that is decided only by the break expressions. So:

let _: i32 = loop {
    if true {
        break 0;
    }
    // the tail expression of the loop body must have type `()`
    // (or `!`, e.g. another `break`, whose value must agree with the other breaks);
    // adding a semicolon turns it into a statement, which is always fine here
    ()
};

Is your confusion based on the following?

  1. The book said that if a block evaluates to a value, it needs to end in an expression.
  2. let result = loop { ... }; means the block has to evaluate to a value.

If so, then I think this might be a slight misunderstanding.

#1 is mostly true. However, it's worth noting that having the last thing in a block be a statement is like having a ()-valued expression. For example, the following are effectively the same:

{ print!("thing") }
{ print!("thing"); }

print! results in (), so whether the semicolon is there on the final statement or not doesn't actually make a difference.

#2 is not quite true. There is a distinction between loop { body } and { body }. { body } has to evaluate to (), otherwise the loop would be discarding values (which might indicate a bug). However, you want the loop { body } to evaluate to something else, and that's what the break accomplishes. break effectively jumps out of the block and terminates the loop directly. This does not affect what { body } itself evaluates to, only the entire loop { body }.

So, for example, you can do this:

let result = loop {
    counter = change_counter(counter);
    if counter == 10 { break counter * 2; }
    call_unrelated_function();
    if counter > 100 { break counter / 5; }
    ()
};

I just didn't want you to think you had to leave the ; off the if, or to have the break come last in order for the loop to give you the value.
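
Here is a runnable version of that sketch. change_counter and call_unrelated_function are not real APIs; the definitions below are stand-ins invented so the example compiles, and the start parameter is added only so both break paths can be exercised.

```rust
// Hypothetical stand-ins for the helpers named in the snippet above.
fn change_counter(counter: i32) -> i32 {
    counter + 1
}

fn call_unrelated_function() {}

// `start` is an assumption added for illustration.
fn run(start: i32) -> i32 {
    let mut counter = start;
    loop {
        counter = change_counter(counter);
        if counter == 10 { break counter * 2; }
        call_unrelated_function();
        if counter > 100 { break counter / 5; }
        // No tail expression needed: the body is `()` either way.
    }
}

fn main() {
    println!("{}", run(0));   // first break fires at 10: prints "20"
    println!("{}", run(200)); // second break fires at 201: prints "40"
}
```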

In the Rust book, I'm only at chapter 3. Where do you get these concepts of (), T, let _, let ()?

seems they are advanced topics.

yes. 1 and 2 plus the chapters up to 3.

the variable "result" is being declared and initialized so the right side of the = sign has to return a value.

Is () an empty set? nil? I don't understand the concept in Rust.

It's kind of both. () is mentioned in Chapter 3.2: Data Types. (Confession: I've never actually read the Rust Book because I learned Rust before it existed.)

You can have tuples like (i32, f32, bool) which is an i32, an f32, and a bool all lumped together, and can make values of this type like so: (42, 3.14, false). You can think of them like anonymous structs, or fixed-sized arrays where each element can be a different type. You can have tuples of any length; this includes single-element tuples like (i32,) or even a tuple with zero elements: ().

() is the "unit" type, meaning it has only one value. Due to tuple syntax, that value ends up being written (). Also, because it contains no information, it doesn't consume any space in memory. This makes it useful for cases where you need a type or a value, but don't have anything meaningful to put there. If you have a function that doesn't return anything meaningful, it implicitly returns ():

// These are effectively the same:
fn returns_nil_1() { print!("Hi!") }
fn returns_nil_2() -> () { print!("Hi!") }

This is also the type of loop bodies, and of blocks without an explicit tail expression.
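
A quick sketch to poke at these claims yourself. The function names are invented; std::mem::size_of is the standard library function that reports a type's size in bytes.

```rust
use std::mem::size_of;

// Hypothetical helpers for illustration.
fn returns_unit() {}

fn first_of(t: (i32, f32, bool)) -> i32 {
    t.0 // tuple fields are accessed by position
}

fn main() {
    let t: (i32, f32, bool) = (42, 2.5, false);
    let single: (i32,) = (7,); // the trailing comma makes this a one-element tuple
    let unit: () = returns_unit(); // a function with no return type gives back `()`

    assert_eq!(first_of(t), 42);
    assert_eq!(single.0, 7);
    assert_eq!(unit, ());
    // `()` carries no information, so it takes no space.
    assert_eq!(size_of::<()>(), 0);
}
```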

Correct. I was just worried that you might think you needed the contents of the { ... } block to evaluate to the value, as opposed to the loop as a whole.

For example, if you do this:

let result = loop { 42 };

This does not cause the loop to unconditionally exit with the value 42; it gives this error:

error[E0308]: mismatched types
 --> src/main.rs:2:25
  |
2 |     let result = loop { 42 };
  |                         ^^ expected `()`, found integer

The loop body needs to result in (). break is jumping out of the loop and forcing the loop { ... } expression as a whole to evaluate to the value you specify. You can think of it a bit like having a function that repeats forever until you explicitly return from it.

But because break can be anywhere in a loop body (like how return can show up anywhere inside a function body), it doesn't need to be at the end, and it doesn't need to be in a tail expression.
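
To make the comparison with return concrete, here is a small sketch with made-up function names. One version exits the loop with break from the middle of the body, the other exits the whole function with return; both are meant to compute the same thing.

```rust
// Hypothetical example: the smallest power of two strictly greater than `limit`.
fn with_break(limit: u32) -> u32 {
    let mut n = 1;
    let found = loop {
        if n > limit {
            break n; // leaves the loop; the loop expression evaluates to `n`
        }
        n *= 2; // the body still ends in a statement, so it is `()`
    };
    found
}

fn with_return(limit: u32) -> u32 {
    let mut n = 1;
    loop {
        if n > limit {
            return n; // leaves the whole function instead
        }
        n *= 2;
    }
}

fn main() {
    println!("{}", with_break(100)); // prints "128"
    assert_eq!(with_break(100), with_return(100));
}
```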

I personally don't like returning unit as a value without using semicolons. I use this clippy lint: Clippy Lints

Example

fn main() {
   println!("Hello world")
}

Use instead:

fn main() {
   println!("Hello world");
}

This is the actual rule: use clippy and stop worrying if you're "doing it right"!

Precisely! I'm so used to C syntax that my pinky must always hit the ; key.

By the way, should newcomers unlearn the ; key when learning Rust, or not?

Is "clippy" used with an IDE?

It can be, but you can also just run cargo clippy.

It will include all the messages cargo check would give, and also all the ones from clippy.