We all know that a Rust function returns the unit type () when its body ends with a statement rather than an expression.
The official book explains this somewhat imprecisely, so we just had a longer discussion in the GitHub issue tracker. It is indeed difficult to find a precise explanation. One problem is that some documentation says a statement in Rust has no value at all, while other sources say that a statement has the value of the unit type ().
In my own book, I have the following text:
"Important: Adding a semicolon ; after the final expression [in the function] turns it into a statement. Statements evaluate to the unit type (). If a function is expected to return a value (e.g., -> i32), ending it with a statement like a * b; will result in a type mismatch error, because the function implicitly returns () instead of the expected i32."
I still believe this is correct, but someone in the issue tracker has doubts, so I am not fully sure either.
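Here is a minimal sketch of the two cases the quoted text describes. The function names are hypothetical. The `-> i32` version ending in `a * b;` is deliberately left out because the compiler rejects it with a mismatched-types error (E0308). The second function declares no return type, so the () its body produces is accepted.

```rust
// Final expression without `;`: its value is the return value.
fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

// `a * b;` is an expression statement: the product is discarded and the body
// has no final expression, so the function returns (). This compiles only
// because no return type (i.e. an implicit `-> ()`) is declared.
#[allow(unused_must_use)] // silences the "unused arithmetic operation" warning
fn multiply_discarding(a: i32, b: i32) {
    a * b;
}

fn main() {
    println!("{}", multiply(6, 7)); // prints 42
    let unit: () = multiply_discarding(6, 7);
    assert_eq!(unit, ());
}
```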
I believe it's correct to say that a statement has no value.
A block expression (including the function body block) consists of any number of statements followed by an optional final expression, which supplies its value. If there's no final expression, the block gets an implicit () value instead.
" The type of a block is the type of the final operand, or () if the final operand is omitted."
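Here is a small sketch of that rule, using nothing beyond the standard prelude. The first block has a final operand and takes its type. The second block omits it and gets ().

```rust
fn main() {
    // Final operand present: the block has the type of `x + 1`, i.e. i32.
    let with_tail: i32 = {
        let x = 41;
        x + 1
    };

    // Final operand omitted: the block ends after a `let` statement,
    // so its type is ().
    let without_tail: () = {
        let _x = 41;
    };

    println!("{with_tail}"); // prints 42
    assert_eq!(without_tail, ());
}
```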
So would it make sense to say that a statement actually has no value? A function whose body ends with a statement would then return the unit type because the function body is a block, and that block evaluates to () when the final operand is omitted.
“A statement in Rust has no value” is a common way to explain it, but it’s a bit imprecise because, technically, statements do have a value: it’s just the unit value (), which indicates the absence of a meaningful value.
“A statement in Rust has the value () which is the unit type” is more technically accurate. In Rust, every statement (or expression statement) that does not return another value implicitly returns (), the unit type.
That is closer to my initial understanding, which I read somewhere a long time ago.
So one could say that a function's body block returns the unit type when it ends with a statement, because the statement itself has the value (). On the other hand, an empty block, or a block ending with a ;, also has the value (). A precise description is really difficult.
IMO this is meaningless, because there's no way to observe this supposed value of a statement. For example, you can't return it or assign it, since those operations would themselves be expressions within a statement.
But I would rather consider primary sources over any AI.
BTW, since empty block expressions {} also have the value (), we don't need to worry about statements to understand a block's type. No matter how many statements there are, if the block doesn't end in an expression, then it acts like there was a ().
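Here is a quick sketch of this point. An empty block and a block that contains only statements both produce (). The statements themselves don't need a value for the compiler to determine the block's type.

```rust
fn main() {
    // An empty block has the value ().
    let empty: () = {};

    // A block made only of statements also has the value (): there is no
    // final expression, so it behaves as if it ended with `()`.
    let only_statements: () = {
        let a = 1;
        let b = 2;
        println!("{}", a + b); // prints 3
    };

    assert_eq!(empty, only_statements);
}
```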
The thing is, Rust is not fully an expression language, and the reference uses an interesting phrasing:
Rust is primarily an expression language. This means that most forms of value-producing or effect-causing evaluation are directed by the uniform syntax category of expressions. ...
If it were a true expression language, a statement could simply be another kind of expression. Say, a "statement expression" (for lack of a better term), no different from a "literal expression", an "operator expression", etc. The type of a "statement expression" could then naturally be defined as "evaluates to the unit type".
But it's not. I'm sure there are good reasons for this design. However, statements are NOT valid in expression position, and that makes it meaningless to discuss the "type" of a statement.
For example, a declaration is a statement, but you CANNOT use it in expression position, even if the expected type is (). On the other hand, you can put it in a block expression:
```rust
// type declaration statement:
// syntax error: expected expression, found `struct` keyword
let _: () = struct Foo {};
// compiles: the block expression evaluates to `()`
let _: () = { struct Foo {} };

// same for a variable declaration statement:
// syntax error: expected expression, found `let` statement
let _: () = let _ = 0;
// compiles: the block expression evaluates to `()`
// note: the semicolon is part of the `let` statement
let _: () = { let _ = 0; };
```
The magic trick is the block expression, which explicitly turns (a sequence of) statements into an expression:
As a control flow expression, a block sequentially executes its component non-item declaration statements and then its final optional expression.
If you check the grammar for block expressions, it should be clear that a final statement with a semicolon is parsed as part of the sequence of statements. It is not counted as the final "optional expression".
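To see the difference in practice, here is a sketch with two otherwise identical blocks. Only the trailing `;` decides whether `d + 1` is the block's final expression or just one more statement. The lint `allow` only silences the warning about the discarded arithmetic result.

```rust
#[allow(unused_must_use)] // `d + 1;` below would otherwise trigger a warning
fn main() {
    let n = 20;

    // `d + 1` has no semicolon: it is the block's final expression.
    let with_expr = {
        let d = n * 2;
        d + 1
    };

    // `d + 1;` is an expression statement, so the block has no final
    // expression and evaluates to ().
    let with_stmt = {
        let d = n * 2;
        d + 1;
    };

    assert_eq!(with_expr, 41);
    assert_eq!(with_stmt, ());
}
```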
Pedantically, I don't think this interpretation is accurate, per the current definition of statements: statements don't have the value (), or any value at all.
OK, I think it actually makes more sense to say that a statement produces no value at all, rather than saying that it returns the unit type. I have modified the book text as shown below. This also makes the text a bit shorter, which is always a good thing.
```text
$ diff 05common_concepts.bak8 05common_concepts.md
192c192
< A **statement** performs an action but does not evaluate to a useful value. Statements end with a semicolon (`;`). The semicolon effectively discards the value of the preceding expression, making the overall construct evaluate to the *unit type* `()`.
---
> A **statement** performs an action but does not evaluate to a useful value. Statements end with a semicolon (`;`). The semicolon effectively discards the value of the preceding expression.
216c216
< A code block enclosed in curly braces `{ ... }` is itself an **expression**. Its value is the value of the *last expression* within the block.
---
> A code block enclosed in curly braces `{ ... }` is itself an **expression**.
stefan@hx90 ~/rust_for_c_programmers/src $ diff 08functions.bak7 08functions.md
107c107
< - `{ ... }`: The function body, enclosed in curly braces.
---
> - `{ ... }`: The function body -- a block, enclosed in curly braces.
346c346
< Functions can return values of almost any type. The return type is specified after the `->` arrow in the function signature.
---
> Functions in Rust are capable of returning values, and their return type is explicitly defined in the function signature after the `->` arrow. If no return type is specified, the function implicitly returns the *unit type* `()`, which is an empty tuple.
396c396,399
< **Important:** Adding a semicolon `;` after the final expression turns it into a *statement*. Statements evaluate to the unit type `()`. If a function is expected to return a value (e.g., `-> i32`), ending it with a statement like `a * b;` will result in a type mismatch error, because the function implicitly returns `()` instead of the expected `i32`.
---
> When the final expression in a function's body block is followed by a semicolon `;`, or if the function body is empty, the function implicitly returns the unit type `()`. This is because a semicolon transforms an expression into a *statement*, which, similar to `void` in C, does not yield a value.
>
> Consequently, if a function signature declares a specific return type (e.g., `-> i32`), but the body's last line is a statement (e.g., `a * b;`), a type mismatch error will occur. The function would implicitly return `()` when an `i32` is expected. To return an actual value, the final expression must *not* be terminated by a semicolon.
>
```
Is this sentence meant in the context of *expression statements* specifically?
There are two kinds of statements in Rust: declaration statements and expression statements. This sentence is only true for expression statements, not necessarily for declaration statements. Also, there is no value to "discard" in the case of declaration statements.
For example, `let` bindings (variable declarations) always need a semicolon:
```rust
let x = 42;
let y: i32;
let z = x else { return };
```
But for item declarations, some require a semicolon and others don't:
```rust
struct UnitStruct;
struct TupleStruct();
struct StructStruct {}
extern "ABI" fn function_declaration();
fn function_definition() {}
macro_invocation_with_parens!();
macro_invocation_with_braces!{}
mod out_of_line_module;
mod inline_module {}
// and many other examples...
```
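Here is a sketch of how the two kinds of statements can mix inside one block. The `Point` type and `sum` function are made up for illustration. Item declarations, a `let` declaration, and an expression statement whose value is discarded can all come before the final expression. Only that final expression determines the block's value.

```rust
fn main() {
    let total = {
        // Item declarations: a braced `struct` and a `fn` take no trailing `;`.
        struct Point {
            x: i32,
            y: i32,
        }
        fn sum(p: &Point) -> i32 {
            p.x + p.y
        }

        // Declaration statement: a `let` binding always ends with `;`.
        let p = Point { x: 1, y: 2 };

        // Expression statement: the i32 returned by `sum` is discarded.
        sum(&p);

        // Final expression: this alone is the value of the whole block.
        sum(&p) * 10
    };

    assert_eq!(total, 30);
}
```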
Thanks for the hint. I should really make that distinction.
Fixed:
```text
$ diff 05common_concepts.bak8 05common_concepts.md
192c192
< A **statement** performs an action but does not evaluate to a useful value. Statements end with a semicolon (`;`). The semicolon effectively discards the value of the preceding expression, making the overall construct evaluate to the *unit type* `()`.
---
> A **statement** performs an action but does not evaluate to a useful value.
200c200
< 2. **Expression Statements:** An expression followed by a semicolon. This is used when you care only about the *side effect* of the expression (like calling a function that modifies state or performs I/O) and want to discard its return value.
---
> 2. **Expression Statements:** An expression followed by a semicolon. This is used when you care only about the *side effect* of the expression (like calling a function that modifies state or performs I/O) and want to discard its return value. The semicolon effectively discards the value of the preceding expression.
216c216
< A code block enclosed in curly braces `{ ... }` is itself an **expression**. Its value is the value of the *last expression* within the block.
---
> In Rust, a code block enclosed in curly braces `{ ... }` is an **expression** that itself evaluates to a value.
218,219c218,219
< * If the last expression **lacks** a semicolon, the block evaluates to the value of that expression.
< * If the last expression **has** a semicolon, or if the block is empty, the block evaluates to the unit type `()`.
---
> * A block evaluates to the value of its final **expression**.
> * If the block is empty, or if its final construct is a **statement** (which includes expressions followed by a semicolon, or other statement types like `let` bindings or item declarations), the block evaluates to the unit type `()`. This behavior is distinct from C, where code blocks do not typically yield a value directly.
stefan@hx90 ~/rust_for_c_programmers/src $ diff 08functions.bak7 08functions.md
107c107
< - `{ ... }`: The function body, enclosed in curly braces.
---
> - `{ ... }`: The function body -- a block, enclosed in curly braces.
346c346
< Functions can return values of almost any type. The return type is specified after the `->` arrow in the function signature.
---
> Functions in Rust are capable of returning values, and their return type is explicitly defined in the function signature after the `->` arrow. If no return type is specified, the function implicitly returns the *unit type* `()`, which is an empty tuple.
396c396,399
< **Important:** Adding a semicolon `;` after the final expression turns it into a *statement*. Statements evaluate to the unit type `()`. If a function is expected to return a value (e.g., `-> i32`), ending it with a statement like `a * b;` will result in a type mismatch error, because the function implicitly returns `()` instead of the expected `i32`.
---
> A function's body is fundamentally a block expression. As discussed, blocks that either are empty or conclude with a statement evaluate to the unit type `()`.
>
> Therefore, if a function signature specifies a return type (e.g., `-> i32`), but the function's body ends with a statement (such as an expression terminated by a semicolon, e.g., `a * b;`), a type mismatch error will result. This is because the function implicitly returns `()` instead of the expected `i32`. To successfully return a value, the final construct in the function body *must be an expression not terminated by a semicolon*.
>
```
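Here is a sketch of the revised text's claims, using hypothetical function names. Each body below is a block. How the block ends determines the return value: with nothing, with a `let` statement, with an expression statement, or with a final expression. Writing `-> ()` explicitly is equivalent to omitting the return type.

```rust
// Empty body: the block has no final expression, so this returns ().
fn empty_body() {}

// Ends with a `let` declaration statement; `-> ()` is the same as omitting it.
fn ends_with_let(a: i32, b: i32) -> () {
    let _sum = a + b;
}

// Ends with an expression statement: `println!` yields (), discarded by `;`.
fn ends_with_expr_statement(a: i32, b: i32) {
    println!("{}", a + b);
}

// Ends with a final expression: its value is the return value.
fn returns_value(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    assert_eq!(empty_body(), ());
    assert_eq!(ends_with_let(1, 2), ());
    assert_eq!(ends_with_expr_statement(1, 2), ());
    assert_eq!(returns_value(1, 2), 3);
}
```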
Another way to think about it is that all non-unwinding exit paths[1] from a block must "agree"[2] on a type, just as all branches of a `match` must. The exit path at the end of the block is:
- the type of the trailing expression, if present; else
- `!`,[3] if the end of the block is in dead code; else
- `()`.
I don't know where, if anywhere, the middle bullet point is discussed in the reference (or whether it makes sense to call it out in your guide).
```rust
// Thanks to the dead-code rule (the middle bullet point), this compiles.
// Otherwise it would fail until you deleted the semicolon after `unimplemented!()`.
fn example() -> i32 {
    unimplemented!();
    // You can add as many `;`-terminated statements here as you want.
    // You'll get warnings about dead code, but it will still compile.
}
```
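The exit-path view can be sketched with a few hypothetical functions:

- an early `return` plus a trailing expression;
- a labeled block whose `break` value must match its trailing expression (labeled block breaks need Rust 1.65 or newer);
- a body whose end is dead code after `panic!`.

This illustrates the idea; it is not a restatement of the reference's exact rules.

```rust
// The early `return` and the trailing `if` expression are two exit paths of
// the function body; both have type &'static str.
fn classify(n: i32) -> &'static str {
    if n < 0 {
        return "negative";
    }
    if n == 0 { "zero" } else { "positive" }
}

// Labeled block: the `break 'clamp 10` exit and the trailing expression `n`
// must agree on i32.
fn clamp_to_ten(n: i32) -> i32 {
    let clamped = 'clamp: {
        if n > 10 {
            break 'clamp 10;
        }
        n
    };
    clamped
}

// The end of this body is dead code after `panic!`, so it has type `!`,
// which coerces to i32; no trailing expression is needed.
#[allow(dead_code)]
fn never_returns() -> i32 {
    panic!("this function always panics");
}

fn main() {
    assert_eq!(classify(-3), "negative");
    assert_eq!(classify(0), "zero");
    assert_eq!(clamp_to_ten(42), 10);
    assert_eq!(clamp_to_ten(7), 7);
}
```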