Understanding `return` and how it fits conceptually with other expressions

I've seen explanations elsewhere of this "weird" function:

fn strange() -> bool { let _x: bool = return true; }
// or, without coercion, using the experimental `!` type (nightly only, needs #![feature(never_type)])
fn strange() -> bool { let _x: ! = return true; }

I accept that:

  1. return true; exits the function with that value (true).
  2. As an expression, it has a type, which is !, even though it produces no value.
  3. In this specific case, the ! can coerce to bool in the assignment (or to any other type), so it passes type checking.

In other words, the "argument" to return (the output of the function) and the type of its evaluation (although no value appears) are separate things.
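The two roles can be seen side by side in a more typical use of return. This is a minimal sketch; the function parse_or_bail is hypothetical and not from the thread:

```rust
/// Hypothetical helper: parses `s` and doubles it, bailing out early on bad input.
fn parse_or_bail(s: &str) -> Result<i32, String> {
    let n: i32 = match s.parse::<i32>() {
        Ok(n) => n,
        // This arm is a `return` expression. Its own type is `!`, which
        // coerces to `i32` so the `match` type-checks, while its *argument*
        // `Err(...)` becomes the value of the whole function call.
        Err(e) => return Err(e.to_string()),
    };
    Ok(n * 2)
}

fn main() {
    println!("{:?}", parse_or_bail("21"));
    println!("{:?}", parse_or_bail("abc"));
}
```

The type of the return expression only matters inside the match; the value handed to the caller comes from its argument.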

One initial problem is that it does not match the definition of an expression (it yields no value). The other issue is that, to an extent, a value does result from it, even though it's assigned not to _x but to the function's output.

I've seen other examples using unreachable!(), continue, or break in the std documentation for the never type (!).
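For instance, continue plays the same role as return inside a loop. A minimal sketch (sum_parsed is a hypothetical name):

```rust
/// Hypothetical example: sums the entries that parse as integers, skipping the rest.
fn sum_parsed(items: &[&str]) -> i32 {
    let mut total = 0;
    for s in items {
        let n: i32 = match s.parse::<i32>() {
            Ok(n) => n,
            // `continue` is also an expression of type `!`: it never produces
            // a value here, so it can stand in for an `i32`.
            Err(_) => continue,
        };
        total += n;
    }
    total
}

fn main() {
    println!("{}", sum_parsed(&["1", "x", "2"]));
}
```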

It's also clear that it's called "never" because it's got no value.

To me, without much knowledge so far but intuitively from reading code, it feels like one is accepting a slightly irrational statement.

Besides some details that may be wrong here and there in my explanation above, what are some important conceptual frames for this? Or how do you see it (it doesn't need to match a strict definition)?

PS: I have read, and will keep reading, related threads, but I'd like to start a new conversation about it.

It is possible to design a language which does not have purposefully diverging expressions (that is, expressions which cause control flow to do something other than produce a value from that expression) like return and panic!(). In some ways, this would be a much more elegant language to work with, with fewer surprises. However,

  • In practice, it turns out to be very useful to write functions with “early returns” — cases where they return a value immediately instead of completing the remaining expression, statement, or block. These early returns may denote errors, or successes (e.g. "found item in cache, no need to compute it").

    (You can design a language where early returns are possible but never occur within an expression, but that requires a stronger expression/statement distinction than Rust has.)

  • Removing explicitly diverging expressions doesn't remove expressions that implicitly diverge by never completing, like loop {} or std::iter::repeat(1).last(), unless you restrict the language even further to be non-Turing-complete (as in e.g. total functional programming). Therefore, it is always possible for an expression to diverge, and analysis of programs must acknowledge that. Given that, we might as well take advantage of it, and add explicit divergence that is useful.

  • It can be quite hard to write total functions (functions that always return a value) that can be statically checked as such, so panic!() divergence is an important part of the language, even more so than early returns are. If your expressions were defined to always evaluate to values and never diverge, then you could only call functions that are provably total.
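A minimal sketch of the first and third points, assuming hypothetical function names: an early return that skips recomputation, and a panic for an input where no sensible value exists.

```rust
use std::collections::HashMap;

/// Early return: if the answer is cached, hand it back without computing it.
fn cached_square(cache: &mut HashMap<u64, u64>, n: u64) -> u64 {
    if let Some(&v) = cache.get(&n) {
        return v; // "found item in cache, no need to compute it"
    }
    let v = n * n;
    cache.insert(n, v);
    v
}

/// Not total: for `b == 0` there is no sensible `i32` to produce, so it panics.
fn divide(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero"); // diverges
    }
    a / b
}

fn main() {
    let mut cache = HashMap::new();
    println!("{}", cached_square(&mut cache, 4)); // computed
    println!("{}", cached_square(&mut cache, 4)); // returned early from the cache
    println!("{}", divide(7, 2));
}
```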


You might be interested in learning a bit more type theory: for example, the never type is also known as the "bottom" type, and it sits at the opposite end from the "any" type, which is also known as the "top" type.

You can leverage the fact that the never type cannot be inhabited to get exhaustiveness checking in other programming languages that don't have it built in.
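Rust itself puts uninhabited types to work with std::convert::Infallible, an enum with no variants. A minimal sketch (the helper names always_ok and into_ok are hypothetical):

```rust
use std::convert::Infallible;

/// `Infallible` has no variants, so the `Err` case here can never occur.
fn always_ok(x: u32) -> Result<u32, Infallible> {
    Ok(x + 1)
}

fn into_ok(r: Result<u32, Infallible>) -> u32 {
    match r {
        Ok(v) => v,
        // An empty `match` on an uninhabited value is exhaustive with zero arms;
        // its type is `!`, which coerces to `u32`.
        Err(never) => match never {},
    }
}

fn main() {
    println!("{}", into_ok(always_ok(41)));
}
```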

Here are some concepts that existed and worked even before never (!) was made into a type.

// Diverging functions
fn exit() -> ! { ... }

The compiler understands that exit never returns,[1] so code after a call to exit is dead code. The compiler also ensures that any path in the body of exit that would return is dead code, so that the function really does diverge.
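A concrete, runnable version of that idea, as a sketch with a hypothetical diverging helper fail: because its return type is !, a call to it can stand wherever any other type is expected.

```rust
/// Hypothetical diverging helper: `-> !` promises it never returns normally.
fn fail(msg: &str) -> ! {
    panic!("fatal: {}", msg) // unwinds (or aborts); never yields a value
}

fn lookup(xs: &[i32], i: usize) -> i32 {
    match xs.get(i) {
        Some(&x) => x,
        // A call to a `-> !` function is an expression of type `!`,
        // so it fits where an `i32` is expected.
        None => fail("index out of range"),
    }
}

fn main() {
    println!("{}", lookup(&[1, 2, 3], 1));
}
```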

// Uninhabited types: This has no variants (0 possible values)
// (In contrast, `()` has one possible value.)
enum NoVariants {}

The compiler understands that matches on values of NoVariants are also dead code.

fn diverge(nv: NoVariants) -> ! {
    match nv {}
}

You can think of code statically determined to be dead code -- code that can never execute -- as the genesis of an expression with no value, in some sense.

The compiler also allows these, again due to dead code / control flow analysis:

pub fn example_1() -> String {
    loop {}
}

pub fn example_2<T>(nv: NoVariants) -> T {
    match nv {}
}

And the assignment in fn strange is a sort of inlined version of those: everything after return is dead code in the control flow of strange.

Without coercion from dead code in some form, you can't have functions like Option::<T>::unwrap, as the function has no way to actually produce the T value in the None case.
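To make that concrete, here is a hypothetical re-implementation of unwrap (my_unwrap is not the real std code, just a sketch of the same shape):

```rust
/// Hypothetical re-implementation of `Option::unwrap`.
fn my_unwrap<T>(opt: Option<T>) -> T {
    match opt {
        Some(v) => v,
        // There is no `T` to produce here. `panic!` has type `!`, which
        // coerces to whatever `T` is, so the match still type-checks.
        None => panic!("called `my_unwrap` on a `None` value"),
    }
}

fn main() {
    println!("{}", my_unwrap(Some(5)));
}
```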

The above examples have always been allowed. But see also the motivations for making ! a type.


  1. Though it may unwind.


There is no inconsistency here. If an expression has type T, what it means is that if the expression yields a value, then the value will be in the set T. ! is the empty set.

For all types it is possible for an expression of that type to never yield a value. For example if you call a function returning i32, that function might contain an infinite loop. So ! is not an exception in this regard.
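A small sketch of that point, using a hypothetical collatz_steps function: its signature promises a u32, yet for one input the call never yields a value at all.

```rust
/// Declared to return `u32`, yet whether a value ever comes back depends on the input.
fn collatz_steps(mut n: u64) -> u32 {
    let mut steps = 0;
    // For n == 0 this loops forever (0 is even and 0 / 2 == 0),
    // so the call has type `u32` but never yields a value.
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

fn main() {
    println!("{}", collatz_steps(6));
    // collatz_steps(0) would type-check just as well, but never return.
}
```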

That's a different destination, so there is no inconsistency. An expression of type i32 could also put a bool value in some other destination as a side effect.


Maybe I'm replying too early, but my problem with seeing this as a return is that I don't quite see how the wrapping function ends up with a value:

// Not Diverging function?
fn x() -> i32 { return 6 }

However, the Reference says:

An expression may have two roles: it always produces a value, and it may have effects (otherwise known as “side effects”).

I go by what they say when analysing Rust code. They may be wrong, but I'm not just pasting my opinions.

So, as I said above, if it is an expression, then it "always produces a value"; that is what it states there.

Can you show what the function's return looks like?

So in this case it doesn't yield a value in the line (that's how I imagine an expression resolving to a value), but it does yield a value elsewhere; this is somewhat confusing to me. The same goes for fn x() -> i32 { return 5; }, where it's easy to interpret (wrongly) that return 5; isn't actually an expression without a value, but one that puts the value in a place other than the "line", i.e. the function's output.

I assume my confusion is more rudimentary than your explanation, but appreciate the completeness and clarity.

Previously, I tried to understand return 5 this way:

              -----> 5 (i32) // "side effect" of some kind
            /
return 5 
            \
              -----> nothing (!) // happens in the "line"

so, sort of like a function that yields two things: a value and nothing. But this seems wrong (besides, "nothing" is hard for me to picture).

PS: I do get that a vec.push(5); would also put a value elsewhere. Just doesn't seem as confusing for me.

The reference is just wrong here. An expression usually produces a value, but there are exceptions: panics, infinite loops, return from function, break from loop, etc.

The return expression never produces a value. Instead, it prematurely aborts the execution of the whole function, using its argument as the function return value instead of the usual return value at the end of the function body.
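A sketch that shows the abort happening in the middle of a larger expression (sum_until_negative is a hypothetical example, not from the thread):

```rust
/// Hypothetical example: adds up elements until the first negative one.
fn sum_until_negative(xs: &[i32]) -> i32 {
    let mut total = 0;
    for &x in xs {
        // When x < 0, `return` aborts the whole function from inside the
        // `+` expression: the addition and the assignment are never completed,
        // and `total` becomes the value of the call instead.
        total = total + if x < 0 { return total } else { x };
    }
    total
}

fn main() {
    println!("{}", sum_until_negative(&[1, 2, -1, 5]));
}
```

Nothing ever "lands in the line" here: the + never receives a right-hand operand, and the caller receives total directly.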

Sorry, I don't understand. ("wrapping function"? Where did fn x come from?)

It's not a function, it's a control flow directive.

This should probably be fixed, since I agree it's confusing.

The expression return e has type !, but also, given a surrounding function definition, makes the type of function calls to that function have the type of e. So it really "results" in two different ways.

In other words, when you say a value does "result from it" (referring to the return expression), it's important to be clear about what you mean by result. The typical use of return e is to "result" in that function returning the value of e. However, it's also valid to say that the "result" of a return e expression itself has type !. This is because while the reference may be wrong about all expressions having values, I believe it's correct to state that all expressions must have a type.

Now then, I think the real question you might be asking (I too have wondered this) is why make return e an expression at all? Why not just make it a statement? I'll be honest, I don't know the answer to this. We have other statements like the let statement, which don't themselves have a type, i.e. let x = let y = 1; is a syntax error. Why don't we do the same thing for return?

While it's perfectly sound to allow let dead: String = return 12;, I'm not sure I understand what value it brings to the language. I'm guessing it might have something to do with consistency with polymorphic code, or generally unifying concepts in the type system/grammar, but I'd love to hear what people have to say about this.
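For concreteness, here is that snippet wrapped in a complete function (the name twelve is made up). It is accepted, although rustc may warn about unreachable code and the unused binding unless those lints are silenced:

```rust
#[allow(unreachable_code, unused_variables)]
fn twelve() -> i32 {
    // `return 12` has type `!`, which coerces to `String`, so the `let` type-checks.
    // `dead` is never initialized: control has already left the function.
    let dead: String = return 12;
}

fn main() {
    println!("{}", twelve());
}
```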

Finally, it may be useful to read RFC 1216 (rfcs/text/1216-bang-type.md at master · rust-lang/rfcs · GitHub), since it adds some more context to this discussion.

At least one consideration is: what would that do to idiomatic code?

-    let Some(value) = option else { return };
+    let Some(value) = option else {
+        return;
+    };

+// (Rustfmt already forces the multi-line form for many other `if`
+// related expressions today, but I wish it wouldn't.)
     match result {
-        Ok(0) => return Ok(()),
+        Ok(0) => {
+            return Ok(());
+        }
         Ok(_) => {}
         Err(e) => {
             eprintln!("{e}");
             // ...
         }
     }

Doesn't seem too horrible I guess, though the deeper your return is buried, the worse it would be.

The coercion from dead code to any type still has to be possible, so there's still "magic" that has to go on. It's just more cumbersome to observe.

    let x: String = if condition {
        return;
    } else {
        String::new()
    };

    // Or even
    let x: String = {
        return;
    };
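A runnable variant of the first snippet, as a sketch (maybe_greeting is hypothetical; it returns Option<String> so that return has a value to carry):

```rust
/// Hypothetical runnable version of the `if` snippet above.
fn maybe_greeting(condition: bool) -> Option<String> {
    let x: String = if condition {
        // `return None;` is a statement, but the block still diverges, so its
        // type is `!` and it unifies with the `String` from the other branch.
        return None;
    } else {
        String::from("hello")
    };
    Some(x)
}

fn main() {
    println!("{:?}", maybe_greeting(false));
}
```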

I tried to find prior discussion, and found this one:

[Niko]

I would be amenable to making ret more of a statement. There are a few forms (I have to look and see which they are, but ret is one) which don't make much sense as expressions, not really.

[Graydon]

It's more that we've been down the road of having lots-of-statements before and it didn't really work well; too many asymmetries, inconsistencies, special cases. We decided to focus on a more uniform expression treatment, even at the expense of accepting some less-sensible-looking expressions.

They were statements until February 2011. The first bootstrap was apparently in April 2011. So they've been expressions since rustc was actually able to compile Rust.

There are more conversations in the mailing list.


Thanks for digging that up! It matches my understanding well.

As for the examples you mentioned not working in nested places, I don't see the issue. let is a statement, and it's allowed in those places.

And yes, return e having a ! type is needed for unifying types where it's used as part of a branch of a conditional for example, but that doesn't technically mean it must be an expression, though it could certainly make the logic in the compiler simpler since it behaves like any other expression during type checking. This alone is actually a pretty good argument for the way it is.

It's not as if let x = return 1; does anything truly unexpected, and it should give an informative warning about the dead x.

When return is an expression of type !, it can be type-checked, so the compiler simply has to check "does the returned value have an uninhabited type?"; when it's a statement, some magic (control flow analysis) is required to prove that the block diverges in all cases.
The former option looks much less likely to introduce an error into the compiler.

Note that the compiler does allow some amount of control flow information to influence typing. For example, this program prints Vec<()> Vec<!> Vec<()>, because only the block used with v2 diverges:

fn main() {
    let mut v1 = vec![];
    let mut v2 = vec![];
    let mut v3 = vec![];
    if false {
        v1.push({ loop { break; }; "don't panic"; });
        v2.push({ loop {}; "don't panic"; });
        v3.push({ "don't panic"; });
    }
    println!(
        "{} {} {}",
        std::any::type_name_of_val(&v1),
        std::any::type_name_of_val(&v2),
        std::any::type_name_of_val(&v3)
    );
}

This can, I believe, be understood solely as a type inference rule, roughly like "the type of a block without a terminal expression is ! if it contains any expression statement whose type is !”, but it is a limited form of static analysis of control flow.


That is already done in ways that influence what is valid Rust. In addition to the type example,
