How to understand the match pattern?

I'm following the Rust book and I just finished chapter 2.

I have a question about the match expression; I would like to know how it works exactly and understand the logic behind it.

This snippet is present in the tutorial:

        let guess: u32 = match guess.trim().parse()
        {
            Ok(num) => num,
            Err(_) => continue,
        };

I'm trying to understand what's really happening here.

For this I tried reading up on functions, match, result and patterns:

  • doc.rust-lang dot org/rust-by-example/flow_control/match.html
  • doc.rust-lang dot org/book/ch18-03-pattern-syntax.html
  • doc.rust-lang dot org/std/result/
  • doc.rust-lang dot org/rust-by-example/fn.html

I still can't seem to wrap my head around the match pattern completely. I'm coming from a C++ and C# background.

The following questions have arisen:

  • How is num returned into guess?
  • And why isn't continue assigned to guess?
  • How is it that Err is executed as a statement and Ok as an assignment?
  • Is the above always the case?
  • Why doesn't => return num, work (gives error)? Shouldn't it be the same?
  • Why can't I do Ok(num) => 0 or Ok(_) => 0 to make guess always be 0? (gives an error)

If I deduce it correctly:

  • Ok is always returned as a variable in a match.
  • Err is always executed as a flow statement in a match (can only end with continue, break, return)? e.g.:
Err(_) => { 
    println!("error"); 
    break; // MUST always end with continue, return, break
},
  • You can't return anything else than the result of Ok

Am I wrong? Did I miss something?

The value of the last expression in a block or match arm is the value of that block or match arm. Thus, num is output from that branch, and hence from the whole match when the result of parse() is Ok. The value of the match is what gets assigned to guess.

The keyword continue causes the loop to go to its next iteration, so the assignment never takes place (it occurs after evaluation of the match). The type of continue is actually the never type, which indicates something that never returns, and can be coerced to any type since it never takes a value.

Both arms are evaluated as expressions, but the Err arm never produces a value because continue sends execution to the next loop iteration. The assignment takes place only after the value of the match has been determined.
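
To make this concrete, here is a minimal sketch of the same match inside a loop. The parse_all function and its inputs are made up for illustration; they are not from the book:

```rust
/// Collects every input that parses as a u32 and skips the rest.
/// (Hypothetical helper, only for illustration.)
fn parse_all(inputs: &[&str]) -> Vec<u32> {
    let mut parsed = Vec::new();
    for input in inputs {
        // The whole match is one expression; its value is assigned to `guess`.
        let guess: u32 = match input.trim().parse() {
            Ok(num) => num,     // value of this arm, and so of the match
            Err(_) => continue, // jumps to the next iteration: no value, no assignment
        };
        parsed.push(guess);
    }
    parsed
}

fn main() {
    let parsed = parse_all(&["42", "abc", " 7 \n"]);
    println!("{:?}", parsed); // prints [42, 7]
}
```

For "abc" the Err arm runs continue, so neither the assignment to guess nor the push happens for that input.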

Writing return num returns from the surrounding function, which has return type () in that case (since it is the main function), so it is a type error.
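
As a rough illustration, return inside an arm does type-check once the surrounding function really returns that type. parse_or_zero below is a hypothetical helper, not something from the book:

```rust
/// Returns the parsed number, or 0 if the input is not a valid u32.
/// (Hypothetical helper, only for illustration.)
fn parse_or_zero(input: &str) -> u32 {
    let value: u32 = match input.trim().parse() {
        // `return` leaves parse_or_zero immediately, so `value` is never
        // assigned on this path. It is accepted here only because the
        // function's return type is u32, the same type as `num`.
        Ok(num) => return num,
        Err(_) => 0,
    };
    value
}

fn main() {
    println!("{}", parse_or_zero("5"));   // prints 5
    println!("{}", parse_or_zero("abc")); // prints 0
}
```

In main, which returns (), the same arm would be rejected because num is a u32.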

You can write Ok(_) => 0, but the problem is then that the parse() call must still run and, without num being assigned to guess, Rust has no way of determining the type of the output from that function (and so no way of knowing which version of parse() to call). Adding a type annotation to the parse() call fixes it:

        let guess: u32 = match guess.trim().parse::<u32>()
        {
            Ok(_) => 0,
            Err(_) => continue,
        };

Thank you, this helps me understand match much better :slight_smile:

The key difference here, regarding C++ at least (I don't know C#), is the difference between expressions and statements.

This is a crucial difference between "old" / classic imperative languages and those with a more functional-language approach.

Luckily, there is a good example of this difference in C and C++:

if (cond) { then_body(); } else { else_body(); }

vs.

(  cond ? then_body() : else_body()  )

These are pretty much alike, except that the former is a statement, and requires that then_body() and else_body() be statements too, whereas the latter is an expression, and it requires that then_body() and else_body() be expressions too.

I'm not gonna go into the formalism of all this, suffice to say that you can assign an expression to a variable, return it, feed it to a function, whereas a statement has no "value" (at least in C / C++):

__typeof__(foo()) result;
if (cond()) {
    result = foo();
} else {
    result = bar();
}

becomes:

auto result = ( cond() ? foo() : bar() );

A good example of the distinction is that if you are defining a (preprocessor) macro that you intend to use as a function that returns a value, then it needs to expand to an expression, and that's why it can be very difficult or even impossible (without GNU extensions or whatnot) to transform some functions into macros: the language does not have the whole "everything should be usable as an expression" concept.

Rust, on the other hand, does.

You should imagine that the body of a Rust function: fn foo (<args>) -> <ret_ty> { <body> } is actually like:

<ret_ty> foo (<args>)
{
    return (<body>);
}

in C/C++.

Or in other words, a Rust function does not change if you do:

fn foo (<args>) -> <ret_ty>
{
    return { <body> };
}

You can notice that the parentheses become braces, and that's because in Rust, expressions can contain statements, provided there are braces surrounding the whole expression, and semi-colons to separate the sequence of statements:
{ stmt1; stmt2; ...; stmtn; sub_expr? }
with the following key logic:

the value of this whole expression is the same as the value of sub_expr (if there is one), or () when there is no "trailing" expr (i.e., the contents of this "block" end with a semi-colon: { ... ; }).

Where () is the (canonical) unit value (and type), i.e., the "empty" / devoid-of-information value. Something evaluates to () if it does not evaluate to any meaningful value, much like the void return type of C functions (which is not the same void as in void *, but that's another topic).

So, for instance, one can see a statement as an expression that evaluates to ().

  • (And reciprocally, a semi-colon-ended expression becomes a(n expression) statement)
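
A small sketch of that rule, with made-up names (block_values, with_tail, without_tail):

```rust
/// Returns the values of two blocks: one with a trailing expression,
/// one whose contents end with a semi-colon.
fn block_values() -> (i32, ()) {
    // Trailing expression `a * b` (no semi-colon): the block evaluates to it.
    let with_tail = {
        let a = 2;
        let b = 3;
        a * b
    };
    // Contents end with a semi-colon: the block evaluates to ().
    let without_tail = {
        let _c = 1;
    };
    (with_tail, without_tail)
}

fn main() {
    println!("{:?}", block_values()); // prints (6, ())
}
```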

So in Rust, an if-then-else expression is like a C/C++ ternary:

if <condition> { <then> } else { <else> }

is the same as

( <condition> ? <then> : <else> )
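
For example, a minimal sketch where the if-then-else is the whole function body, so its value is what the function returns (parity is just an illustrative name):

```rust
/// The if-else below is an expression, used the way a C ternary would be.
fn parity(n: u32) -> &'static str {
    if n % 2 == 0 { "even" } else { "odd" }
}

fn main() {
    // It can also sit directly on the right-hand side of a `let`.
    let label = if 7 % 2 == 0 { "even" } else { "odd" };
    println!("{} {}", parity(4), label); // prints "even odd"
}
```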

and match expressions, which are a generalised (multi-branch) version of if-then-else, are a similar thing:

let var = {
    match <expr> {
        <pat1> => <val1>,
        <pat2> => <val2>,
        <pat3> => <val3>,
    }
};

could be written in C/C++ as:

auto var = (
    __builtin__matches__(<expr>, <pat1>)
    ? <val1>
    : ( __builtin__matches__(<expr>, <pat2>)
        ? <val2>
        : <val3>
    )
);

Except that a Rust match not only checks against special cases / conditions, it can also extract inner values that are known to be sound to extract within that guarded case / condition. So here is another way of seeing the previous match assignment:

__typeof__(<val1>) var;
switch (__builtin__extract_pattern_of(<expr>)) {
    case <pat1>: {
        auto <inner_value_name1> = __builtin__extract_pat1_value_of(<expr>);
        var = <val1>; // can use <inner_value_name1>
        break;
    }

    case <pat2>: {
        auto <inner_value_name2> = __builtin__extract_pat2_value_of(<expr>);
        var = <val2>; // can use <inner_value_name2>
        break;
    }

    case <pat3>: {
        auto <inner_value_name3> = __builtin__extract_pat3_value_of(<expr>);
        var = <val3>; // can use <inner_value_name3>
        break;
    }
}

In your example, for instance, when the expression matches the Ok ... case (i.e., some internal boolean tag / discriminant has the OK value), you get to automagically extract the num: <integer> value / payload associated with it (<inner_value_name1> = num)
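
The same "check the tag, then extract the payload" idea, sketched with an enum of my own (Shape and area are invented for this example; Result works the same way with its Ok and Err variants):

```rust
/// A made-up enum: each variant is a tag plus an optional payload.
enum Shape {
    Circle(f64),
    Rect { width: f64, height: f64 },
    Empty,
}

fn area(shape: &Shape) -> f64 {
    match shape {
        // Tag is Circle: bind its payload to `radius`.
        Shape::Circle(radius) => std::f64::consts::PI * radius * radius,
        // Tag is Rect: bind both fields by name.
        Shape::Rect { width, height } => width * height,
        // Tag is Empty: there is no payload to bind.
        Shape::Empty => 0.0,
    }
}

fn main() {
    let shapes = [
        Shape::Circle(1.0),
        Shape::Rect { width: 2.0, height: 3.0 },
        Shape::Empty,
    ];
    for shape in &shapes {
        println!("{}", area(shape));
    }
}
```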


Finally, there is a special case in this whole expression-statement reasoning: diverging statements.

Diverging statements / expressions

An expression is said to be diverging if evaluating it never produces a value, because control never reaches the code right after it.

Imagine, for instance:

let x = loop { println!("Looping!"); };

What is the value of x?

Well, there are two possible answers here. We could say that since this infinite loop never ends, there is no meaningful value to extract from it, so x = (). And that is valid, but will actually be too restrictive.

Indeed, imagine now having:

let x = if condition {
    42_i32
} else {
    loop { println!("Looping!"); }
};

Here, we would like to say that x: i32, since it can sometimes be 42. But if the loop { ... } expression evaluated to unit (), the type checking would fail.

Indeed, in the same fashion that auto x = (cond ? foo() : bar() ) in C++ requires that foo() and bar() be of the same type, Rust's let x = if cond { foo() } else { bar() } requires the "same".
And 42_i32 does not have the same type as ().

  • else the following code let x = if cond { 42_i32 } else { println!("Hello!"); };
    would be valid, meaning that x could sometimes have no value, which would be absurd / unsound.

But in our case, the "bad" path (where "x would have no value") is actually unreachable / impossible to witness, given that the loop is infinite.

So we give a special name and meaning to that "impossible to witness" value: we call it the never type (or void type, but this is not C's void), and we write it as !. It is a type for which there can never exist values.

That's why expressions that can thus never be evaluated, such as loop { ... }, are given the ! type.

And the if-then-else rule now becomes:

the type of foo() must be the same as the type of bar(), or one of the two expressions needs to diverge / evaluate to the ! type.

And so, what can diverge?

  • infinite loops such as loop { ... }

  • unwinding / panicking, such as panic!(...), unreachable!(...),

  • aborting / ending the whole process, such as ::std::process::abort() or ::std::process::exit(...),

  • statements that interrupt the control flow, such as:

    • (early) return <expr>

    • break <expr>

    • continue

The last item answers your question about why continue isn't assigned to guess:

We could say that sometimes continue could be "assigned" to guess (with the ! type), but given that it is a continue statement, the control flow of the program jumps somewhere else, so the "bad" value of guess cannot be witnessed.
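
Here is a small sketch of diverging arms sitting next to an ordinary i32 arm. Both functions and their names are invented for the example:

```rust
/// Sums the values, skipping zeros and stopping at the first negative one.
fn sum_until_negative(values: &[i32]) -> i32 {
    let mut total = 0;
    for &v in values {
        // Two arms have type `!`, one has type i32, so `kept` is an i32.
        let kept: i32 = match v {
            0 => continue,       // diverges: next iteration, `kept` never exists
            n if n < 0 => break, // diverges: leaves the loop
            n => n,              // the only arm that produces a value
        };
        total += kept;
    }
    total
}

/// `break <expr>` gives the enclosing `loop` its value.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut p = 1;
    loop {
        if p > limit {
            break p;
        }
        p *= 2;
    }
}

fn main() {
    println!("{}", sum_until_negative(&[1, 0, 2, -1, 5])); // prints 3
    println!("{}", first_power_of_two_above(10));          // prints 16
}
```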

As for Err being "executed as a statement" and Ok "as an assignment": as said when desugaring the match into a C switch, match performs guards / checks (I guess that's what you mean by "statement"), but also binds the automagically extracted inner contents (if any) to a variable name of your choice. Be well aware that a (destructuring) pattern does these two things. That being said, you can tell Rust that you do not care about this inner value by using the special _ "name". So in the Err case there is an assignment too; you have just expressed that you do not care about that value.

What I have said is always the case, although there are some patterns for which there is never any data to be extracted. Take Option<_>, for instance: the Some variant has an (inner) thing you can bind to (much like with Ok(inner) and Err(inner)), but the None variant has no associated "payload".
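
A quick sketch of both situations (describe and parse_report are illustrative names of mine):

```rust
/// Some carries a payload that the pattern binds; None carries nothing.
fn describe(value: Option<u32>) -> String {
    match value {
        Some(n) => format!("got {}", n),
        None => String::from("nothing"),
    }
}

/// Err has a payload too: here it is bound to `e` instead of ignored with `_`.
fn parse_report(input: &str) -> String {
    match input.trim().parse::<u32>() {
        Ok(num) => format!("parsed {}", num),
        Err(e) => format!("failed: {}", e),
    }
}

fn main() {
    println!("{}", describe(Some(3)));
    println!("{}", describe(None));
    println!("{}", parse_report("17"));
    println!("{}", parse_report("abc"));
}
```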

The return <expr> statement applies at the scope of the whole function, meaning that your return num (which evaluates to ! as far as the guess variable is concerned) is trying to return that number from the function that contains the match, and it errors because that function does not return an integer.

Ok(num) => 0 or Ok(_) => 0 should work; can you post on the playground the code that gives an error?

EDIT: @jameseb7 guessed this one, the error did not come from the match arm (but from type inference somewhere else)

Thank you too for the more in-depth exploration and comparisons with C++! :smiley:

Thanks @jameseb7 and @Yandros! and @gzxmx94 for asking.

Very helpful discussion, especially considering the official documentation says nothing:
https://doc.rust-lang.org/std/keyword.match.html

The standard library documentation for keywords is a bit patchy, although documentation is being added for them (in particular, documentation for match was added near the end of November, so it should be in the next stable release, and can already be seen for the beta release). The Rust book is more developed and provides a better explanation for someone unfamiliar with Rust; an explanation of match can be found in section 6.2 of the book. There is also the relevant part of the reference, if you want a lot of detail.

Also, I have heavily borrowed from C code examples for my explanation, which is useful for those who know that language; but the official Rust documentation / book aims to be as language-agnostic as possible, which makes explaining all the things that are going on in a match expression very hard, I imagine.