Help understanding match ergonomics

Hi, rustaceans, I have some trouble understanding match ergonomics.

Let me give you some background info. Some time ago, I asked this question on stack overflow. After reading the accepted answer, I thought I have understood what it is , and what will the

manipulation algorithm do. Yesterday, I reviewed this question, finding some contradictions.

Here is the algorithm in English. Original answer quoted:

The process works as follows: the compiler process the pattern from the outside inwards. The process starts with move as the binding mode.

Each time the compiler needs to match a non-reference pattern (literal, struct, tuple, slice) against a reference, it automatically dereferences the reference and updates the binding mode: when & reference is matched against we'll get the ref binding mode, and for &mut references we will get ref if the current binding mode is ref or otherwise ref mut. Then this process repeats until we don't have a reference anymore.

If we're matching against a reference pattern (binding, wildcard, consts of reference types or &/&mut patterns), the default binding mode is reset back to move.

According to this description, I wrote a pseudo code to demo the algorithm:

 1 let mut binding_mode = `move`;  // the default binding mode is `move`
 2 
 3 loop (until we don't have a reference to match anymore) {
 4     if (encounters `non-ref pattern`) {
 5         deref ref; // dereference the reference we are matching
 6 
 7         if (the reference we are matching is `&xx`) {
 8             binding_mode = `ref`
 9         } else if (the reference we are matching is `&mut xx`) {
10             if (binding_mode == `ref`) {
11                binding_mode = `ref`;
12             } else {
13                binding_mode = `ref mut`;
14             }
15         }
16     } else if (encounters `ref pattern`) {
17         binding_mode = `move`;
18     }
19 }
20 
21 // when loop is done:
21 match binding_mode {
22 		move => directly bind,
23    	ref  => bind as ref,
24		ref mut => bind as ref mut,
25 }

And I use this algorithm to test the following cases:

// case 1
let (a, b) = &(&String, &String);   // a and b are of type &&String
// case 2
let (&a, &b) = &(&String, &String); // a and b are of type String

Let me quote the original answerer's inference:

Inference for case 2:

This is the most interesting one. Remember how we talked about reference vs. non-reference patterns? In this example that fact plays a crucial role.

First we match the tuple pattern against the type &(&String, &String). We dereference the tuple and set binding_mode = ref. Now we match the tuple: we got to match &a and &b each against &String, with the binding mode set to ref.

What happens when we match &a against &String? [BREAK POINT 1]

Well, remember & is a reference pattern, and when matching reference patterns we completely ignore the binding mode(aka. set binding mode to move). So we match &a against &String, with the binding mode reset to move. This removes the reference from both sides, leaving us with a = String. Same for &b.

Inference for case 1:

We match a non-reference pattern (a tuple pattern) against a reference (of type &(&String, &String)). So we dereference the reference and set the binding mode to ref.

Now we got a tuple pattern to match against a tuple of type (&String, &String) and a binding mode of ref. We match a against &String: [BREAK POINT 2]

it's a reference pattern (binding), and so we don't change the binding mode. However, we already have a binding mode of ref. The type we match against is &String, and ref means we add a reference, so we end with &&String. Exactly the same thing happens to b.

My questions are:

  1. What should we do when encountering a reference pattern?

    At [BREAK POINT 1] and [BREAK POINT 2], we are both encounting reference pattern, but in case 2,

    we set binding mode to the default move. And in case 1, we didn't change the binding mode

  2. None of the inferences above are consistent with the algorithm. The loop shouldn't stop until there is no more references to match, the inference for case 2 ends the whole inference when encountering reference pattern.

    // what inference for case 2 is somthing like this.
    else if (encounters `ref pattern`) {
    	binding_mode = `move` // Is this right?
        directly bind;
        exit(0);
    }
    

    And in the inference for case 1, at [BREAK POINT 2], encountering reference pattern does not reset binding mode back to move. So it seems that the algorithm is incorrect and incomplete.

    What is the right and deterministic algorithm for this inference process?

I have read the RFC. But the algorithm described in this section and the given examples do not include the cases where encounters reference pattern, so it didn't solve my problem:(

A nice dude found this from here:

Still, it seems confusing (to me) to have & "reset' the default binding mode, rather than pairing off with a &T type -- and there is also the common belief among newcomers that & should mean what "ref" means to contend with (that is, when first reading your example, I thought you were attempting to do the opposite of what in fact you are attempting). It was also considered to add a move binding mode, which would allow one to write for (move a, b) in &vec , but that was confusing too.

which seems to prove that reference pattern like &xx will reset binding mode

Another evidence to prove that is the reference pattern is something like &pat, binding mode will be reset back to move, source code of rust-analyzer PR impling this rfc, source, file crates/ra_hir/src/ty/infer.rs, line 623-625

// When you encounter a `&pat` pattern, reset to Move.
// This is so that `w` is by value: `let (_, &w) = &(1, &2);`
default_bm = BindingMode::Move;

I think you're terminating your pseudocode loop far too early. Here's how I'd write out the algorithm, in a recursive form:

If no binding mode is set, set the binding mode to `move`.

If the pattern is an identifier pattern:
    If the binding mode is `move`, bind directly.
    If the binding mode is `ref`, bind by immutable reference.
    If the binding mode is `ret mut`, bind by mutable reference.
    Expose the binding to the relevant scope.
    If the pattern has an inner pattern, repeat this process with that pattern.

Otherwise, if the pattern is a `&[mut]` reference pattern:
    Ensure that we are matching against a `&[mut]` reference.
    Dereference the scrutinee expression.
    Set the binding mode to `move`.
    Repeat this process with the inner pattern.

Otherwise, if the pattern is any kind of destructuring pattern:
    Set `T` to the type implied by the pattern.
    Ensure that we are matching against a `T` value or `&[mut] T` reference.
    If we are matching against a `&T` reference:
        Dereference the scrutinee expression.
        Set the binding mode to `ref`.
    Otherwise, if we are matching against a `&mut T` reference:
        Dereference the scrutinee expression.
        If the binding mode is `move`, set the binding mode to `ref mut`.
    Repeat this process for all fields in the pattern.

When we are no longer matching against a reference, we can start destructuring with the current binding mode.

1 Like

In my pseudo-code, pattern is classified into 2 categories. And in your algorithm, they are divided into 3 classifications. And it seems that your algorithm is some kind of much more generic matching.
Would you like to tell the relationship between my 2 categories and your 3 categories? Thx:)

Here, I split "non-reference patterns" into identifier patterns (like x or ref x) and destructuring patterns (like (pat1, pat2) or Struct { x: pat1, y: pat2 }). The algorithm you quoted is incomplete: it only talks about the case where the pattern is matching against a reference, and you accordingly wrote "loop (until we don't have a reference to match anymore)". But we don't stop once we're matching against a value! We only stop once we have no more destructuring or reference patterns. The part about matching against references is only a portion of the whole algorithm.

Here are some of the conclusions I draw from your algorithm:

  1. The concept of binding mode has been integrated into the whole matching system, and it is not exclusive to matching non-reference pattern against a reference
  2. The terminologies you are using are consistent with the reference, which is general. But non-reference pattern and reference pattern are not general, they are applicative only when talking about matching against reference, as they are define in rfc 2005.
  3. Even though binding mode has been itergrated into the whole system, rfc says The *default binding mode* only changes when matching a reference with a non-reference pattern.. It seems like it is meaningless to talk about it when we are not matching non-ref pattern against a reference, cause the binding mode is always move.

Are my conclusions right?

I still have some questions:

Here, I split "non-reference patterns" into identifier patterns (like x or ref x) and destructuring patterns (like (pat1, pat2) or Struct { x: pat1, y: pat2 }).

According to the definition defined in rfc 2005:

A reference pattern is any pattern which can match a reference without coercion. Reference patterns include bindings, wildcards (_), consts of reference types, and patterns beginning with & or &mut. All other patterns are non-reference patterns.

  1. identifier patterns like x is binding, which is of type reference pattern. Why did you think it belongs to non-ref pattern?

  2. My algorithm is incomplete, it only covers the cases where we are matching non-ref against reference. So let's just talk about these cases. When encounters reference pattern, what will the compiler do? When the reference pattern is something like &pat, it will set default binding mode to move. What about the other situations where the reference pattern is not something like &pat

    if (encounter reference pattern) {
        if (the reference pattern is `&pat`) {
            default_bind_mode = move;
        } else {
     		// Other cases
            // what should we do now?
        }
    }
    

Yes. As far as I can tell, the default binding mode is present at every point in the matching process.

Apologies for the confusion. I was referring to a "reference pattern" in the sense of the Rust Reference, i.e., an & or &mut pattern. I'll proceed with the RFC terminology from now on.

When we aren't matching a non-reference pattern or &[mut] pattern against a reference, the default binding mode is passed through unchanged when we match the inner patterns. Therefore, it must be preserved until we finally reach an identifier pattern. Let DBM stand for "default binding mode". Then, the process can be summarized like this, where the initial DBM is move:

  1. If we are matching an identifier pattern, bind the identifier with ref [mut] binding if the pattern specifies it explicitly. If the pattern does not specify the binding mode explicitly, use the current DBM instead. If the pattern has an inner pattern (e.g., ident @ pat), match the inner pattern with the current DBM.
  2. If we are matching any other kind of reference pattern against a reference, dereference the value, and match the inner pattern with the move DBM.
  3. If we are matching a non-reference pattern against a non-reference, destructure the value, and match the inner patterns with the current DBM.
  4. If we are matching a non-reference pattern against an & reference, destructure the value, and match the inner patterns with the ref DBM.
  5. If we are matching a non-reference pattern against a &mut reference, destructure the value. Match the inner patterns with the ref DBM if the current DBM is ref, or with the ref mut DBM otherwise.

This covers all reference and non-reference patterns.

1 Like

When we aren't matching a non-reference pattern or &[mut] pattern against a reference, the default binding mode is passed through unchanged when we match the inner patterns.

So there are actually two situations where we will change the DBD, one is matching non-reference pattern against reference, and the other is matching &[mut] pat against reference.

matching against change to DBM
non-ref pattern reference set to ref or ref mut
&[mut] pat reference set to move

If we think using the terminology from RFC, seems that the situation can be roughly divided into 4(2^2) categories:

  1. Matching non-reference pattern against reference

    This corresponds to your cases 4 and 5, which are the cases where match ergonomics participating in.

  2. Matching non-reference pattern against non-reference

    This corresponds to your case 3, which is just a simple destructuring process.

  3. Matching reference pattern against reference

  4. Matching reference pattern against reference

For my cases 3 and 4, I can not easily map them to your cases. Thanks to you for accommodating me (thinking and rewriting the algorithm using rfc terminologies).

And you made me realize one thing thinking in this way is wrong. I begin to understand why the RFC doc does not cover cases where matching reference pattern against reference cause this belongs to other matching cases.

illustration

This is weird, seems that rust forum does not support picture. So I changed it to a simple link address

Let's use algorithm version 1 to denote this, and algorithm version 2 to denote this.

So let's go back to your algorithm version 1:

Here, I split "non-reference patterns" into identifier patterns (like x or ref x) and destructuring patterns (like (pat1, pat2) or Struct { x: pat1, y: pat2 }).

"non-reference patterns" are the patterns except reference pattern in all there patterns, right?

And the matching is a recursive process, for each procedure in this recursion, we have 3 situations depending on the pattern we are encountering:

enum Pattern{
    Identifier,
    Reference,
    // other patterns
    Range,
    Literal,
    Wilcard,
    Struct,
    TupleStruct,
    Tuple,
    Grouped,
    Slice,
    Path,
    MacroInvocation,
}

fn matching(pat: Option<Pattern>) {
    // If pat is None, stop the recursion
    // meaning the whole matching task is done
    if let Some(pat) = pat {
    	match pat {
        	identifier_pattern => {...; matching(inner_pattern); },
        	reference_pattern => {...; matching(inner_pattern); },
        	// any other destruct
        	_ => {...; matching(inner_pattern); },            
    	}   
    }
}

Still have some questions about algorithm version 1, sorry about this:(

  1. In the second case (if the pattern is a &[mut] reference pattern), what is Dereference the scrutinee expression, and what is scrutinee expression?

  2. Also in the second case, Ensure that we are matching against a &[mut] reference, what if we are not matching against a reference, triggering a compile error? like let &a = 2?

  3. Case 3:

    Ensure that we are matching against a T value or &[mut] T reference.

    • If we are matching against a &T reference: ...

    • Otherwise, if we are matching against a &mut T reference: ...

    Where is the case matching against a T value, what should we do? I guess it should be a simple destructuring process, like this one in your algorithm version 2:

    1. If we are matching a non-reference pattern against a non-reference, destructure the value, and match the inner patterns with the current DBM.

    And when you say T, are you referring to the type except [mut] reference, because in generic rust, T does not only contain owned type but can also represent [mut] reference, reference.

And a question about algorithm version 2:

  1. Where is the case matching reference pattern(RFC terminology) against reference , I don't see any cases in algorithm version 2 setting DBM back to move

    I kind of think perhaps this question is hard to explain cause thinking using terminology from RFC is wrong and seems that there is not a simple mapping between the cases in algorithm version 1 and algorithm version 2.

I think you are misunderstanding the description of the algorithm and/or where you have to consider the references for the sake of stopping. Only references in the pattern influence when to stop. So let me elaborate the two cases:

  1. Case 1:

    1. (a, b) is matched against a type &(_, _) therefore the reference-to-tuple is dereferenced and the binding mode is set to ref.
    2. Therefore, the above is equivalent to
     let (ref a, ref b) = *&(&String, &String);
     // which in turn is equivalent to
     let (ref a, ref b) = (&String, &String);
     // which in turn is equivalent to
     let (a, b) = (&&String, &&String);
    
    1. There are no more references to match against in the pattern. It doesn't matter that the components of the tuple are still references themselves (and that their referred type is yet another level of reference). The innermost patterns are now reached and they are simple identifiers which unconditionally bind anything successfully, including references. Therefore, we are done and the algorithm terminates.
  2. Case 2:

    1. (&a, &b) is matched against a type &(_, _) therefore the reference-to-tuple is dereferenced and the binding mode is set to ref.
    2. Therefore, the above is equivalent to
     let (&ref a, &ref b) = *&(&String, &String);
     // which in turn is equivalent to
     let (&ref a, &ref b) = (&String, &String);
    
    1. The compiler now recurses into the fields of the tuple, and finds yet another level of reference in the pattern. It can see that reference types are matched against reference patterns, so it resets the binding mode to move, i.e. now we have:
     let (&a, &b) = (&String, &String);
    

    which in turn is the same as

    let (a, b) = (String, String);
    

    except that it is disallowed (i.e., compile error), because the original String values were behind (multiple layers of) references, and actually performing this binding would move the values out of the reference.

3 Likes

You are right. I am emphasizing match ergonomics too much so that I fall into it and leaving the fact that
the cases covered by match ergonomics is a subset the whole matching cases all behind, trying to think the whole matching process in the perspective of match ergonomics, which is definitely wrong.

And I didn't know matching reference pattern against a reference will reset the default binding mode to move, is this operation documented anywhere? Thanks:)

You yourself quoted something in the question that contains this fact.

Really sorry about this, but the official docs quoted by me are just these two pages:

  1. 2005-match-ergonomics
    On this page, if you search default binding mode(case insensitive), you will get 16 matches.
  2. patterns
    On this page, searching default binding mode(case insensitive) will give you 4 matches.

I just looked through these matches, seems that none of them are talking about match reference pattern against reference. Are you referring to this issue and this source code?

I don't think the audience of these two things is an average rust user. If I am a compiler developer, I definitely should check them out.

RFC 2005 doesn't talk about matching &[mut] patterns against references, since it's simply not relevant to match ergonomics. The algorithm in RFC 2005 is partial: since it only applies to matching non-reference patterns against references, the RFC's algorithm is understood to reset to move every time a reference pattern (including a &[mut] pattern) is encountered. The examples make it clear, though: patterns using the RFC's ref [mut] DBMs are equivalent to &[mut] patterns containing explicit ref [mut] binders. It could probably have stated more explicitly how the DBM resets, but there's nothing we can do about it now. Now that I look at it, though, the Reference's explanation of the DBM contains a similar omission; maybe that could be clarified.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.