What's the trick of this example?

The point is that the grammar of patterns (just like those of expressions and types) is recursive. So a | b is itself a pattern if a and b are. There are many other examples of recursion in the pattern grammar that have nothing to do with alternation, e.g. references: &P, tuples: (P, Q), structs: S { field: P }, and the list goes on. These aren't ambiguous, either – patterns being made up of other patterns does not in itself cause any ambiguity.

And since | only ever appears at one single place (production) in the grammar, it's unambiguous. It doesn't matter that | foo | bar is a pattern itself, it can still be parsed in only one, unique, valid way.
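As a rough illustration of the "only one way to parse it" point, here is a small sketch (the function names are made up for this example) that writes the same match twice, once with a leading | on the arms and once without. In pattern position the leading | is simply part of the pattern, so both versions should behave identically.

```rust
// Hypothetical helper: every arm's pattern starts with a leading `|`.
fn with_leading(n: u8) -> &'static str {
    match n {
        | 0 | 1 => "low",
        | 2
        | 3 => "mid",
        | _ => "high",
    }
}

// The same match written without any leading `|`.
fn without_leading(n: u8) -> &'static str {
    match n {
        0 | 1 => "low",
        2 | 3 => "mid",
        _ => "high",
    }
}

fn main() {
    // Both spellings classify every input the same way.
    for n in 0..=5 {
        assert_eq!(with_leading(n), without_leading(n));
    }
}
```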


I will stop replying to this thread because I don't know how to explain this in any further detail. If you don't understand grammars and recursion, you should maybe learn more about those first, because this is not productive.


I was wrong. I meant the entire | _ | Some(1) is a pattern.

But it is not an atomic pattern: it consists of two patterns, _ and Some(1) (which, in turn, can't be decomposed further).

Yes, I know that. I meant that, in practical code, the non-atomic pattern | _ | Some(1) can be confused with the closure | _ | Some(1). I'm just asking why we introduced a leading | in patterns when it makes them hard to distinguish from closures. I can't come up with an example that needs a leading | to form a pattern. I think a | b | c is good enough, and it is very easy to recognize as a pattern.

There isn't one that needs it.


So there was no need to introduce a leading | in patterns, which creates ambiguity between a pattern and a closure, was there? I don't see why we introduced it in edition 2015.

The reason is stated above. It's for easier macro writing.

And there's no necessity to anything, you can always write raw code.


For macro writing, what's a concrete case in which having a leading | is easier than not having it?

When I'm building the match arm based on one or more inputs.

Without a leading |, I need to special case the first item in the input list, and output $item for that, and | $item for all other items in my input list.

With leading |, I can output | $item for every match arm, not caring about the first or last item. This would also work if Rust allowed trailing | in patterns - I could output $item | for every input item.
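A minimal sketch of that idea, assuming a made-up enum Msg and a made-up macro is_any_of! (neither comes from real code): the macro emits | Msg::$variant once per input and never has to treat the first item differently. In a case this simple a separator repetition would also work; the leading | starts to matter when the pieces are produced independently of each other.

```rust
// Hypothetical message enum, used only for this illustration.
enum Msg {
    Hello,
    Goodbye,
    HowAreYou,
}

// Hypothetical macro: builds one match arm out of a list of variant names.
macro_rules! is_any_of {
    ($value:expr; $($variant:ident),+) => {
        match $value {
            // Every input expands to `| Msg::X`; the first one is not special.
            $(| Msg::$variant)+ => true,
            // The catch-all can be redundant if every variant is listed.
            #[allow(unreachable_patterns)]
            _ => false,
        }
    };
}

fn is_greeting(m: Msg) -> bool {
    is_any_of!(m; Hello, HowAreYou)
}

fn main() {
    assert!(is_greeting(Msg::Hello));
    assert!(is_greeting(Msg::HowAreYou));
    assert!(!is_greeting(Msg::Goodbye));
}
```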


I see that

macro_rules! string_pat{
    ($($pat:pat_param)| +)=>{
       $(println!("| {x}",x = stringify!($pat));)+
    }
}

macro_rules! string_pat_no_leading{
    ($($pat:pat_param)| +)=>{
       $(println!("{x} |",x = stringify!($pat));)+
    }
}
fn main() {
   string_pat!(a|b|c);  // prints "| a", "| b", "| c" (one per line); joined: | a | b | c
   string_pat_no_leading!(a| b| c); // prints "a |", "b |", "c |"; joined: a | b | c |, which is not a valid pattern
}

The point on parsing non-ambiguity was that, prior to the stabilization, no pattern could start with a |. So there's no parsing ambiguity. After stabilization, a pattern that starts with | just swallows the leading |. [1]

Look, you're not alone in disliking the addition -- just read the PR or tracking issue or stabilization issue. But if you do, you'll also see the experts on syntax sign off on the implementation, and see team members who are ambivalent or even dislike it sign off on it as well. However you'll also see people who like it and proclaim they'll definitely use it. [2] Ultimately, for better or worse, it stabilized and there's just not enough motivation to remove it over an edition after 4+ years. Even if that happened, it would still have to be parsed and supported in every earlier edition. So what are you trying to accomplish?

If your goal is just to find support for "well that sucks, shouldn't have been accepted", I think you'll be in decent company, based on the RFC/PR/FCP comments. But all languages have warts. I personally feel Rust has much worse warts than this one. And I also don't think it's going away, so :person_shrugging:. [3]


  1. N.b. I have no idea if this is the actual implementation -- but the point is, there is no ambiguity. Only code that couldn't compile before had a leading |; no code that compiled before changed meaning.

  2. Despite the macro arguments of the RFC, if you read the conversation, ergonomics seems to be the main actual factor in the decision-making.

  3. If you want to see a bikeshed that people care an order of magnitude more about, look up the push to eliminate the turbofish. Spoilers, it was a massive push, but it failed -- hence the bastion of the turbofish. I don't think anywhere near enough people care about this one to go anywhere. Such pushes also tend to stir up more sour grapes than anything productive, IMO.


Right, and now try and write string_pat_no_edges!, which works like your existing two, but outputs a | b | c, without either a leading or trailing |.

This is why the trick exists - because writing string_pat_no_edges! is much harder than writing either string_pat! or string_pat_no_leading!.
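For comparison, one possible way to write string_pat_no_edges! is sketched below. It is only an illustration, not a canonical solution, and it returns a String instead of printing so the result is easy to check. Note how the first item has to be matched and emitted separately from the rest, which is exactly the special-casing the other two macros avoid.

```rust
// Sketch: the first pattern is captured on its own (`$first`), and only the
// remaining ones are prefixed with " | " when building the output.
macro_rules! string_pat_no_edges {
    ($first:pat_param $(| $rest:pat_param)*) => {
        {
            let mut s = String::from(stringify!($first));
            $(
                s.push_str(" | ");
                s.push_str(stringify!($rest));
            )*
            s
        }
    };
}

fn main() {
    // No leading and no trailing `|` in the result.
    assert_eq!(string_pat_no_edges!(a | b | c), "a | b | c");
    assert_eq!(string_pat_no_edges!(a), "a");
}
```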

And of the two options, a leading | that was supposed to be preceded by another pattern is a mistake a human is unlikely to make. Typing a trailing | and then forgetting to put in the rest of the pattern is a plausible mistake.

We thus get to three options:

  1. Make it hard to write macros that output patterns with a variable number of alternatives, because you have to somehow treat either the last or the first item in the pattern specially.
  2. Allow a trailing |, which means it's impossible to distinguish an intentional trailing | from "human typed Ok(MyEnum::Variant1) |, got distracted by a doorbell/phone call, and did not remember to type Ok(MyEnum::Variant2) at the end of the pattern". This is also a case that's non-obvious in code review: A | B | C | D | is not blatantly bad, especially if it's formatted in a multi-line format with the | aligned, and if you also have an exhaustive arm in your match (e.g. x => or _ =>), you won't get warned that you missed E off the end of that list.
  3. Allow a leading |, which means that |a| Some(a) is a valid pattern that looks like closure syntax. However, this is usually something that warnings (the "unreachable pattern" warning that @scottmcm mentioned) will indicate.

All three options are imperfect, and a consequence of using |<ARGS>| as closure syntax while also using | for "or" and "alternation". Given the three options, though, option 2 (trailing |) is clearly worse: there are plausible cases where it would be typed in error, and no automation would warn you that you've made a mistake.

This leaves us with option 1 (hard to write macros that output variable length alternation patterns) or option 3 (allow leading |). Rust has chosen option 3 because it's very unlikely to be written by a human accidentally, and if they do write it, there's likely to be compiler warnings telling them that this isn't closure syntax, it's pattern syntax, which means that the mistake is unlikely to survive code review and CI, even if it builds and runs.
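To make the lookalike concrete, here is a small sketch (names invented for the example) that puts the same-looking tokens in both positions. In a match arm they form a pattern with the alternatives _ and Some(1), and the compiler is expected to flag the second alternative as unreachable; the lint is silenced here only to keep the example quiet. In expression position they form a closure.

```rust
fn as_pattern(x: Option<i32>) -> &'static str {
    match x {
        // Pattern position: a leading `|`, then the alternatives `_` and `Some(1)`.
        // `_` already matches everything, so `Some(1)` can never be reached.
        #[allow(unreachable_patterns)]
        | _ | Some(1) => "always matches",
    }
}

fn main() {
    // Expression position: a closure that ignores its argument.
    let as_closure = |_: ()| Some(1);

    assert_eq!(as_closure(()), Some(1));
    assert_eq!(as_pattern(None), "always matches");
    assert_eq!(as_pattern(Some(7)), "always matches");
}
```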


Looking at the source code of matches, it uses no trick, just a simple expansion. The simplified implementation is:

macro_rules! matches{
     ($expression:expr, $( $pattern:pat_param )|+) => {
          match $expression {
              $( $pattern )|+ => true,
              _ => false
          }
     }
}
fn main(){
    matches!(Some(1), _ | Some(_));
}

It just works. So I'm still not seeing a case where a leading | is necessary.

matches is a simple macro - it's just duplicating your supplied pattern into the match block it outputs. Complexity kicks in when you're not accepting a pattern as input, but you're creating the pattern based on the arguments to the macro.


Could you please give a simplified example in which we should take care of either the first or the last item in the pattern? That would be helpful for interpreting what problem we introduced a leading | to solve.

This is oversimplified compared to real code, but gives you the idea.

We have a set of messages we want to send, and the input to the macro is a list, not of $pat items, but of $ty items, representing the types of the various messages. We're building a match statement whose patterns are of the form concat_idents!($type, "Confirmation"), turning the sent messages into the confirmation replies we expect to get back - we're pattern matching those replies, and similarly for the reject messages.

So the macro turns an input like messages!(Hello, Goodbye) into a match block with three arms: Hello | Goodbye | PayMoney for the send side, HelloConfirmation | GoodbyeConfirmation for the received OK side, and HelloRejected | GoodbyeRejected for the message failed to send cleanly side.

For added fun, some types have exception cases in the macro. So we want to pick up that you've given us messages!(Hello, Goodbye, HowAreYou), and (e.g.) not generate a HowAreYouRejected in the rejected match, since we know that HowAreYouRejected will not exist.

The resulting set of macros is complex; one allows you to convert a message type (like Hello) into its Confirmation counterpart (like HelloConfirmation), and results in nothing if there's no Confirmation counterpart. We then use this to build the list of patterns, being careful not to output a | for empty parts.

This is exactly the same as why it's allowed to write a function call as

foo(
    some_long + thing.goes_here(),
    and_more.stuff() - happens,
)

with an "unnecessary" comma after the last argument.

Allowing that makes various things easier. It's long been known as preferring terminators to separators. It's made it into rustfmt's Overarching Guidelines.
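The same effect shows up in macros that generate call arguments. In this sketch (sum3 and call_sum3! are hypothetical names), the macro emits each argument followed by a comma, producing sum3(1, 2, 3,), and relies on the trailing comma being accepted instead of special-casing the last argument.

```rust
// Hypothetical three-argument function used as the call target.
fn sum3(a: i32, b: i32, c: i32) -> i32 {
    a + b + c
}

// Hypothetical macro: every argument is emitted as `arg,`, so the generated
// call ends with a trailing comma that the language allows.
macro_rules! call_sum3 {
    ($($arg:expr),+ $(,)?) => {
        sum3($($arg,)+)
    };
}

fn main() {
    assert_eq!(call_sum3!(1, 2, 3), 6);
    // The macro itself also accepts a trailing comma in its input.
    assert_eq!(call_sum3!(10, 20, 30,), 60);
}
```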

I'm not 100% confident that leading | here is better than trailing |, but I'm 100% confident that exactly one of those is better than neither (and better than allowing both).

