Why does `match` allow arbitrary identifiers?

I am confused that the following code snippet is valid Rust. The compiler issues a warning about "unreachable patterns", but shouldn't this be a compiler error? IMHO this is especially bad as there are also no compiler errors about missing cases.

enum Foo {
    A, B, C, D
}

impl Foo {
    pub fn print(&self) {
        use Foo::*;
        match self {
            A => println!("A"),
            B => println!("B"),
            Yolo => println!("yolo"), // ???
            HurrayThisIsFunny => println!("nope"), // ???
        }
    }
}

fn main() {
    Foo::D.print(); // Quiz: What does it print?
}

Imagine you have an enum and rename the members in the definition, but forget to update them in the match statement. The code will keep compiling although there is now missing cases and unicorns happen.

I mean, it's not just one warning the compiler is giving you:

warning: unreachable pattern
warning: unused variable: `Yolo`
warning: unused variable: `HurrayThisIsFunny`
warning: variable `Yolo` should have a snake case name
warning: variable `HurrayThisIsFunny` should have a snake case name

It allows "arbitrary identifiers" because match does pattern matching. For example:

#[derive(Debug)]
enum Foo {
    A, B, C(String), D
}

impl Foo {
    pub fn print(&self) {
        use Foo::*;
        match self {
            A => println!("A"),
            B => println!("B"),
            C(s) => println!("C: `{s}`"),
            other => println!("something else ({other:?})"),
        }
    }
}

fn main() {
    Foo::D.print(); // Quiz: What does it print?
}

Output:

something else (D)

There's more about this in the section of the book on match. Suffice to say: not allowing match to bind to values in patterns would make it significantly less useful, and the language would be a lot worse for it.

8 Likes

If you didn't import the variant names, using an identifier that matches a variant name is a default-fatal lint. That's new as of quite recently.

You can deny unreachable patterns too, if you want.

13 Likes

Denying the unreachable patterns is certainly good for cases with multiple arms, but it would not help if there is only a single arm which is misspelled.

Well, in this case you're likely to hit the "unused variable" warning anyway.

Using #![deny(non_snake_case)] on crate level seems to be the least intrusive solution to this problem.

4 Likes

You also won't get this kind of error if you don't import variants of your enum, and instead name them via the full path. You can still get the bug with matching against global consts, but it's not that common anyway.

4 Likes

When avoiding the glob import, you can still shorten the match statements by binding Foo to something like E with either use Foo as E;, or type E = Foo; then refer to them as E::A etc.

6 Likes

Yes, but you could have an unambiguous grammar for this.

The current grammar is unfortunately ambiguous and the ambiguity is resolved based on context.

1 Like

Hmm... true. I suppose this could be resolved by adding a lint to clippy that requires all binding to be done like name @ _.

That said, I don't see it as an issue in practice because you'll get a warning if variables aren't snake_case, and the "literal-ish" things are either PascalCase or SCREAMING_CASE. It's icky in the same way having visibility be determined by case is, but it works.

There could be, yes.

Note that this exact problem has been known about for at least 32 years (https://www.cs.princeton.edu/~appel/papers/critique.pdf p12), but patterns still work this way in many languages. I think that's at least a partial indication that it's just not that serious of a problem, especially since rustc tends to fire at least 3 lints about it when it happens.

2 Likes

There is probably some tradeoff between bulletproof and convenience. It does come as a surprise though that something which used to be (some kind of) type suddenly changes its role into a variable name and everything keeps compiling fine, but output changes. Coming from the C++ world of foot guns this is probably the first one I encountered in safe Rust.. Relying only on lints is a bit dangerous especially as I find the "unused variable" often violated when brining up new software. I tend to have quite a few enums and match statements in my code and "use ::*" helps a lot with readability especially when matching with two or more variables.

To name one example of other language I’d know: Haskell. They don’t have the problem because variable names cannot be upper-case (and constructors of data types [these are like Rust enums] cannot be lower-case) it’s a hard error to ignore the distinction.

If you’re worried of seeing the snake_case lint too late in development (due to “unused”) lints, the solution of upgrading the snake_case lint (deny(non_snake_case), or maybe just deny(nonstandard_style) in its entirety) seems quite appropriate. Downgrading “unused” lints would be the alternative to separate them, but since outright disabling “unused” lints has the risk of forgetting to turn it back on again, maybe we should work on multiple warning levels with different severity… or maybe make “snake_case” lint in patterns a deny-by-default lint :thinking:

5 Likes

It's a similar pitfall in Scala. I adopted the habit of always using qualified names when matching constants, even if that means introducing a module just for that., e.g.

mod keys {
  pub(crate) const A: &str = "a";
  pub(crate) const B: &str = "b";
}

match string {
  keys::A => { ... }
  keys::B => { ... }
  _ => { ... }
}
5 Likes

I always use paths because of this issue. Sometimes I do "use some_crate::LongEnumName as L;" and then "L::Variant" just to avoid this exact issue.

Enforcing "name @ _" instead of the ambiguity of today would be great.

2 Likes

One additional thing to note is that the ambiguity isn't just for top-level patterns, but also subpatterns, e.g. if you have a match pattern of Enum::Variant(CONST). Requiring name @ _ at top-level seems reasonably practical enough, but for subpattern bindings becomes more impractical. Merely doing so for non_snake_case bindings seems much more reasonable, and a shoe-in fit for a clippy restriction lint (i.e. allow by default).

Perhaps surprisingly, the exact same ambiguity holds for standard let as well, although there it's a lot less likely to be an issue since the let pattern must be exhaustive and an if let pattern is a separate lint.

Pattern binding isn't ambiguous due to it, but it is highly implicit due to binding mode inference, of which const patterns can be considered another variant. (I wonder if the people against binding mode inference would also prefer using const ITEM?) Yet another place where it's evident that explicitness isn't a goal of Rust, but rather locality of reasoning[1].

The "real" trick is to be warnings clean before doing any refactoring, so you can see the compiler trying to point out the newly introduced issues. Compiler guided "fearless" refactoring is a huge benefit to Rust's type system, but due to the large number of concessions to practicality, it only really achieves "fearless" if all the default warnings are heard.

Since it's mentioned: an interesting concept I had was to have let be part of the pattern language, switching from pattern mode to binding mode, like how mut annotates the binding, not the pattern. The main issue then is that due to being near-duals, pattern and expression syntax would be ambiguous in $lhs = $rhs without the let introducer. Maybe you could resolve that by doing forward-order binding, e.g. $expr is $pat or even match $expr => $pat to maintain a left-aligned call-out? Something for future languages to think about.


  1. Of course, needing to know the impl signatures of involved types and in-scope traits to do name inference is still a global concern. Practicality generally beats principles (but always loses to (un)safety concerns). ↩︎

4 Likes

I suppose the same kind of name collision problem might be why crates like serde-derive use funnily underscored variable names for things like function arguments in trait implementations. Just watch as something simple as

use serde::Deserialize;

#[derive(Deserialize)]
struct Foo(String);

Rust Playground

breaks completely by merely introducing a questionably named constant

const __e: () = ();

Rust Playground

because the macro expansion features some method implementation

fn visit_newtype_struct<__E>(self, __e: __E) -> … where …

That’s not a let, but a function argument, which is arguably the same kind of thing (also using irrefutable patterns).


It also seems compiler-built-in derives have some magical special properties (or maybe just some “better” form of hygiene?) where they can avoid this issue. E.g. derive(Hash) features some

fn hash<__H: ::core::hash::Hasher>(&self, state: &mut __H) -> () 

according to cargo expand, but a const state: () = (); creates no issues.

2 Likes

... Oh wow, because consts are items, this breaks through macro_rules! ("mixed site" or "semitransparent") hygiene: [playground]

macro_rules! m {() => {
    let state = 0i32;
}}

const state: () = ();

fn main() {
    m!(); //~ error
}

The built in derives must be utilizing "def site" ("2.0" or "full") hygiene or equivalent (symbols not nameable in source) if they can't be broken this way. Proc macro derives are even more potentially vulnerable because they're rarely written using mixed site hygiene (support was added later and thus things default to "use site" (i.e. no) hygiene).

It would require a strange macro to cause unsoundness with this, as most of the time it would just be a type mismatch error, but it's possible in theory, e.g. a macro with match $e { None => panic!(), e => e.unwrap_unchecked() } plus const None: Option<T> = Some(T::new()), though item paths are more likely to be understood as an "attack" vector and rewritten to $crate:: paths than binding names are.

3 Likes