Could the logic of determining exhaustiveness of matches for enum variants be improved?

Compiler Explorer with compilation error here: https://rust.godbolt.org/z/7WsWgL

Couldn't the compiler be smarter about it, and not require the commented-out _ => return 0 arm in the second match?

Thanks, source also below for convenience:

pub fn main() -> i32 {
    enum Variants {
        Variant1,
        Variant2,
        Variant3
    }
    let v1 = Variants::Variant2;
    
    match v1 {
        Variants::Variant1 => return 1,
        _ => {
            for i in 0..10 {
                match v1 {
                    Variants::Variant2 => return i + 2,
                    Variants::Variant3 => return i + 3,
                    //_ => return 0
                }
            }
            return -1
        }
    }
}

In what way would that be smarter?

I'm new to all this but my understanding is that requiring a match on "_" is there deliberately to force the programmer to consider all cases.

Typically, I think, "_" is not used and all cases are matched. Then when the code gets updated and the enum grows another variant, say "Variant4", the compilation fails in all the matches that do not cover the new case. This is a good thing because you now know where you have to modify all your matches, rather than things failing silently or crashing.
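As a sketch of that argument (the enum and variant names here are made up for illustration): an exhaustive match with no _ arm breaks loudly when the enum grows.

```rust
enum Direction {
    North,
    South,
    East,
    West,
}

// No `_` arm: if a new variant (say `Northeast`) is added later,
// this match stops compiling, pointing straight at the code that
// needs updating, instead of silently falling into a catch-all.
fn heading(d: Direction) -> &'static str {
    match d {
        Direction::North => "N",
        Direction::South => "S",
        Direction::East => "E",
        Direction::West => "W",
    }
}
```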

Removing the requirement to cater for all cases would not be so smart.

1 Like

I believe @Mr-Slippery is suggesting that the fact that the outer match block has a Variant1 arm should inform the exhaustiveness checking of the inner match block, so that it would be satisfied with just Variant2 and Variant3 arms. When considered together, the two match blocks are considering all cases.

I guess the more productive question to ask would be: Do you have a more realistic example? The example given is obviously a bit silly since only one iteration of the for loop takes place, so you might as well remove the loop and merge the two matches into one. Since this enhancement would only apply when two match blocks match on the same expression in nearly the same scope, I'm struggling to think of a case where you couldn't easily merge the match blocks.
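For reference, a minimal sketch of that merge for the posted example. Since every arm of the inner match returns on the first iteration, i is always 0 and the loop can be dropped entirely:

```rust
enum Variants {
    Variant1,
    Variant2,
    Variant3,
}

// Single exhaustive match replacing the nested ones; no `_` arm needed.
fn run(v: Variants) -> i32 {
    match v {
        Variants::Variant1 => 1,
        Variants::Variant2 => 2, // was `return i + 2` with i == 0
        Variants::Variant3 => 3, // was `return i + 3` with i == 0
    }
}
```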

3 Likes

I get the idea.

I don't see how making the analysis more complex to cater for a degenerate case like that is worth the bother.

Think of it like this: The error in this case is hinting that one should not write such convoluted code :slight_smile:

2 Likes

Indeed, that is what I am referring to.

Regarding a more realistic example: the structure comes from a program that deals with three kinds of fractals, Mandelbrot, Julia, and Buddhabrot, represented by enum variants. The Buddhabrot case needs separate treatment because of the way it needs to be plotted, but the code for Mandelbrot and Julia has the same structure: two nested for loops over all pixels, conversion to complex numbers, and a call to the Mandelbrot/Julia iteration function. So the difference between Mandelbrot and Julia matters only at some point inside the nested for loops, whereas Buddhabrot is separated from the start. I abstracted away the details and left that one for loop in, to suggest that the distinction between Variant2 and Variant3 is only needed at some point in some common, potentially complicated, control flow.

Consider also that if we uncomment // _ => return 0, that arm is really just dead code, which one is forced to write for no reason whatsoever.

I understand the reason for checking exhaustiveness; I am just asking whether the compiler couldn't be smarter, figure out that the matching is exhaustive in this case, and not require me to write dead code.

Of course, if one is able to extract the common control flow corresponding to Variant2 and Variant3 into a separate function, I suppose there would be no issue. Once the borrowing of the objects involved in that control flow is sorted out (in my real case with the fractals: the canvas etc.), that would solve it, I guess.

OK, managed to refactor things, and it looks like this now.
If not for the truckload of arguments to render_mandel, it wouldn't be too bad.
Thanks for your feedback, helped me think of this refactoring, cheers!

    match fractal_type {
        FractalType::Buddhabrot => {
        }
        FractalType::Mandelbrot | FractalType::Julia =>
            render_mandel(d_x, d_y,
                          min, max,
                          max_it,
                          fractal_type,
                          color_scheme,
                          &mut canvas)
    }

In general this seems like flow typing, i.e. control-flow-sensitive typing, which a few (certainly not mainstream) languages like Ceylon have adopted. For example, one might also expect this to work:

if some_option.is_some() {
    // here `some_option` is known to contain `Some`
    some_option.call_method_on_inner();
}

At first sight it feels like a neat idea, but I have yet to encounter a case when the lack of flow typing prevents me from writing elegant, naturally-flowing code in Rust. In fact, I think this is precisely because the pattern matching capabilities of Rust (in particular on enums) are already pretty smart.
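To illustrate that last point, a small sketch of how Rust's patterns sidestep the need for flow typing in the Option case above (the function and values are made up):

```rust
// Instead of testing `is_some()` and relying on the compiler to
// narrow the type, bind the inner value directly in the pattern:
fn describe(opt: Option<i32>) -> String {
    if let Some(n) = opt {
        format!("got {}", n) // `n` is the inner i32, no unwrap needed
    } else {
        String::from("nothing")
    }
}
```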

Another reason this might be hard to implement, or not worth the tradeoff, is that it introduces the need for global analysis, because type checking a match on a value would now depend on all the outer matches that recursively contain it. I'm no computer scientist, but I've done my fair share of work on compilers, and the usual consequence of global analysis is exponential compilation times (theoretically speaking), or at least significantly slower ones (practically speaking).

Such a system also comes with a whole other set of problems; more often than not, I find it disproportionately harder to reason about such code, exactly because it needs global analysis, that is, the limited immediate context I can hold in my head isn't sufficient for determining what a piece of code does or whether it is correct.

5 Likes

There is a more manual option of making the subsets yourself,

struct Buddhabrot;
enum MandelbrotOrJulia {
    Mandelbrot,
    Julia,
}

impl TryFrom<FractalType> for MandelbrotOrJulia {
    type Error = Buddhabrot;

    fn try_from(fractal_type: FractalType) -> Result<Self, Buddhabrot> {
        match fractal_type {
            FractalType::Buddhabrot => Err(Buddhabrot),
            FractalType::Mandelbrot => Ok(MandelbrotOrJulia::Mandelbrot),
            FractalType::Julia => Ok(MandelbrotOrJulia::Julia),
        }
    }
}

// ... 

match MandelbrotOrJulia::try_from(fractal_type) {
    Err(Buddhabrot) => {
    }
    Ok(fractal_type) =>
        render_mandel(d_x, d_y,
                      min, max,
                      max_it,
                      fractal_type,
                      color_scheme,
                      &mut canvas)
}

But this doesn't scale :frowning:

1 Like

I agree with all of your statements; it would definitely be more difficult for the compiler to analyse. However, if the compiler were smart enough to check these things, one would not need to understand the whole control flow at once.

Also, currently, even with the "workaround" of refactoring the common part into a function, as I mention above, inside that function I am still forced by the compiler to write unreachable code:

...
match fractal_type {
    FractalType::Mandelbrot => m = mandel.iter(Complex::new(0.0, 0.0), c),
    FractalType::Julia => m = mandel.iter(c, Complex::new(-0.70, -0.33)),
    _ => m = 0 // unreachable (if Buddhabrot this fn is not called)
}
...

However I try to turn this around, it doesn't look good. Yes, perhaps the logic would be 1) hard to implement and 2) bad for efficiency. But, to be pragmatic: 1) being hard to implement is not my problem, and in this day and age, where we have things like KLEE that can prove theorems about many kinds of C/C++ programs, I am not sure this would be so tough to implement; and 2) may be something to consider.

But bottom-line: the compiler of a language that doesn't want me to write CRaP code should not force me to write dead code; I still see this as a problem. If the compiler can't properly analyze the nested matches, even with 2-3 levels of nesting, then demote this error to a warning that I can suppress, instead of making me write _ => m = 0 or _ => m = 12311341 for no reason at all.

Thanks for your input, even if it doesn't scale, I've learned something from it, I'd seen TryFrom before but never used it myself, still very much a beginner.

By the way, in this case the simple unreachable!() would be much more descriptive. Or, if you're really sure (although I wouldn't recommend it even in this case), unsafe { std::hint::unreachable_unchecked(); }.
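A self-contained sketch of that suggestion, with stand-in types and values in place of the real fractal code:

```rust
#[derive(Clone, Copy)]
enum FractalType {
    Mandelbrot,
    Julia,
    Buddhabrot,
}

// Stand-in for the per-pixel iteration count computation.
fn iterations(fractal_type: FractalType) -> u32 {
    match fractal_type {
        FractalType::Mandelbrot => 10,
        FractalType::Julia => 20,
        // States the invariant and panics loudly if it is ever
        // violated, instead of returning a made-up value:
        FractalType::Buddhabrot => unreachable!("Buddhabrot is rendered separately"),
    }
}
```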

3 Likes

Thanks a lot @Cerber-Ursi, very nice hint, didn't know about unreachable! Looks better and I am telling the compiler how I see things, that's great.
Credited here: https://github.com/Mr-Slippery/rustmandel/commit/7e4c4d5118dbc50ed32961fffce0d71ea63ae375

I don't follow that. A human reader of the code has to obey the same rules and perform the same analyses as the compiler if s/he wants to end up with a conclusion consistent with the actual behavior of the compiled program.

Yes, but if for example it slows down compilation significantly, now you forced everyone to suffer because of a niche situation.

Assuming you are referring to the aptly-named coverage-vs-complexity index, what's wrong with the refactoring that transforms nested matches and loops into a single match and a function call? Looks way more testable and easier to understand to me.

See, we will never have the anecdotal "sufficiently smart" compiler; Turing and Gödel don't let it exist. I bet we can almost always find more and more special cases in the language that seem obviously valid, yet are impossible to prove correct using the existing rules. We could try to keep adding these as exceptions to the language without end, but no good stems from that. I just have to point to C++, since you have set it as an example. As Scott Meyers once said in a presentation, "there's epicycle on epicycle on epicycle inside this language", because people kept justifying the need for more and more special cases, and they kept resolving the conflicts this yielded using… even more special cases. And it empirically turned out to be a terrible way to design a language.

6 Likes

I don't follow that. A human reader of the code has to obey the same rules and perform the same analyses as the compiler if s/he wants to end up with a conclusion consistent with the actual behavior of the compiled program.

Yes, definitely, but the human doesn't have to keep it all in mind at once if the compiler does its job. We don't have to keep the entire control flow of the Microsoft Excel code in mind, because compilers and code analysers, together with good human design, help us with it.

Yes, but if for example it slows down compilation significantly, now you forced everyone to suffer because of a niche situation.

Hence this question: "Could the logic ...?". If, as you say, it "for example" turns out to slow things down too much, then of course the answer to my question is "no". But I would like to understand whether that is so; more on that below.

Assuming you are referring to the aptly-named coverage-vs-complexity index, what's wrong with the refactoring that transforms nested matches and loops into a single match and a function call? Looks way more testable and easier to understand to me.

Sure, I am quite happy about that too, but please note that inside that function I still have dead code, imposed upon me by the compiler, when it could be just a warning in case it's not feasible to prove the exhaustiveness of the matches.

See, we will never have the anecdotal "sufficiently smart" compiler. Turing and Gödel doesn't let it exist. I bet we can almost always find more and more special cases in the language that seem obviously valid yet it's not possible to prove them correct using the existing rules.

Yes, of course, but I am not convinced that analysing nested matches, at least on a single non-mutable variable, is anything like the halting problem; please feel free to prove me wrong.

[...] And it empirically turned out to be a terrible way to design a language.

I agree with all your other statements regarding too many features in the design of a language etc., and I am not asking for crazy features here; these nested matches can arise naturally when there is common control flow, as I suggest above. I am asking for something like: for 2-3 levels of nested matches on the same non-mutable variable, do the full analysis and throw errors if problems are detected; for deeper nesting, throw warnings. How about that? Rather than, again, forcing me to write unreachable code!

In general, this will require something like refinement types in order to work.

Sure, and I even wrote in one of my previous comments that it can be appealing. It's not that this is crazy by any measure.

But even putting aside my aversion to globality, my point is that 99.9% of non-crazy and individually entirely reasonable language features should be rejected, because they are just not worth the trade-off of accumulating a lot of complexity for some minor convenience of an individual.

The feature you proposed is not crazy and not an unequivocally bad idea on its own. It has its merits and its downsides. Actually, it is one of the less radical and – to me – more acceptable/reasonable proposals.

The problem is that people make tons of similar feature requests daily. People can differ in a million little details regarding what kind of style they are exactly programming in. And thus, it's extremely easy to come up with a feature that is reasonable in separation, and genuinely useful for solving someone's problem they actually encountered. But I'd like to argue that the bar for inclusion in the language should be much, much higher than "it can be useful", simply because of the sheer amount of possible features and feature requests. It's just not reasonably possible to implement all features that somewhere, at some point, were or could be useful for someone, because that would lead to a complete mess.

In other words, advantages and the general, widespread usefulness of a feature needs to far exceed its possible disadvantages, where said disadvantages should be considered not only individually, but also in the context of the entire language.

Therefore, coming up with a feature request that gets accepted is hard, and it should be hard.

3 Likes

Agree with everything, @H2CO3. I wouldn't say this is something essential, more like a nice-to-have-if-not-too-hard; with the _ => unreachable!() mentioned above by @Cerber-Ursi, things are no longer that bad.
But it was the first "ugly" corner of Rust I've found, and it annoyed me a bit precisely because I love almost everything else about it so far.
I wanted to ask this to see if I am doing matches wrong; perhaps there is a better idiom I am unaware of, as I am still very much a beginner to Rust. Also, to see how the community reacts to such questions, and how receptive people are to proposals. I am quite happy with the feedback, so thanks everyone again; the question is pretty much answered at this point from my perspective. Have a good one!

1 Like

@RustyYato, I am not deep into refinement types. I think I get the gist, but I am not sure why they would be required for my limited scope (2-3 nested levels of matches on the same non-mutable variable => verification plus errors where appropriate; more levels => warnings if not all values are treated). Again, I am new to Rust and most likely missing something, so please elaborate if you wish; I'd be keen to learn something new, and thanks!

For such a limited scope I guess we don't need to pull in all of refinement types, but this sounds like the kind of feature where, if people get a taste for it, they will ask for more until we get full-blown refinement types. (This is also just a slippery-slope argument...)

If we want to allow mutations, that would require some level of refinement types; for example:

*x = Some(value);
// Not valid Rust today: a refutable pattern in a plain `let`.
// Refinement types could accept it, knowing `*x` is `Some` here.
let Some(ref mut value) = *x;
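For comparison, a sketch of how the same sequence has to be written in today's Rust, with a refutable pattern plus an explicit dead branch:

```rust
// The compiler does not track that `*x` was just set to `Some`,
// so the `else` branch is required even though it can never run.
fn set_and_bump(x: &mut Option<i32>, value: i32) -> i32 {
    *x = Some(value);
    if let Some(ref mut inner) = *x {
        *inner += 1;
        *inner
    } else {
        unreachable!("just assigned Some above")
    }
}
```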

But that looks like it is out of scope. Either way, I don't really see the need for this feature if we don't allow mutations, because you can always use a subset enum like I showed above. Even if it doesn't scale, I don't think it needs to for such a niche use case.

2 Likes