I'm not meaning to litigate this question, I'm merely curious and want to learn: why is it impossible for you to predict which panics will abort and which will not?
From reading Rust resources, I've been under the impression that it is always the final binary that decides this question by setting either panic="abort" or panic="unwind" in its Cargo.toml file.
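For reference, here is where that knob lives in practice. It is a profile setting in the top-level binary crate's Cargo.toml (dependencies inherit the choice); this is a minimal sketch, not a complete manifest:

```toml
# Cargo.toml of the final binary crate (illustrative fragment)
[profile.release]
panic = "abort"   # panics abort the process instead of unwinding

[profile.dev]
panic = "unwind"  # the default: panics unwind the stack
```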
I see where you’re coming from on this, as crashing a long-lived process can be a problem. However, this is a very strict criterion which I suspect very few libraries meet, as any bounds check (such as slice indexing) may panic.
They are quite specific, indeed.
And indeed the code I work on foregoes the usage of index syntax in favor of the .get() and .get_mut() methods for that very reason (as a bonus, the methods seem to be faster, too).
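For illustration, a minimal sketch of that style, with made-up function names (`first_even`, `bump_first` are not from any real codebase):

```rust
// Fallible indexing via .get()/.get_mut() instead of the panicking `[]` syntax.
fn first_even(v: &[i32]) -> Option<i32> {
    // v[0] would panic on an empty slice; .get(0) returns None instead,
    // leaving the decision to the caller.
    v.get(0).copied().filter(|n| n % 2 == 0)
}

fn bump_first(v: &mut Vec<i32>) -> bool {
    match v.get_mut(0) {
        Some(x) => {
            *x += 1;
            true
        }
        None => false, // empty vec: report failure instead of panicking
    }
}

fn main() {
    assert_eq!(first_even(&[2, 3]), Some(2));
    assert_eq!(first_even(&[]), None);
    let mut v = vec![1];
    assert!(bump_first(&mut v));
    assert_eq!(v[0], 2);
}
```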
All I'm saying is that putting the decision on whether or not to panic at all in the hands of the library consumer is not a bad thing. If a panic is appropriate, they can easily do so. If not, however, you at least have the option of doing something else.
Basically it comes down to "separate mechanism from policy, and let consumers choose a policy appropriate for themselves, rather than deciding for them".
While it's quite possible you're correct, the docs are kind of ambiguous on this point.
Of course there is also the issue that unwinding a stack can easily be expensive, so avoiding that when it isn't strictly necessary is just plain nice.
What I'm driving at is that this is precisely how it started there. And I won't make the same mistake.
You also didn't address the separation of mechanism and policy. Panics in a library still do nothing for a library consumer in that regard. Even if a consumer can turn it off, it just serves as an obstacle to overcome, and a purely incidental (in the sense of incidental complexity) one at that.
Now I'm confused as to what you are wanting to happen in an error situation, jjpe.
There aren't that many options when something that is never supposed to happen actually happens and is detected, whether by a runtime check the compiler inserts or by an assertion that is there because the author does not know how to deal with the situation.
1. The program crashes and dies completely and immediately, hopefully with some meaningful error message.
2. The program tries to continue such that some higher-level caller can handle the issue and do something else.
The latter requires either:
a) Passing an error return back up the call chain.
b) Triggering some kind of exception that unwinds the stack until it gets to whatever caller is catching the problems.
Given that we are talking about an error that should never happen, and that error return values are therefore not a nice idea, that only leaves some exception mechanism.
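As a concrete sketch of option b), `std::panic::catch_unwind` lets a higher-level caller observe a panic, assuming the binary is built with the default `panic = "unwind"` (the `risky` function here is made up for illustration):

```rust
use std::panic;

fn risky(divisor: i32) -> i32 {
    100 / divisor // panics if divisor == 0
}

fn main() {
    // A higher-level caller catches the unwinding panic and can react.
    // (With panic = "abort" the process would die instead.)
    let caught = panic::catch_unwind(|| risky(0));
    assert!(caught.is_err());

    // A non-panicking call passes through as Ok.
    let ok = panic::catch_unwind(|| risky(4));
    assert_eq!(ok.unwrap(), 25);
}
```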
What have I missed here?
Personally I prefer 1): crash and burn. Something that should never happen has happened. Your program is now in some weird indeterminate state. It should be investigated and fixed.
I loathe exceptions. They make spaghetti out of one's code.
Divide by zero is an archetypal example of library code that panics simply because the user violates the library's expectation. Panicking is done here because it performs well. (It could also be considered nothing more than ergonomic.) There are several great reasons to panic in library code, so I would suggest considering a broader perspective on the matter.
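For the record, Rust exposes both policies for this exact case: the `/` operator panics on a zero divisor at runtime, while `checked_div` is the non-panicking alternative that returns an `Option`:

```rust
fn main() {
    let a: i32 = 10;
    // A literal `a / 0` is rejected at compile time, so obtain a zero
    // at runtime to illustrate the behavior.
    let zero: i32 = "0".parse().unwrap();

    // `a / zero` would panic here. checked_div returns None instead,
    // letting the caller choose what to do.
    assert_eq!(a.checked_div(zero), None);
    assert_eq!(a.checked_div(2), Some(5));
}
```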
There are also security concerns. Returning a Result implies an anticipated error occurred, which is quite different from an unanticipated bug. As a consequence you may take action on something that is in an inconsistent state. Enforcing "death before confusion" avoids entire classes of vulnerabilities.
@skysch @alanhkarp you both make a valid point. But all that does is reaffirm that programming is more art than most people give it credit for, including me sometimes.
The answers are unfortunately also completely unactionable for me in this case, given my other constraints.
Is a panic substantively different from a server crash? In both cases, your program (or at least that instance of it) is off the air. You can't avoid the latter, so you should be able to deal with the former.
I'm curious to know what your programs do when these methods fail. I can understand if you're writing server-like code that manages multiple threads or operations, that maybe you can defer or wait for things to change before trying again, but the vast majority of code is just going to start propagating an error up the call stack until they hit a point where they can notify the user that things went awry. In which case, you're just unwinding manually. The catch_unwind technique was basically invented to handle the case when you need to survive running arbitrary code, so the only thing that really remains is that you're writing very special purpose code that is understood by a larger application, and that starts to look a lot less like a library per se and more like an application with its own special way of multiplexing operations and responding to errors.
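To make the "unwinding manually" point concrete, here is a minimal sketch of error propagation up to a reporting layer; all names (`ParseError`, `parse_port`, `load_config`) are illustrative:

```rust
// Each layer hands the error upward until something can notify the user.
#[derive(Debug)]
struct ParseError(String);

fn parse_port(s: &str) -> Result<u16, ParseError> {
    s.parse().map_err(|_| ParseError(format!("bad port: {s}")))
}

fn load_config(raw: &str) -> Result<u16, ParseError> {
    // Deeper layers would use `?` here to keep propagating upward.
    parse_port(raw)
}

fn main() {
    // The top of the call stack is where the error finally surfaces.
    match load_config("eighty") {
        Ok(p) => println!("listening on {p}"),
        Err(e) => eprintln!("config error: {e:?}"),
    }
    assert!(load_config("8080").is_ok());
    assert!(load_config("eighty").is_err());
}
```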
To be fair, jjpe did mention "transactions". Presumably there may be thousands of them going on every second. Presumably some of them fail, due to network errors or whatever. Those kind of errors are expected and should not bring the service down, they can likely be handled by retrying the transaction. At least one totally failed and aborted transaction should not crash the whole system.
But what to do when a transaction fails due to an actual bug in some subsystem that causes an event that should never happen by design?
I'm not sure, but I think solutions to handle that one failed transaction whilst the rest continue have been offered above. My Rust-fu fails me at this point.
A big question then is: Is it even reasonable to continue with all those other transactions in the air? Can we assume the damage done by that bug is isolated to that one transaction?
Certainly if it is in some unsafe code or a result of calling some C library one cannot. With such a bug one has no idea what memory has been corrupted where.
I'm still in favor of "death before confusion". Unless such transaction isolation and failure containment can be proven.
In my non-expert opinion, until you know exactly why the failure occurred (and have a fix developed), you're taking a risk in assuming that someone hasn't penetrated your security and isn't manipulating your program code. And if that is possible, you shouldn't trust any transactions, even if there are no errors in them.
What are your constraints? Does it have something to do with drop code not being run, which might lead to some kind of memory leak or other resources not being freed, which causes problems, is it about performance, is it about logging or something else entirely?
Do I understand you correctly, that you'd like to have the assert macros return a result, possibly with some kind of PanicError containing the error message, rather than panicking directly?
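If that reading is right, such a thing can be sketched today with a custom macro. Everything here is hypothetical: `try_assert!` and `PanicError` are illustrative names from this discussion, not anything in std:

```rust
// Hypothetical: an assert that returns Err instead of panicking.
#[derive(Debug)]
struct PanicError(String);

macro_rules! try_assert {
    ($cond:expr, $msg:expr) => {
        if !$cond {
            return Err(PanicError($msg.to_string()));
        }
    };
}

fn withdraw(balance: &mut i64, amount: i64) -> Result<(), PanicError> {
    try_assert!(amount >= 0, "amount must be non-negative");
    try_assert!(*balance >= amount, "insufficient funds");
    *balance -= amount;
    Ok(())
}

fn main() {
    let mut b = 100;
    assert!(withdraw(&mut b, 40).is_ok());
    assert_eq!(b, 60);
    // The failed "assertion" comes back as an Err the caller can handle.
    assert!(withdraw(&mut b, 1000).is_err());
}
```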
Or, at the cost of added complexity, break down the application into multiple intercommunicating processes and have a supervisor handle crashes (the Erlang way, but harder).
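A true Erlang-style setup would use separate OS processes, but a much simpler in-process analogue can be sketched with threads, since a panicking worker surfaces as an `Err` from `join()` (the `worker`/`supervise` names and restart policy are made up for illustration):

```rust
use std::thread;

// Simulated flaky worker: panics on the first attempt, succeeds afterward.
fn worker(attempt: u32) -> u32 {
    if attempt == 0 {
        panic!("worker crashed on first attempt");
    }
    attempt * 10
}

// A tiny "supervisor": detect the crash via join() and restart the worker.
fn supervise() -> u32 {
    for attempt in 0..3 {
        let handle = thread::spawn(move || worker(attempt));
        match handle.join() {
            Ok(v) => return v, // worker finished normally
            Err(_) => continue, // worker panicked: restart it
        }
    }
    unreachable!("all restarts exhausted");
}

fn main() {
    // First attempt panics (message goes to stderr), second succeeds.
    assert_eq!(supervise(), 10);
}
```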
I guess, one would have to differentiate between 4 types of function return values:
Neither exceptional nor regular error
Exceptional, but no regular error
No exceptional, but regular error
Both exceptional and regular error
How would one design this?
A ComplexResult enum with cases Ok, Err, and Exc only makes sense for functions which return both regular and exceptional errors, but not if only one kind of error can occur. We also lose the ability to use the ? operator.
Using a simple Result for both cases where functions exclusively return either exceptional or regular errors might be an easy way to confuse people when handling errors.
I don't see a good way of handling this in current Rust. Maybe, if it were possible to have ComplexResult work with ? for both regular and exceptional error cases, and if specifying ComplexResult<OkType, !, ExceptionType> and ComplexResult<OkType, ErrorType, !> were possible and resulted in efficient code, I could see this working out nicely.
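On stable Rust the ? operator can't be taught to work with a three-variant enum directly (the Try trait is unstable), but the three-way split can be approximated with a nested Result today. This is only a sketch of the idea; `Failure`, `ComplexResult`, and `parse_positive` are made-up names:

```rust
// Model Ok / regular Err / exceptional Exc as Result with a two-case error.
#[derive(Debug, PartialEq)]
enum Failure<E, X> {
    Err(E), // anticipated, recoverable error
    Exc(X), // exceptional "should never happen" condition
}

type ComplexResult<T, E, X> = Result<T, Failure<E, X>>;

fn parse_positive(s: &str) -> ComplexResult<u32, String, String> {
    // `?` still works, because this is an ordinary Result underneath.
    let n: u32 = s
        .parse()
        .map_err(|_| Failure::Err(format!("not a number: {s}")))?;
    if n == 0 {
        return Err(Failure::Exc("zero should be impossible here".into()));
    }
    Ok(n)
}

fn main() {
    assert_eq!(parse_positive("7"), Ok(7));
    assert!(matches!(parse_positive("x"), Err(Failure::Err(_))));
    assert!(matches!(parse_positive("0"), Err(Failure::Exc(_))));
}
```

The cost is that callers must match on the nested `Failure` to distinguish the two error kinds, which is exactly the ergonomic gap the proposed `!`-parameterized ComplexResult would close.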