Is error handling in Rust all about when you can and can't afford to return a Result<T,E> instance?

Hi folks, I'm new to Rust and am currently working my way through the book.
It feels like the decision of whether to use panic! or Result<T,E> is kind of arbitrary. Any place where the panic! macro ends up being called, like trying to divide by zero or trying to access an index out of bounds, we could have returned a Result<T,E> instance instead and left the error handling responsibility to the user (the calling code). In the case of accessing an index out of bounds in a vector, why not have normal indexing code like &v[100] behave just like v.get(100) for a vector v of length 5? I realize that returning a Result<T,E> in all situations would make the code a bit cumbersome with match expressions everywhere, but it feels like there's no clear line between the kind of scenarios that warrant panicking and those that warrant returning a Result<T,E> instance.
I've tried to find the answer to this on the internet, but every source states some version of: if a piece of code takes your system into an invalid state, or if it causes an unrecoverable error, you should panic. But what error is unrecoverable exactly? Accessing an index out of bounds or dividing by zero isn't unrecoverable by any means; we can just return Result<T,E> and leave the recovering responsibility to the user. The match expression is going to make sure that the user deals with all the possibilities, especially that of an error.

What concerns does this error handling decision boil down to? Does it boil down to pragmatic decisions like:
How rare is the error? Is it frequent (normal) enough to warrant changing the return type of an operation from a simple type T to Result<T,E> where you have to do extra steps to simply get out the data? How cluttered with match expressions do we want our code to be?


How cluttered with match expressions do we want our code to be

You could use the ? operator to propagate the errors as shown here:
https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator
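As a rough illustration of how the ? operator cuts down on match expressions, here is a minimal sketch. The function name add_strs is made up for this example; parse and ParseIntError are from the standard library.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse two decimal strings and add them.
// Each `?` returns early with the error, so no explicit `match` is needed.
fn add_strs(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok(x + y)
}

fn main() {
    // The caller still decides what to do with the error.
    println!("{:?}", add_strs("2", "40"));
    println!("{:?}", add_strs("2", "forty"));
}
```

The error is still part of the signature, so the caller cannot ignore it, but the function body reads almost like the infallible version.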

Regarding your original question: are you asking when you should use panic! and when you should use Result?


I think my blog post on this exact topic will help untangle things: Using unwrap() in Rust is Okay - Andrew Gallant's Blog


The considerations when writing a program are as the internet has said. The answer to what is unrecoverable depends on what assumptions you want your program to make. "There's megs of free space on the hard drive" or "outputting won't fail"[1] might be reasonable assumptions, but "the user supplied parseable input" probably isn't. On the other hand, maybe you're on embedded and can't make assumptions about how much space is around either!

Things are unavoidably fuzzier when writing a library. In that case, you don't necessarily know what is a logic error (deserving of panic) in the depending programs. If a panicking version can provide significant benefits, it can make sense to provide both panicking and checked versions.


In addition to any other considerations, your examples are operators, and those carry additional considerations (such as programmer expectations).

Index can't return a Result with its current design, as it has a "return a reference to a field" shape (which is dereferenced to a place at the call site), and Vec stores Ts, not Result<T, _>s. It would be a very different trait if it mandated or supported fallibility:

// How it's designed now
v[0] += 1;
// If it returned `Result<&_, _>`, you'd have to dereference yourself
*v[0]? += 1;

Div could return Result without changing the trait, but then you'd have to write things like this:

x += (a/b).unwrap() - 3;
x += (a/b)? - 3;
// Panicking version:
x += a.unchecked_div(b) - 3;

Instead, the panicking version is ergonomic, and we have:

x += a/b - 3;
// Checked version (checked_div actually returns an Option):
x += a.checked_div(b)? - 3;
// Checked version, unwrapped (panics on division by zero or on overflow):
x += a.checked_div(b).unwrap() - 3;

Additionally, for both of these examples, how they work in Rust is familiar to those coming from other languages.
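For anyone who wants to run the comparison, here is a small self-contained sketch of the two forms of division that exist today. The function names are invented for the example; checked_div and checked_sub are the real integer methods, and both return an Option.

```rust
// Panicking form: `a / b` panics if `b == 0` (and on `i32::MIN / -1`).
fn ratio_minus_three(a: i32, b: i32) -> i32 {
    a / b - 3
}

// Checked form: `checked_div` yields `None` for a zero divisor or overflow,
// and `?` propagates that `None` to the caller.
fn checked_ratio_minus_three(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)?.checked_sub(3)
}

fn main() {
    println!("{}", ratio_minus_three(12, 4));
    println!("{:?}", checked_ratio_minus_three(12, 4));
    println!("{:?}", checked_ratio_minus_three(1, 0));
}
```

The operator form stays short for the common case, and the checked form is there when the divisor comes from somewhere you don't control.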

The considerations for std (which is present in approximately all Rust programs) may be different than they are for your programs and libraries (which will typically have a narrower focus).


  1. i.e. println! is ok ↩︎


This is the crux of the question. It's easy to define what kinds of errors are unrecoverable, but more difficult to apply the definition to every conceivable case. One definition might be that errors are unrecoverable when there is no reasonable alternative to aborting the application or thread. What does your application do to handle division by zero? Probably not a whole lot that would be useful to an end user.

If it's safe to catch the panic unwind because it occurred in a thread that doesn't wreck global state, then by all means tear down the thread (because division by zero is unrecoverable [1]) and keep the application alive to do whatever else it needs to do. But unwinding is not supported by every platform, and it is not intended as a general-purpose error handling facility.
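A minimal sketch of the "tear down the thread, keep the application alive" idea, assuming the default unwinding panic strategy (with panic = "abort" the whole process would stop instead). The function name divide_isolated is hypothetical.

```rust
use std::thread;

// Hypothetical worker: runs the division on its own thread so that a panic
// there unwinds only that thread, not the whole program.
fn divide_isolated(a: i32, b: i32) -> Option<i32> {
    let handle = thread::spawn(move || a / b);
    // `join` returns `Err` if the spawned thread panicked.
    handle.join().ok()
}

fn main() {
    println!("{:?}", divide_isolated(6, 3));
    // The worker panics on division by zero; the main thread carries on.
    println!("{:?}", divide_isolated(1, 0));
    println!("still running");
}
```

Note that the computation itself is not recovered here; the program only learns that the worker died and moves on.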

There are other cases that can fit under this simplified definition, such as allocation failures or a GUI's inability to connect to a display. It isn't that these are rare edge cases, per se, but that they are purely anomalous. In a perfect world, you would have infinite memory and a display would always be available the very instant the device boots. Since we have finite memory and sometimes devices run headlessly for a variety of reasons, these situations need to be addressed but not necessarily "handled" by application code.

You can, technically, but I don't think it's a good idea. Doing that would just add unwarranted ceremony where every return value must be explicitly unwrapped, or errors propagated by callers. Line noise that does not provide enough value to carry its own weight. Unwrap/propagate vs panic is one of the design tradeoffs, and the APIs mentioned tend to lean toward ergonomics.


  1. The result of this computation is meaningless, and thus "recovering" the computation to some usable state is unlikely. That doesn't mean the entire application cannot continue making progress; just that the computation involving the division is no longer useful. ↩︎


My general guideline starting point for libraries is roughly that if an input precondition is simple/trivial to know beforehand (either by doing a check or by knowing it out of band by way of how the input was produced), then it makes sense for the standard API point to panic in that error case. The "simple test" vibe check also carries along the quality that performing the check more than once (i.e. the caller does the check before calling the API and the API does the check to panic if it fails) is reasonably likely to optimize/fold to only a single check, and isn't too impactful if it doesn't and remains duplicated.

On the other hand, when a check is relatively expensive (e.g. it scales with input size, like UTF-8 verification) and necessarily done as part of the operation (e.g. UTF-8 verification for API soundness of str), then the default API point should return Result and rely on the caller to unwrap if desired.
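To make the cheap-check versus expensive-check distinction concrete, here is a small sketch. The function names first_byte and as_text are invented; std::str::from_utf8 is the real standard library function and it returns a Result.

```rust
use std::str;

// Cheap precondition the caller can verify up front (`!bytes.is_empty()`),
// so panicking on violation is a reasonable default.
fn first_byte(bytes: &[u8]) -> u8 {
    bytes[0]
}

// Expensive check that scales with input size and has to happen anyway:
// return a `Result` and let the caller unwrap if they already know better.
fn as_text(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    println!("{}", first_byte(b"hi"));
    println!("{:?}", as_text(b"hi"));
    println!("{:?}", as_text(&[0xff, 0xfe]).is_err());
}
```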

The other common reason for panics in libraries is ambient assumptions: things that are true on any machine where the library is useful, but not necessarily on all machines that Rust can run on. It's reasonable to assume that it's possible to allocate some slack space on the heap to do some computation, but that isn't 100% guaranteed to be the case, so if it isn't, you panic[1].


  1. Or rather, allocation failure is actually a full process abort by the default handle_alloc_error, since allocation being possible is such a fundamental assumption. Making failed allocation unwind is possible unstably, though. ↩︎


The blog post is a great piece. It is as comprehensive as a guide to error handling can be at a reasonable level of abstraction.
Although the blog post covers a much broader concern than mine, are the following conclusions reasonable?
From the point of view of a library programmer, it is okay to call panic! in two situations:

  1. If some precondition that is properly documented in our API is violated. This can happen if the caller passes invalid inputs to our function. In this case, which potential violations should be documented as strict preconditions in the API is a question that would take into account the following:
  • Ergonomics, as quinedot and parasyte have mentioned. We don't want to wrap every return value of every function of our library in a Result<T,E> instance and introduce the unnecessary ceremony of using match expressions or unwrap or expect every time anyone wants to use any function from our library. We can reasonably make some assumptions about the inputs that will be passed to our functions by a programmer who has a decent understanding of our library. Therefore, we can panic! on the rare deviations from this norm, thus providing an ergonomic return value.

  • To what extent does a potential violation of a certain kind reflect a flaw in the logic of the caller, i.e. to what extent is it a bug? If this kind of violation reflects a serious flaw in the caller's logic and understanding of our library, it might warrant being included in the preconditions in our API documentation, and hence this would warrant calling panic! on a violation.

  2. If, without any fault of the caller, something goes wrong in the callee function that we wrote. In this case, we can't return a Result<T,E> instance because that would leak implementation details. Besides, it might be useless to the calling programmer without the context.

The way I think about it is, if a user is using a program and it panics for any input, then it's a bug. (To be clear, this is not a definition of a bug, because there are many bugs that don't result in panics.) If the bug comes from some library somewhere, then you have to determine whose fault it is. If there's a documented precondition that says, "you must guarantee this or else this panics," then the fault is with the caller. If there are no documented conditions for panicking, then either there is a bug in the docs or there is a bug in the callee.

Now it is possible that you call a bug "wontfix." Maybe the panic is rare enough and the work required to avoid it great enough that it's expedient to just let the bug exist. It's up to you really.
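A documented precondition could look something like the sketch below. The functions mean and checked_mean are made up for illustration; the "# Panics" doc section is the usual rustdoc convention for stating such conditions.

```rust
/// Returns the arithmetic mean of `values`.
///
/// # Panics
///
/// Panics if `values` is empty. Passing an empty slice is a caller bug.
fn mean(values: &[f64]) -> f64 {
    assert!(!values.is_empty(), "mean: `values` must not be empty");
    values.iter().sum::<f64>() / values.len() as f64
}

/// Checked variant for callers that cannot guarantee the precondition.
fn checked_mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        None
    } else {
        Some(mean(values))
    }
}

fn main() {
    println!("{}", mean(&[1.0, 2.0, 3.0]));
    println!("{:?}", checked_mean(&[]));
}
```

With the precondition written down, a panic in mean points at the caller; without it, the panic would be a bug in the docs or in the function itself.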


I have summarized my understanding of what a possible answer to my question can be after reading all the responses. Please read my reply to BurntSushi.
This is a very interesting addition. It's an apt observation that the cost of input validation needs to be considered, since the caller will most likely do a check before passing the value to our function. In this case, since we're not doing a check on our end, we can't call panic!, and returning a Result<T,E> instance is the only way.

I'm writing a server with an HTTP-based API, and most of the time I need Result<T, E>, because user input can be invalid, and this should be reported to the user in a meaningful way rather than with a cryptic panic message.

For that reason, I use a macro to create error structs. (Maybe, it should be a general, reused error type, I can't tell yet.)

Usage:


error_struct!(AnErrorWithMessage, "some extra text");
error_struct!(SimpleError);
...
    return Err(AnErrorWithMessage(format!("can't read line {}", line_number)));
...
    return Err(SimpleError);

Caller functions return Result<T, Box<dyn std::error::Error>>.

BTW, you can actually create a type alias for this:

type Res<T> = Result<T, Box<dyn std::error::Error>>;
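To show the alias in use, here is a minimal sketch. The function parse_port and its "reserved port" rule are invented for the example; the conversions into Box<dyn Error> are the standard From impls.

```rust
use std::error::Error;

type Res<T> = Result<T, Box<dyn Error>>;

// Hypothetical request helper: different error types are boxed uniformly.
fn parse_port(input: &str) -> Res<u16> {
    // `?` converts the ParseIntError into Box<dyn Error>.
    let port: u16 = input.trim().parse()?;
    if port < 1024 {
        // A String also converts into Box<dyn Error> via `into()`.
        return Err(format!("port {} is reserved", port).into());
    }
    Ok(port)
}

fn main() {
    match parse_port("80") {
        Ok(port) => println!("listening on {}", port),
        Err(e) => println!("bad request: {}", e),
    }
}
```

The trade-off is that callers can print the error but can't easily match on its concrete type, which is usually fine at the top level of a server.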

Macro source code:

#[macro_export]
macro_rules! error_struct {
	($name:ident, $msg:literal) => {
		#[derive(Debug, Clone)]
		pub struct $name(pub String);
		impl std::error::Error for $name {}
		impl std::fmt::Display for $name {
			fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
				write!(f, "{} {}: {}", stringify!($name), $msg, &self.0)
			}
		}

	};
	($name:ident) => {
		#[derive(Debug, Clone)]
		pub struct $name;
		impl std::error::Error for $name {}
		impl std::fmt::Display for $name {
			fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
				write!(f, "{}", stringify!($name))
			}
		}
	}
}

Result is http://wiki.c2.com/?CoupleLeapingWithLooking
panic! is http://wiki.c2.com/?LookBeforeYouLeap

Neither is strictly always better than the other, though in some cases only one of them is workable.

My TLDR here is that it's reasonable for an API to panic if needing to .unwrap() it would feel more like "grumble grumble, I obviously passed something fine, why do I need to do this" than "oh, I'm glad the compiler helped remind me about that".

That's why it's good that indexing panics -- it's a quintessential example of a place where it's very normal for there to be a local expectation that, yes, the index is in-bounds by construction, and thus passing a bad one is more often just a bug.

(Of course, having a couple-leaping-with-looking version -- like get on slices -- in addition to the panics-by-default shorter-to-type version might also be reasonable in these situations.)
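Here is a small sketch of the two situations side by side. The function names are invented; the indexing operator and slice get are the real APIs.

```rust
// Index is in bounds by construction (the range is derived from `v.len()`),
// so plain indexing is natural and a panic here would indicate a logic bug.
fn is_ascending(v: &[i32]) -> bool {
    (1..v.len()).all(|i| v[i - 1] <= v[i])
}

// Index comes from outside, so the "look while leaping" `get` fits better.
fn lookup(v: &[i32], i: usize) -> Option<i32> {
    v.get(i).copied()
}

fn main() {
    let v = [1, 2, 3, 4, 5];
    println!("{}", is_ascending(&v));
    println!("{:?}", lookup(&v, 100));
}
```

Having to unwrap inside is_ascending would be pure "grumble grumble" noise, while the Option from lookup is a useful reminder.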


A simple theoretical example (match):

  1. Example Result
    I am going to pick oranges...
    Error => I don't have a ladder
    Ok => picked up the oranges.

  2. Example Option
    You have oranges ...?
    None => I don't have oranges
    Some => I have oranges

I still remember how Java's exceptions were divided into three categories:

  • Runtime exceptions
  • Normal exceptions
  • System errors

And while it's not a perfect match, I think it maps moderately well onto error handling in Rust:

Runtime exceptions

These do not require any annotation in the method signature and are the fault of the programmer. Examples are calling unwrap() on a None value or accessing an invalid index in an array.

It's the programmer's fault in both cases, because if the Option can be None, the programmer should have checked before unwrapping it. The same goes for the array index.

Rust uses a panic for these cases.

Normal exceptions

You have to declare these in the method signature, and they represent things that can go wrong and are out of your control. The most common example is IO: opening files, sending data over web sockets, etc.

If the user unplugs a USB drive from the computer in the middle of a file write, then it is not the fault of the programmer.

Rust uses Result for these cases.

System errors

These were for system-wide failures such as running out of memory, stack overflow, or the process being killed by the OS. They could happen in every function, so they didn't need to be declared in the method's signature.

Also, unlike runtime exceptions, these were not the programmer's fault. You could, for example, check if the heap has space before every memory allocation, but doing so would be extremely cumbersome.

Rust uses a panic or outright exiting the program in these cases.

Conclusion

In general, you should return a Result (or Option) from any function that can fail. It is a better experience for the user of that function.

Internally, you can use unwrap() or other methods that can panic in your code, as long as you are confident that the panic will not actually occur.

For example, gluing 2 byte arrays that came from valid UTF-8 strings will generate a valid UTF-8 string as well, so it's OK to unwrap() (or expect()) the result. If the unwrap panics, then it's your fault as the programmer for not checking that the data was valid.
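A minimal sketch of that example, with a made-up function name glue. String::from_utf8 is the real standard library constructor and it returns a Result.

```rust
// Hypothetical helper: concatenates the bytes of two valid UTF-8 strings.
fn glue(a: &str, b: &str) -> String {
    let mut bytes = Vec::with_capacity(a.len() + b.len());
    bytes.extend_from_slice(a.as_bytes());
    bytes.extend_from_slice(b.as_bytes());
    // Two valid UTF-8 sequences back to back are still valid UTF-8,
    // so a failure here would be a bug in this function, not bad input.
    String::from_utf8(bytes).expect("concatenation of valid UTF-8 is valid UTF-8")
}

fn main() {
    println!("{}", glue("héllo, ", "wörld"));
}
```

The expect message doubles as documentation of the invariant being relied on.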


The decision of whether to use a Result or a panic is the exact same situation as whether you'd use an assert or throw an exception/return a NULL etc. in C/C++. One kind of error should NEVER happen at runtime; asserts cover this in most languages. In Rust this would be a panic. Another kind of error can be expected and recoverable, or at least you can report it up the stack. This should be handled with Result.

As for using matches everywhere, you don't because Result has a lot of methods to handle common cases. You can also use the if let family of statements with destructuring and you can use the ? operator to bubble it up to the caller if you don't want to handle it yourself and merely report it.
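A quick sketch of those three alternatives to match. All function names here are invented; unwrap_or, if let and ? are the real language and library features.

```rust
use std::num::ParseIntError;

fn parse(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>()
}

// Combinator method: fall back to a default instead of matching.
fn parse_or_zero(s: &str) -> i32 {
    parse(s).unwrap_or(0)
}

// `if let`: destructure only the case you care about.
fn describe(s: &str) -> String {
    if let Ok(n) = parse(s) {
        format!("number {}", n)
    } else {
        String::from("not a number")
    }
}

// `?`: bubble the error up to the caller.
fn doubled(s: &str) -> Result<i64, ParseIntError> {
    let n = parse(s)?;
    Ok(i64::from(n) * 2)
}

fn main() {
    println!("{}", parse_or_zero("x"));
    println!("{}", describe("7"));
    println!("{:?}", doubled("21"));
}
```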

I'm not happy with example 2). I think if the function answers a question that can only be yes or no, it is preferable to return a bool. Why? Because the Some return carries no further information. Of course it's different if the question expects to know how many oranges, but then zero in the integer result can be returned and there is no need for the None. And of course if the function can actually fail, then it's better to return a Result to express that.

The slightly jokey answer is "you should use panic as much as possible without ever actually calling it"

There's actually another level of error handling not yet really discussed here though: preventing the error from even being possible to make via construction, and you should generally try to achieve that where it doesn't have a large downside (generally in ergonomics or performance).

The canonical example is to not have a Foo with an initialize method or methods and then assert that it was called in the other methods; instead, have a FooBuilder. There are lots of other examples that you don't really even think about as being about error handling as a user, though. Mutex doesn't have an unlock method that you should only call if the current thread locked it, for example; instead, you drop the MutexGuard that you access the data through. (Well, actually, there is an unstable Mutex::unlock method, but it's literally just a different spelling for dropping the MutexGuard.)
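A minimal sketch of the builder idea. The fields and methods on Foo and FooBuilder are hypothetical; the point is only that a Foo can't exist in an uninitialized state, so no method needs an "was initialize called?" assertion.

```rust
// Hypothetical type: every `Foo` that exists is fully initialized.
#[derive(Debug, PartialEq)]
struct Foo {
    name: String,
    retries: u32,
}

// Hypothetical builder: collects settings, then produces a complete `Foo`.
struct FooBuilder {
    name: String,
    retries: u32,
}

impl FooBuilder {
    fn new(name: &str) -> Self {
        FooBuilder { name: name.to_string(), retries: 3 }
    }

    fn retries(mut self, n: u32) -> Self {
        self.retries = n;
        self
    }

    fn build(self) -> Foo {
        Foo { name: self.name, retries: self.retries }
    }
}

impl Foo {
    // No "initialized?" check needed here: the type system already did it.
    fn describe(&self) -> String {
        format!("{} (retries: {})", self.name, self.retries)
    }
}

fn main() {
    let foo = FooBuilder::new("job").retries(5).build();
    println!("{}", foo.describe());
}
```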


This is a great comparison. I would add that in Java, runtime exceptions can be caught and dealt with, but it is generally advised not to catch them. Instead, we should fix the bug that caused the exception in the first place, since runtime exceptions usually indicate a bug. In this regard, Rust goes a step further: instead of merely advising you not to catch the panic, it makes catching a panic something you don't do in ordinary code (panics can still be caught, but not as a routine error-handling mechanism).
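For completeness, this is roughly what catching a panic looks like. It is a sketch assuming the default unwinding panic strategy, and try_index is a made-up name; std::panic::catch_unwind is the real function, and its documentation discourages using it as a general try/catch.

```rust
use std::panic;

// Hypothetical wrapper: turns an out-of-bounds panic into `None`.
// In real code `v.get(i)` does this without any panic machinery.
fn try_index(v: Vec<i32>, i: usize) -> Option<i32> {
    // `catch_unwind` returns `Err` if the closure panicked and unwound.
    panic::catch_unwind(move || v[i]).ok()
}

fn main() {
    println!("{:?}", try_index(vec![1, 2, 3], 1));
    // The panic message is still printed to stderr by the default hook.
    println!("{:?}", try_index(vec![1, 2, 3], 100));
}
```

It works, but compared to Result it is clumsy on purpose, which matches the point above.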


My concern was from the point of view of a library programmer. In that case, I think it ultimately boils down to what preconditions we are enforcing in our API documentation. Whenever those preconditions are violated (e.g. when someone passes invalid inputs to our functions), we should call panic!. So, we really should be using panic! as much as possible, while making sure that someone using our library doesn't actually trigger it.

No, it's exactly the opposite. A library shouldn't panic on invalid input. That's exactly what Result is for.


Here's what I've been able to gather till now. Please correct me wherever I'm wrong.
From the point of view of a library programmer, calling panic! would take into consideration the following:

  1. Writing a function that returns T instead of Result<T,E> - Maybe we don't want to return Result<T,E> because of ergonomics. So, when this function does receive invalid inputs, there's no option other than panic! to let the user know that there's a bug.
  2. Writing a function that returns Result<T,E> - Here, failures don't crash the program and can be communicated through the return value of the function for the caller to handle. Here, a decision to panic! should have a strong rationale behind it. Maybe there are some inputs that, assuming that the caller has a clear understanding of our API and the contracts mentioned in the documentation, are invalid in a way that it reflects a flaw in the programmer's logic. Although we can report the failure through Err, we might choose to panic! to force the user to correct this logic flaw and stop it from having negative cascade effects down the line. And hence, we reserve the Err for errors that are because of external factors (like File IO errors), that are not the fault of the caller.