Do you agree with my opinions on error handling?

I wrote a post on error handling (that ended up m more like blog length). I'm posting it here to see if other people agree with me or not. I encourage comments, especially if they disagree with what is here!


Ramblings

Rust has some great abstractions for handling failure situations, where some I/O or something else hasn't worked out. You have 3 choices when something hasn't worked.

  1. Ignore the failure and continue anyway
  2. (If you're in a function) return a Result<T, E> instead of T
    1. the type E implements std::error::Error
    2. the type E has no constraints
  3. panic!

I would try to avoid (1) at all costs.

It's not true in my opinion that panicking is strictly worse than Result, sometimes it is more appropriate (where there is mis-configuration or programmer error), as long as what the crate user needs to do to avoid it are well documented.

If you're using a Result, then you need to decide what the error type will be. I would argue that std::error::Error is really only for errors that will be reported to a person as text. Its main functionality is that it can be printed (using Display) and can contain another error (using source). If you're not using this functionality, you're essentially implementing Error just as a marker, which might not be worth doing. Any error you're going to recover from internally is probably not worth implementing Error for.

This brings me to my next point - sometimes it's OK for errors to be strings of text: specifically when the error is unrecoverable and only going to be consumed by a human. Say you have a function that can error, and there is only 1 way you can error. You'll probably do something like the following

#[derive(Debug)]
struct MyError;

impl Display for MyError {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        f.write_str("this is the error message")
    }
}

That's a lot of lines to just give a string error, and since there is only one way it can fail, a machine can just ignore the value of the error.

Of course there are gray areas, mostly when you have an error that you may recover from. For example, if a texture isn't available, you might want to continue and display an error message in-game, because that helps modders more than crashing out. In that case you would use an enum error, and implement Display + Error as a courtesy, so the developer can decide what to do.

Another issue is how to deal with io::Error, and it demonstrates the value, and limitations, of error chains. It may be that there are some kinds I expect more - maybe that a file is missing, or that I don't have permission to write a file I need to. You can test for these kinds and convert them to your own error kind, but if you encounter an error (say InvalidData for example) that you didn't expect, you want to give the user as much information as possible. On the other hand, all the information you got from the OS was invalid data that's not going to be much more use than an unexpected error occurred.

Summary

  1. Avoid failing silently, a.k.a. allowing code paths that you haven't really thought about, and where you don't know what's going to happen.
  2. Use panic! to indicate to a developer using your crate that he has not upheld some invariant you insist upon.
  3. Use Result<T, E> for things that might fail otherwise.
  4. Always implement std::error::Error when an error might be used in an error chain.
  5. Maybe implement std::error::Error at other times (it's not 100% essential, and you're basically just doing it for the Display impl and the marker).
  6. Keep the happy path fast and lean.
  7. Box your error if it contains a lot of data. Result is as wide as its widest variant, and you're only going to pay the cost of indirection once something has already gone wrong (i.e. performance probably isn't so important any more). I think it's perfectly acceptable to return Result<T, Box<MyError>> when you want to keep the size down. As always with performance, benchmark. :slight_smile:
  8. Trait objects make everything more complicated. You start having issues with Send/Sync/'static. Most of the time you don't need them, you can just use cause: ConcreteInnerErrorType. The only time you use dyn Error (+ 'static + Send + Sync + ...) is when you're implementing source.
  9. Sometimes it makes sense to wrap the underlying error (e.g. IO error), sometimes it makes sense to subsume it (e.g. unit struct error). Use your judgment.
  10. Don't be afraid of just implementing the trait in the way that makes sense for your error type. You don't need to use a procedural macro and shouldn't unless it's literally just to save some boilerplate.

Example

This example uses a few techniques I think are interesting and/or useful. One notable technique is handling an enum that may have extra variants added - I do this by converting anything I don't recognize to a Box<dyn Error>.

I also stringify the dyn Error when I need to clone it, as I can't clone it directly (I don't know it's Clone).

use std::{
    error::Error,
    fmt,
    fs::File,
    io,
    path::{Path, PathBuf},
};

pub fn load_image(image_path: &str) -> Result<Image, MyError> {
    let mut img_src =
        File::open(image_path).map_err(|e| MyError::error_opening_file(image_path, e))?;
    let img = process_image(&mut img_src).map_err(|e| MyError::error_in_lib(image_path, e))?;
    Ok(img)
}

// Error types

/// The error type for this function. Could be re-used if there are other functions with exactly
/// the same error possibilities.
#[derive(Debug, Clone)]
pub struct MyError {
    // These could be public, but using accessor methods leaves open the possibilities of
    // reorganising things in the future.
    /// The path to the image we tried to load
    image_path: PathBuf,
    /// The type of error we encountered
    kind: MyErrorKind,
}

#[derive(Debug)]
pub enum MyErrorKind {
    /// Error opening file stream.
    OpeningFile(io::Error),
    // here we've chosen to consume the unerlying error. We could decide to wrap it, but I think
    // that would be more complicated.
    /// Error reading data from the file stream.
    ReadingData(io::Error),
    /// The magic bytes were wrong
    Magic([u8; 4]),
    /// Data in the stream was invalid
    Invalid(usize, Option<String>),
    /// Some error we didn't expect.
    // By capturing the error this way, we can still use `Display` to print out anything in it.
    // It's the only time you want to be storing it as a trait object.
    UnknownLib(Box<dyn Error + Send + Sync>),
}

// Impls

impl MyError {
    /// What type of error occurred.
    pub fn kind(&self) -> &MyErrorKind {
        &self.kind
    }

    /// The image file we were trying to work with when this error occurred.
    pub fn image_path(&self) -> &Path {
        self.image_path.as_ref()
    }

    /// Wrap an io::Error the way we want. Helper method
    fn error_opening_file(image_path: impl Into<PathBuf>, e: io::Error) -> Self {
        let kind = MyErrorKind::OpeningFile(e); // io::Error isn't too wide, so no indrection.
        MyError::from_parts(image_path, kind)
    }

    /// Process a library error. Helper method
    fn error_in_lib(image_path: impl Into<PathBuf>, e: ImageError) -> Self {
        let kind = match e {
            ImageError::Io(e) => MyErrorKind::ReadingData(e),
            ImageError::Magic(bytes) => MyErrorKind::Magic(bytes),
            ImageError::Invalid(pos, msg) => MyErrorKind::Invalid(pos, msg),
            e => MyErrorKind::UnknownLib(Box::new(e) as Box<dyn Error + Send + Sync>),
        };
        MyError::from_parts(image_path, kind)
    }

    /// Boilerplate for creating the struct. Helper method
    fn from_parts(image_path: impl Into<PathBuf>, kind: MyErrorKind) -> Self {
        MyError {
            image_path: image_path.into(),
            kind,
        }
    }
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let image_path = self.image_path.display();
        match self.kind() {
            MyErrorKind::OpeningFile(e) => {
                let cannot_open = format!("cannot open file at \"{}\" as an image because", image_path);
                match e.kind() {
                    io::ErrorKind::NotFound => write!(f, "{} it could not be found", cannot_open),
                    io::ErrorKind::PermissionDenied => write!(f, "{} this process doesn't have permission to read it", cannot_open),
                    _ => write!(f, "{} of an unexpected i/o error: {}", cannot_open, e),
                }
            }
            // If we've already opened the file - something wierd is happening if we can't read it,
            // so just print the i/o error:
            MyErrorKind::ReadingData(e) => write!(f, "cannot read from file at \"{}\" because of an unexpected i/o error: {}", image_path, e),
            MyErrorKind::Magic(bytes) => write!(f, "while reading image at \"{}\", expected the magic bytes [1, 2, 3, 4] but found {:?}", image_path, bytes),
            MyErrorKind::Invalid(pos, msg) => {
                let invalid_offset = format!("image at \"{}\" has an invalid byte at offset {}", image_path, pos);
                match msg {
                    Some(msg) => write!(f, "{}: {}", invalid_offset, msg),
                    None => write!(f, "{}", invalid_offset),
                }
            }
            MyErrorKind::UnknownLib(e) => write!(
                f,
                "while decoding image at \"{}\", received the error \"{}\"",
                self.image_path.display(), e
            ),
        }
    }
}

// We can't derive this because io::Error doesn't implement clone. We're just going to discard any
// inner error here.
impl Clone for MyErrorKind {
    fn clone(&self) -> MyErrorKind {
        match self {
            // Discard what we have to to make this cloneable.
            MyErrorKind::OpeningFile(e) => MyErrorKind::OpeningFile(e.kind().into()),
            MyErrorKind::ReadingData(e) => MyErrorKind::ReadingData(e.kind().into()),
            // Next things we can clone
            MyErrorKind::Magic(bytes) => MyErrorKind::Magic(*bytes),
            MyErrorKind::Invalid(pos, msg) => MyErrorKind::Invalid(*pos, msg.clone()),
            // The error might not be clone, so just convert it to text.
            MyErrorKind::UnknownLib(e) => MyErrorKind::UnknownLib(Box::new(StringError::from(
                e.to_string(),
            ))
                as Box<dyn Error + Send + Sync>),
        }
    }
}

// I really think Error should be impl for String
#[derive(Debug, Clone, Eq, PartialEq, Ord, PartialOrd, Hash, Default)]
#[repr(transparent)]
pub struct StringError(String);

impl fmt::Display for StringError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for StringError {}

impl From<String> for StringError {
    fn from(s: String) -> StringError {
        StringError(s)
    }
}

impl From<StringError> for String {
    fn from(e: StringError) -> String {
        e.0
    }
}

// External

// Imagine ImageError looks like this. It's good to think how you would wrap all kinds of errors,
// since different crate authors will do different things.

// Because ImageError is Send + Sync, changing this would be backwards-incompatible.
// That's why we can use `dyn Error + Send + Sync` above. This isn't always true.
#[derive(Debug)]
pub enum ImageError {
    /// An i/o error occurred.
    Io(io::Error),
    /// The magic 4 bytes were wrong.
    Magic([u8; 4]),
    /// A byte in the file was invalid for its position. Some more information is given as text.
    Invalid(usize, Option<String>),
    #[doc(hidden)]
    __NonExhaustive,
}

impl fmt::Display for ImageError {
    fn fmt(&self, _f: &mut fmt::Formatter) -> fmt::Result {
        unimplemented!()
    }
}

impl Error for ImageError {}

pub fn process_image(_r: &mut impl io::Read) -> Result<Image, ImageError> {
    unimplemented!()
}

pub type Image = ();

panic! is only suitable for unrecoverable errors. It is configured to abort immediately on many embedded systems, especially where unwinding is not supported. std uses panic! in places which may be recoverable, but only in an application-specific way, e.g. when a Mutex is poisoned.

Otherwise, avoid panic!. It’s hard to properly document how a function call may panic, especially when it occurs several crates deep.

I agree with @derekdreery's perspective on panics. IMO, saying that "panics are only suitable for unrecoverable errors" is not particularly clear nor actionable. What does it mean for something to be recoverable? It raises more questions than it answers.

Instead, it is much simpler to state that if an end user observes a panic, then there exists a bug somewhere in the code. It could be because of a bug in a crate dependency somewhere, or it could be because a crate documents the conditions on which something panics and your use of that crate did not uphold those conditions. In either case, the next step is very clear: fix the bug.

The cases in which a library intentionally panics are more of a judgment call, and there is no one right answer. Libraries should certainly not be using panics liberally as an error handling mechanism, but libraries are perfectly within their rights to exposes APIs that intentionally panic on some inputs. For example, the regex crate exposes index APIs on Captures that panic if an invalid or non-matching capture group name is given. Of course, there are also non-panicking equivalents. A crate that uses the index APIs incorrectly by assuming the existence of a matching capturing group that isn't guaranteed to exist may have a bug, depending on whether the regex is run on inputs that expose the faulty assumption. In many cases, however, a capturing group exists if and only if the overall regex matched, and so, the panicking index APIs are quite convenient.

Another case in which libraries may panic is via asserts. It is fairly common to add asserts to code that verify runtime invariants. In correct code, those asserts should never fail, but if they do fail, they become useful signposts for discovering, diagnosing and fixing bugs. To say that panics should be avoided would mean library authors should stop using asserts, which I think is a bad idea. You might say that asserts fall under "unrecoverable errors," but the ambiguity of what it means to "recover" blurs the issue. Why not return an error instead, for example? An assert triggering doesn't necessarily mean everything has gone to hell. :slight_smile:

The above example in the regex crate is something that can occur in any number of situations and should be encouraged, similarly for asserts. The TLDR IMO is that it is much easier to focus on the end result. Can a user of your code witness a panic? If so, that's bad. Fix it.

4 Likes

What you have wrote is what I was trying to get at with panics: that they should indicate a bug. In C++ land, you'd probably just have that bad inputs to your functions cause UB, so definitely failing is a big improvement!

I've bought into the idea that panics should be documented at source (panic!, unwrap, etc.) either by using expect or using a doc comment. I think it's good if a user gets something like "panicked at assert in function_name - this is a bug, please report it at ....", makes it easier to fix bugs. Correct me if I'm wrong, but I think you would still get an expect message if you use panic=abort.

I was wondering today if we might eventually get some sort of path analysis tool that tracks all the paths that lead to panics, so that we can audit them and convince ourselves that each one is either unreachable, or the invariants it expects are clearly documented. I'm not sure if something like this is even possible though.

I agree that an error is not necessary an std::error::Error. My answer is Concrete Error eXchange.

This sounds like you’re agreeing that panics should not be seen under normal circumstances. In other words, unusual circumstances are treated as unrecoverable. No?

Take for instance when panic is configured to abort instead of unwind. In this case, panics are guaranteed unrecoverable. Which is fine because all the bugs that panic should be fixed, right?

Are you arguing semantics, or is there some other reason to disagree?