Understanding Control Flow and Error Propagation Patterns (novice)

I'm a scientist first and a programmer third or fourth, so a lot of software engineering concepts are somewhat foreign to me. I've worked through the Rust Book, and Rustlings, and I feel like I have a nebulous grasp of the language. I've even built a few crufty things that work on the embedded side for ESP32-C3. I'm running into a real road block here when thinking about building my first user-facing application though.

I've seen this advice a lot:

Please do not use unwrap() in production code

Which I understand, because panics are bad, but I'm stuck trying to think about how the control flow actually works in this case.

Let's say that I have a series of actions that I would like to perform:

  1. The user picks a folder from a browser (RFD for instance)
  2. I apply some glob pattern to that folder looking for specific files
  3. I have to unwrap the glob to get the actual filenames
  4. I use Polars to read the files into a DataFrame
  5. I do some analysis and spit out the result

Obviously, throughout all of this there are lots of things that can go wrong, and most of them should be recoverable errors. Say the user selects a folder, and the glob pattern matches nothing, I think the right thing to do would be to go back to the folder picker, and instruct them to pick a different folder, or check their file naming.

This leads me to think that a structure like this:

use rfd::FileDialog;

enum DataFlow {
  NeedFolder,
  NeedGlob(PathBuf),
  NeedFilenames(Paths),
  NeedDataFrames(Vec<PathBuf>),
}
fn main() {
  let mut state = DataFlow::NeedFolder;
  loop {
    match state {
      DataFlow::NeedFolder => state = DataFlow::NeedGlob(get_folder()),
      DataFlow::NeedGlob(folder_dir) => state = DataFlow::NeedFilename(glob(&some_pattern)),
      _ => todo!() // Omitted for brevity
    }
  }
}
fn get_folder() -> PathBuf {
  let folder_path = rfd::FileDialog::new()
    .set_directory("~")
    .pick_folder();
  match folder_path {
    Some(path) => path,
    None => {
      println!("Invalid folder or No folder selected");
      get_folder()
    }
}

Is this a standard design pattern? Or is there another way to accomplish this? I'm basically looking for a way to establish retry paths for errors so that I can handle my errors gracefully. I've done a decent amount of searching on this topic, but I've come up a bit empty-handed. Any and all help/discussion is appreciated.

Like a lot of other software engineering advice, this is pithy but whether it actually makes sense depends on the actual requirements. The first question you have to ask is: if the hypothetical unwrap were to take effect (panic), what does that situation mean?

  • Does it mean that there is a bug in the program — is it a case which “can't happen”?
  • Does it mean that the user provided erroneous input (if there is a user)?
  • Does it mean that something outside of your control and the user’s control went wrong?
  • Is it possible to recover from this problem?
    • If so, which party has enough information to do so?
    • Does the recovery mean that the program will have to produce inferior or incorrect results?

The simplest case to consider is “user provided erroneous input”. In that case, panics are the wrong tool; they do not provide facilities for communicating useful error reporting to the user, and are inconvenient (or sometimes impossible) to catch and recover from.

(However, in some cases, an application might choose to panic as a tool for aborting a computation without needing to explicitly make it fallible. For example, rust-analyzer does this to cancel computations that have been made obsolete when the input changes. In these cases, the panic is not expected to convey any information, and it is always caught, via catch_unwind() or thread JoinHandles.)

Now, consider bugs aka “can't happen”. Here are some policies you might choose for bugs:

  • If a bug is detected, panic. In this case, unwrap() is fine as a means to cause that panic (those some coding styles may prefer to always use expect() which includes a reason message.
  • Don’t ever panic, no matter what. This is very difficult to enforce because many things that are not unwrap() can panic. This is usually reserved for very specialized, usually embedded, software with high fault-tolerance requirements. You will have to do many other things besides not calling unwrap().
  • Return a Result::Err instead of panicking. This is rarely recommended because the presence of bugs is the ultimate “implementation detail” and one generally prefers not to put implementation details in one's public API, as much as possible. However, it may be desirable in special cases; for example, if there is an algorithm which is difficult to be assured of the correctness of but which is self-contained (running on specific input data, and the input and output can be completely discarded in case of a problem), which must be run without support for catching panics (e.g. on wasm32-unknown-unknown), then returning an “internal error” value may be the least bad option.

In my opinion, the first option — do panic on bugs — is what should be the default choice for most Rust applications and libraries. (But people might have different perspectives on what “most” is; what a typical case is.)

The third case worth discussing is errors that can be caused by the environment — out of memory, permission errors, missing resources, etc. In this case, there aren't usually one size fits all policies that you can apply to the entire codebase — individual cases may be more or less likely, more or less recoverable from, and more or less meaningful to the user. Good software should process each error as suits it.

One subtle case here is when a library function is called with incorrect arguments. In this case, if it is trivial for the caller to be always correct (for example, “index must always be less than len”), then this is a “bug” case. If the precondition is hard to check, or might be broken by an outside change not under the control of the caller (for example, a file being deleted), then the function should return an error so that the caller can handle the problem as they see fit.


Regarding your specific example code, note that it is not appropriate to ask the user a question and retry in an infinite loop if the answer is invalid. In particular, the user should always be allowed to cancel the whole operation, because they realized they can't proceed now, or they want to do something else first, or do something else entirely.

(Also, don't use recursion to retry. That will potentially cause a stack overflow. Use a loop {}.)

6 Likes

The way I tend to think about it is that you should panic when the only way to get this fixed is to find a Rust programmer and get them to fix it for you; you should return Result::Err where there's a possibility that a caller can fix things by doing something else (noting that expect() means that if the only way the caller can fix things is by finding a Rust programmer, they can make it a panic).

For this concrete problem, I'd split into two:

  1. Functions that return Result::Ok if they succeed, and Result::Err if they fail.
  2. Some form of "orchestration" function that handles Result::Err variants.

So, sketching it out (this is not complete code):

#[derive(thiserror::Error)]
enum FolderError {
    #[error("The user did not select a folder")]
    NoFolderSelected,
    …
}
#[derive(thiserror::Error)]
enum GlobError {
    #[error("The folder has no files in it")]
    FolderHasNoFiles(PathBuf),
    …
}

fn get_folder() -> Result<PathBuf, FolderError> {
    rfd::FileDialog::new()
        .set_directory("~")
        .pick_folder().ok_or(FolderError::NoFolderSelected)
}
fn glob_folder(path: &Path) -> Result<Vec<PathBuf>, GlobError> { … }

fn main() {
    loop {
        let folder = match get_folder() {
            Ok(path_buf) => path_buf,
            Err(FolderError::NoFolderSelected) => {
                eprintln!("No folder selected - trying again");
                continue;
        };
        let files = match glob_folder(&folder) {
             Ok(files) => files,
             Err(GlobError::FolderHasNoFiles(folder) => {
                eprintln!("{folder} contains no files - asking for selection again");
                continue;
             }
        };
    }
}

The idea is that each function can be written to pass the error up a level, using ? to accelerate that process. When you reach a point that can reasonably handle the error, it does so.

2 Likes

Both of these answers are really excellent, and very helpful.

I'm currently trying to port some python data analysis code to Rust and wrap it with a GUI, so I'd like to avoid any instances where a panic would crash the GUI. Ideally, I just want to step the state of the control flow back a step or two depending on the error.

Is my use of an enum to model state an unusual construct? Or is this common in Rust? I think I've also seen structs used to model state and pass data around within loops.

This is a good reason to consider using catch_unwind() (or running the computation in a separate thread, which has the same effect and is helpful for hangs too). That way, if there are any unanticipated panic cases, and there almost certainly will be, your application can just report the problem and let the user try something else.

The important factors in this choice are:

  • You control the application. (If you were writing a library, you should try not to rely on catch_unwind() if not necessary, so that your libraries .)
  • You're not choosing to use panicking to report errors the user will see, but rather, acknowledging the near-inevitability of bugs, and recovering from them.

Ideally, your “stepping back” should be implemented in a way such that nothing the panicked code might have been mutating is kept. This is simplest if it's a “pure function”, i.e. doesn't mutate anything passed to it, but only returns newly created data. Then, the recovery process is just reporting the error to the user, not attempting to clean up after a half-completed computation. (Rust will take care of freeing memory that was in use.)

There's nothing wrong with using an enum to create a state machine, but the odd thing about your enum DataFlow sample code is that the loop does nothing else and there are no non-straight-line state transitions. That means that your loop could be replaced with plain sequential code.

But maybe it just looks unnecessary because you've omitted some things. If there was something that's done in every loop iteration — for example, if the loop was actually a GUI event loop that let other things happen — then that would make sense as a reason to keep it.

You might also choose to use a struct and an enum, or even more than two. For example, right now each of your steps is fetching some information and discarding the prior information. But if your processing requires multiple inputs — like both input files and what to do with them — then it makes sense to let the user fill them in in any order, and that would be better modeled as a struct with several Option fields. That struct might then be owned by the enum that makes distinctions like "needs user input" vs "currently processing".

1 Like