Error/panic handling in Rust libraries that also provide Python bindings

Hello,

I'm working on a numerical library that should be usable both as a pure Rust library and as an idiomatic Python library. The Python library will be a separate crate that will utilize pyo3 to provide Python bindings. It seems to me that polars has quite some success with this approach.

Now the situation with regards to handling errors in such a mixed library is a bit complicated:

This does lead to real-world issues.

I'm wondering how best to avoid these kinds of problems. Obviously, the crate that provides the Python bindings has to make sure that it does not leak any panics into Python. I can think of three alternative approaches:

  1. Add checks to the Python wrapper crate that raise Python exceptions whenever the Rust code would panic. Advantage: Does not complicate Rust API. Disadvantage: Correctness logic is spread between wrapper crate and the Rust library.

  2. Catch panics with catch_unwind. Advantage: Easy to do, does not complicate Rust API. Disadvantage: Seems a bit unclean, could perhaps be problematic in some way?

  3. For each potentially panicking operation in the Rust library, add a Result-returning alternative. Advantage: Seems the most "proper". Disadvantage: Possibly code and API bloat for the Rust library.

I would appreciate some advice here. Myself, I tend towards approach 3, but perhaps I'm too pedantic, and 2 would work just as well?

Thanks!

I would go with 3 and not expose panicking API in the python wrapper

But why? Does using catch_unwind (approach 2) incur any runtime cost or other problems?

Also, let's say we choose approach 3 and there is some operation where the shapes (= sequences of integers) of two parameters must match. This is easy to check, but not totally free either. So I would say that for the pure Rust library this is a candidate for debug_assert: in release mode there will be a panic as well (since ultimately non-matching shapes will lead to some indexing panic), but the error message will be less clear.

Now if I want to provide a checked version of this functionality as well, I will have to implement the check twice, or factor it out into a function. Or is it a good idea to use the checked version to implement the panicking version? Will the compiler optimize the check away?
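To make the question concrete, here is a minimal sketch of "implement the panicking version on top of the checked version". All names (`Array`, `ShapeError`, `check_shapes`, `try_add`, `add`) are hypothetical and only for illustration. The shape check lives in one helper, the checked function returns a Result, and the panicking function is a thin wrapper around it. Whether the compiler removes the check after inlining depends on whether it can prove the shapes equal at the call site, so I would not rely on it.

```rust
use std::fmt;

/// Hypothetical error type reported when two shapes differ.
#[derive(Debug, PartialEq)]
pub struct ShapeError {
    pub left: Vec<usize>,
    pub right: Vec<usize>,
}

impl fmt::Display for ShapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "shape mismatch: {:?} vs {:?}", self.left, self.right)
    }
}

impl std::error::Error for ShapeError {}

/// Hypothetical minimal array type: a shape plus flat data.
#[derive(Debug, Clone, PartialEq)]
pub struct Array {
    pub shape: Vec<usize>,
    pub data: Vec<f64>,
}

/// The shape check is written exactly once.
fn check_shapes(left: &[usize], right: &[usize]) -> Result<(), ShapeError> {
    if left == right {
        Ok(())
    } else {
        Err(ShapeError { left: left.to_vec(), right: right.to_vec() })
    }
}

/// Checked version: what a Python wrapper would call.
pub fn try_add(a: &Array, b: &Array) -> Result<Array, ShapeError> {
    check_shapes(&a.shape, &b.shape)?;
    let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
    Ok(Array { shape: a.shape.clone(), data })
}

/// Panicking version for Rust callers, built on the checked one.
pub fn add(a: &Array, b: &Array) -> Array {
    try_add(a, b).expect("add: operand shapes must match")
}
```

With this layout the wrapper crate only ever calls `try_add` and maps `ShapeError` to a Python exception, while Rust users who prefer the shorter API keep `add`.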

A fourth option is:

  4. Set panic = "abort" in the Python extension crate.

Realistically, I think you want a combination of approaches. Generally, Python code should never, or almost never, cause panics. To handle goofy Python, I would probably choose option 1 with a dash of option 3 where helpful.

But it's also possible for the Rust code (both in the Python wrapper crate and the underlying Rust library) to panic due to a plain old bug. In these cases, you must avoid undefined behavior; Python code definitely shouldn't be able to trigger UB. Use option 2 or option 4 for that.

You can just use option 2 for everything, but

  • panic unwinding is slow
  • it can be a pain to debug
  • if you have mutexes, panic can poison them, and generally panic can leave data structures in unexpected states; maybe you shouldn't volunteer to maintain more weird states than you have to
  • option 1 or 3 will produce better error messages for Python users
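The mutex point can be illustrated with the standard library alone. This is a minimal sketch, with `poison_demo` as a hypothetical name: a thread panics while holding a lock, and afterwards the mutex is poisoned and the data it protects has been partially modified.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panics while holding a lock, then reports (was the mutex poisoned?, data left behind).
pub fn poison_demo() -> (bool, Vec<i32>) {
    let shared = Arc::new(Mutex::new(vec![1, 2, 3]));
    let worker = Arc::clone(&shared);

    // The thread panics while the guard is alive, which poisons the mutex.
    let outcome = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        guard.push(4); // the update has already happened when the panic hits
        panic!("simulated bug while holding the lock");
    })
    .join();
    assert!(outcome.is_err());

    // Every later lock() now returns Err(PoisonError).
    let poisoned = shared.lock().is_err();

    // The data is still reachable, but the caller has to decide whether to trust it.
    let data = shared
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .to_vec();
    (poisoned, data)
}
```

If the panic is caught and the process keeps running, every later user of that mutex has to handle the poisoned state, which is the kind of extra state mentioned above.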

Let's say that there's a bug in the Rust library or in the wrapper crate that triggers a panic. With pyo3's default settings this will be forwarded to Python as a PanicException. (This exception inherits from Python's BaseException and normally should lead to the process aborting, but this is not enforced.)

Are you saying that this can lead to undefined behavior? I assumed that the way pyo3 implements this is that it wraps all Rust code called from Python in an equivalent of catch_unwind and then translates any panics into said Python exception. That would avoid unwinding across FFI, which is UB.
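To spell out what I mean by "an equivalent of catch_unwind", here is a std-only sketch of such a boundary. It is not pyo3's actual code; `BoundaryError`, `call_guarded`, and `panic_message` are hypothetical names, and `BoundaryError` merely stands in for whatever would be raised on the Python side.

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Hypothetical stand-in for the error that would become a Python exception.
#[derive(Debug, PartialEq)]
pub struct BoundaryError(pub String);

/// Runs `f` and turns a panic into an error value instead of letting it unwind further.
pub fn call_guarded<F, T>(f: F) -> Result<T, BoundaryError>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| BoundaryError(panic_message(&*payload)))
}

/// Extracts a readable message from a panic payload, if it carries one.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}
```

Note that catch_unwind only helps when panics unwind; with panic = "abort" the process terminates before the closure returns.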

A Python programmer will expect mistakes in their code to manifest as Python exceptions, not as interpreter crashes. That has material consequences to the design of Python programs - for example,

try:
  some_fallible_operation()
except:
  print("_Something_ went wrong!")

is expected to print the error message from the except block pretty much no matter what the nature of the failure is, whereas in Rust, an expression like some_fallible_operation().ok_or_else(||println!("Ooops")) is well understood to potentially do neither of those things in the event of a panic. Interpreter crashes - even those driven by crashes in extension modules - are generally considered by the Python community to be bugs.

Given that, in your position I would want to expose those errors to Python programmers as exceptions, one way or another, even if they manifest as panics on the Rust side of the call.

Given that any result-yielding interface can be turned into a panicking interface trivially with .expect(…) or .unwrap(), my reflex here would be to write operations that can fail as operations that return results, while taking care to avoid triggering panics in functions called from those operations. I'm not sure I would bother with catch_unwind until I get reports from users that actual programs are encountering panics "in the wild" or until I can reproduce one in my own testing.
