As a library author, should my library panics if a compile time check is deferred to runtime?

I am writing sosecrets-rs.

Briefly, there is a type named Secret<T, MEC, EC> where MEC and EC are type-level unsigned integers (from typenum::consts::*; e.g. U0 for unsigned 0 and U1 for unsigned 1, etc.). At compile time, the crate will check if EC is lesser than MEC when the caller calls the method expose_secret(...) of Secret<T, MEC, EC>, and if it is not lesser than MEC, the program would not compile.

This is a compile time check. I plan to feature gate the runtime checks.

There will be two versions of RuntimeSecret<T>, both MEC and EC will become fields, one version will be the synchronous version, another asynchronous (this is cool because it is like a Semaphore but fails when it ran out of Permits).

I'm curious if I should make the program panics if MEC > EC at runtime or should I instead return an Option::<T>::None?

I am asking because although it is always advised that libraries should NEVER panic, compile time checks that are deferred to runtime always adopt panics for its failure case, e.g. RefCell, etc. Hence, the question.

BurntSushi's Using unwrap() in Rust is Okay is a great blog post about panics in a Rust program. It's not specifically targeted at the question whether a library should panic or not, but I think the broader perspective is still relevant and might help you find a solution you think is best.

I'd consider a Result<T> better than returning Option<T> if an operation has failed due to a runtime check not being successful. Also, many types offer both a panicking version and a Result returning version of the same operation (often times prefixed with a try_), so that users can choose whether they want to do error handling or not.


Another good post you might want to read:

(you might be interested in the whole topic, but I wanted to highlight the post because I think the "look before you leap" vs "couple leaping with looking" is a very nice way of framing the choice between panicking and returning something that indicates an error)

11 Likes

I'd also recommend BurntSushi's "Using unwrap() in Rust is Okay" as good background reading.

The way I tend to think about this is in terms of how the failure can be handled; panics imply that the only way to handle this failure is to find a Rust programmer, give them the code and the problem, and ask them to fix it, while Result implies that it's reasonable for the program to do something (possibly involving the end user) to handle the failure.

In this framing, File::open should definitely return Result; if the file isn't present; there are reasonable choices if you can't open the file. Whereas most RefCell methods panic because if you need RefCell::borrow (for example) to work, and it doesn't, the only real way to fix it is to fix the bug in your code.

If you're not sure which case is correct for your library, it's always possible to offer both - RefCell::borrow and RefCell::try_borrow as examples, where you can implement the panicking version as a simple .unwrap() around the non-panicking version. This then allows you to see if there are good use cases for the try_ variants (which might well be rarely used), or if panicking is the only good answer in your case.

4 Likes

Here's a perspective from Android. We compile all Rust code in the Android OS with panic=abort. This means that if the user of the program can trigger a panic, then they have a potential DOS attack vector where they can crash parts of the OS.

There have been some popular parsing libraries that we simply cannot use in Android because they sometimes handle invalid input by panicking. When this takes down the entire process, that's not acceptable for us.

So to me, you should only panic if the program has a bug.

That said, the bug can be in the user of your library. It's okay to say "don't call my method like this" and panic if they use it incorrectly. But you should only do this if the panic reveals a bug in the user of your library.

17 Likes

Thank you for your prompt response.

When you say the program has a bug, do you mean the user executable has a bug or that my library has a bug?

Thank you.

It's okay to use panics for either type of bug. There just has to be a bug somewhere.

2 Likes

Hi @alice, @jofas and @farnz ,

Sorry to belabor on this topic. Just a follow-up question, what if my crate aspires to work in a #![no_std] context? I.e., I would lose the capability to return an error, then should my API purely panics if there is a bug?

Thank you.

Why would you lose the ability to return an error? Both Result and Error are part of core.

1 Like

You can definitely still use Result in #![no_std] context. I do that in the Linux kernel.

1 Like

Thank you @alice and @jofas for your responses.

Isn't the core::error module a nightly feature? I would want sosecrets-rs crate to be on stable and #![no_std] context. I know Result is on core::result::Result which is on stable. But if the Error trait is not on stable, wouldn't that limit interoperability with binaries in std context that use my library? Since because of the Orphan Rule, they would get a value of my error struct without the Error trait implemented on it and they cannot implement the trait either.

Please teach me how to do this?

Is it that it is fine to not implement the Error trait on my error types? I had a look at core::cell::RefCell and it seems like BorrowMutError doesn't implement the Error trait either.

Yes, that is not a problem. As others have said, using Result in no_std code is very common.

Users of your library in a std context don't have to implement Error for your error type. However, if you want to implement Error for your error types, just in case someone wants it, and you don't want to wait until error_in_core is stabilized, then you can implement it now under a cfg for std:

#[cfg(feature = "std")]
impl std::error::Error for MyErr {}

However, later when error_in_core is stabilized, you'll probably want to remove that cfg and change it to implement core::error::Error (assuming this is how it will work after stabilization):

impl core::error::Error for MyErr {}
3 Likes

If it's the user's mistake, why not just return an error instead of panicking?

I'd think the only case for panicking is for bugs which are due to code you thought wasn't possible to hit due to some invariant you couldn't encode in the type system.

1 Like

True, but the invariant here is the precondition for calling the function, which is under the caller's control of course. By your argument, many existing std library functions that have such preconditions should not panic and should return an error instead. There could be such a std library, but there would be many, many more calls to unwrap.

1 Like

What functions are there like that in std? Genuinely curious.

Isn't that a broad category in general though? I think that could be applied to nearly anything. E.g. parsing bytes as a String. You could tell the user "make sure the bytes you pass are valid UTF-8, or else this function will panic". std doesn't do that.

1 Like

I'm doing a search in the slice doc for panics and listing them:

swap -- index out of bounds
windows, array_windows -- zero window size param
chunks (many of these) -- zero chunk size param
split_at* -- if mid > len
select_nth_unstable* -- if index >= len()
rotate_* -- if mid > len()
clone_from_slice -- if the two slices have different lengths

That's about half of them for slice I think.

And there are also all the indexing ops (sl[x]) which panic but have an get alternative that doesn't panic, which is a different pattern.

Yes, good point. There are some invariants preconditions that you could say "don't make sense" and cause panic, and others that are expected to be used for validation of external input like utf8 validity. It can be pretty subjective.

I tried programming in Pony, which is the only language I've used that never panics. You use a lot of try statements, for array indexing for example. But I don't think it is a wrong or bad approach. It is very different than other languages I've used. Have you used languages that don't panic, or throw runtime exceptions which is about the same thing?

Also, in Pony you can't panic (without using FFI). So what do you do when you have to handle an index out of bounds that you believe should never happen? It's pretty difficult! Apparently a lot of Pony programmers just define a panic function using FFI.

3 Likes

Hmm, I wonder if there is any official guidance for std about when to panic or not. I wouldn't think at least some of these could have been errors, and I know many other functions do return errors instead.

The post from @jofas above has some good guidelines and links to resources on that. But I don't know of any "official" guidelines.

I'm on libs-api. There is no official guidance written down anywhere. Only judgment. You need to look at each API holistically and how it is commonly used. A UTF-8 invariant usually needs to be checked on user given data, and so panicking by default wouldn't be appropriate. But, for example, split_at requires an index, and if that index is invalid, that usually indicates a bug somewhere in the program rather than invalid user input. But neither of these things are always true. But you need to look at the typical cases.

With all that said, I do think there is probably a general and reasonable desire for non-panicking variants of functions like split_at, and providing them (in addition to panicking versions) is probably not a bad idea. But panicking conditions aren't always so simple.

4 Likes

The vibe I try to stick to is asking questions like

  • How difficult is it to check the precondition before calling this function?
  • How common is it to incidentally have checked the precondition before calling this function?
  • How likely is it to call the function without already expecting the precondition to be true?
  • How much of the work is the precondition check compared to the other functionality?
  • How independent is the precondition check from the other functionality?
  • Is there useful information to provide in the error case? Do I naturally have it from the check or is it extra work to collect?
  • Is precondition failure generally a code/flow issue or a data issue?
  • Do either the check or main functionality require taking resource ownership from the caller that is difficult (or even just awkward) to refund?
  • If a user doesn't like my checks, would that incentivise them to just avoid the abstraction entirely?

These are vague, continuous fields, and a good number of them. I couldn't nail down what the equation parameters are, but I do know these are the concepts that guide the decision. The very simplified version is panicking for O(1) precondition failure and returning an error for O(n) precondition failure. That alone does a surprisingly good job combined with the generally obvious case for constructors doing basically no work other than the precondition checks.

For some of the examples:

RefCell::borrow panics when acquired reentrantly. Reentrancy is fairly uncommon and you usually know when you're open to doing it. So panic on error.

slice::windows/chunks: the size is quite often a source constant, and "zeroness" is a common property to know about a number. So panic on error.

NonZero::new: the only thing done is the check. So return the error.

Since your thing is security adjacent, that suggests to me that you should be returning Result (or Option if there's no context to benefit a caller using ?), because the proposed value add is in the tracking. On the other hand, if a consumer finds that annoying, the alternative of just using T instead of Secret<T>, skipping the checks entirely, and maybe that's enough of a problem that crashing is preferable, meaning you'd want the panic.

Always be asking and answering questions in the context of why someone would be using the library. Build the pit of success that benefits your users the most.

10 Likes

Hi @alice, @mr_rustbot, @jofas, @BurntSushi and @CAD97,

Thank you all for your very illuminating discussions and the examples you guys had given me to guide my API design.

I have decided to have two versions of expose_secret, this which is the panicking version and try_expose_secret, which according to the customary naming convention, is the non-panicking version that returns a Result<ReturnType, ExposeSecretError>. I have also created a new API for Secret<T, const MEC: usize>, named exposure_count which returns the current exposure count of the secret.

Why Two Versions?

  1. I recognize that there may be situations where users of my library wouldn't be able to know exactly when they would need to expose the secrets for use (else if they can know this during compile time, they would use the Secret<T, MEC: typenum::Unsigned, EC: typenum::Unsigned = typenum::consts::U0>), hence the runtime version of Secret that counts the secret's exposed count, making sure it is always less than what it is maximally allowed for.
  2. There may be situations where users might want to set an upper limit on the maximum exposure count, e.g. 10, but depending on runtime conditions, may end up only using 5, so they might want to use the panicking version of expose_secret() to guard against totally unexpected usage of the secret.
  3. Else, for example, if they are running a web server, then they might prefer to return a Result instead which they can handle at each invocation.

Here's how the APIs look like now, do if your comments if you wish.

use core::{cell::UnsafeCell, marker::PhantomData, ops::Deref};

use crate::runtime::{error, traits};
pub struct RTSecret<T, const MEC: usize>(T, UnsafeCell<usize>);
pub struct RTExposedSecret<'brand, T>(T, PhantomData<fn(&'brand ()) -> &'brand ()>);

impl<'brand, T> Deref for RTExposedSecret<'brand, &'brand T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        self.0
    }
}

impl<T, const MEC: usize> RTSecret<T, MEC> {
    pub const fn new(value: T) -> Self {
        Self(value, UnsafeCell::new(0))
    }

    pub fn new_with(f: impl FnOnce() -> T) -> Self {
        Self(f(), UnsafeCell::new(0))
    }

    pub fn exposure_count(&self) -> usize {
        unsafe { *self.1.get() }
    }
}

impl<'secret, T, const MEC: usize> traits::RTExposeSecret<'secret, &'secret T>
    for RTSecret<T, MEC>
{
    type Exposed<'brand> = RTExposedSecret<'brand, &'brand T>
    where
        'secret: 'brand;

    fn expose_secret<ReturnType, ClosureType>(&self, scope: ClosureType) -> ReturnType
    where
        for<'brand> ClosureType: FnOnce(RTExposedSecret<'brand, &'brand T>) -> ReturnType,
    {
        match self.try_expose_secret(scope) {
            Ok(returned_value) => returned_value,
            Err(err) => match err {
                error::ExposeSecretError::ExposeMoreThanMaximallyAllow(err) => {
                    panic!("`RTSecret` has already been exposed {} times, which is also the maximum number it is allowed to be exposed for.", err.mec)
                }
            },
        }
    }

    fn try_expose_secret<ReturnType, ClosureType>(
        &self,
        scope: ClosureType,
    ) -> Result<ReturnType, error::ExposeSecretError>
    where
        for<'brand> ClosureType: FnOnce(RTExposedSecret<'brand, &'brand T>) -> ReturnType,
    {
        // SAFETY: All tuple fields of `RTSecret` are private, there are no getter / setter to them.
        // `RTSecret` is also not `Sync` so it is not possible to have multithreading race condition.
        let ec_mut = unsafe { &mut *self.1.get() };
        if *ec_mut >= MEC {
            return Err(error::ExposeSecretError::ExposeMoreThanMaximallyAllow(
                error::ExposeMoreThanMaximallyAllowError {
                    mec: MEC,
                    ec: *ec_mut,
                },
            ));
        };
        *ec_mut += 1;
        Ok(scope(RTExposedSecret(&self.0, PhantomData)))
    }
}

Source

2 Likes