Is downgrading a panic to an error a breaking change?

The SemVer Compatibility section of the Cargo Book doesn't seem to address this (I guess since it doesn't change the function type). I would expect that users should not rely on code panicking, but I might be wrong, so seeking for the community opinion.

A concrete example for illustration:

/// Authenticates and decrypts a cyphertext.
///
/// # Panics
///
/// Panics if `cyphertext.len() < 4` or `plaintext.len() < cyphertext.len() - 4`.
///
/// # Errors
///
/// Returns an error if the cyphertext is not authentic.
pub fn decrypt(secret: &[u8], cyphertext: &[u8], plaintext: &mut [u8]) -> Result<(), DecryptError>;

Is the new version below compatible with the old one above?

/// Authenticates and decrypts a cyphertext.
///
/// # Errors
///
/// Returns an error in the following conditions:
/// - `cyphertext.len() < 4`
/// - `plaintext.len() < cyphertext.len() - 4`
/// - the cyphertext is not authentic
pub fn decrypt(secret: &[u8], cyphertext: &[u8], plaintext: &mut [u8]) -> Result<(), DecryptError>;

I don’t think there are any really rigorous principles to be had here, but more important principles than not relying on panic behavior are that:

  • types should have only valid values, and
  • the return value of a function should be truthful.

So, if a DecryptError previously represented only “the cyphertext is not authentic”, how does it now also represent “the length is invalid”? If DecryptError was (or contained) a non_exhaustive enum, then there is an obvious possibility of adding a variant to it. But if it is not, then decrypt is now returning false information.

In general, you should consciously choose whether the set of errors returned by a function is extensible. If it is not, then you must not return errors for new cases unless the new cases fall into the existing description of the error, because the caller might be handling the error with an assumption about what it means. If it is extensible, then turning a panic into an error is IMO usually reasonable, but I would want to be more careful with the original documentation and write that it “May panic if…”.

Even more broadly: when you are doing a “1.0” release, or any one where you care about not having breaking changes in your near future, you should review your API surface — both types and documentation — for over-promising the details of things that you might want to tweak later, and remove such promises.

4 Likes

I'm not sure what you mean by that. The closest interpretation I can come up with (namely that an API shouldn't accept or return invalid values in the sense of the validity invariant) would be tautological, since otherwise you would have language UB. So you must mean something else.

I didn't mention that because it's orthogonal to the problem (in my case the error enum is shared with many functions and already contains a variant for invalid length). However, reading through the lines, I guess what you mean is that users may actually rely on code panicking, otherwise they wouldn't be able to observe the new error path. So that would be leaning towards panicking documentation being "invariant", in particular it can't get weakened as the safety documentation can. That's indeed the conservative answer. I'm gonna wait a bit to see if there are other opinions.

That's fair; I am not sure what the right word here is (though I think you’re applying that particular sense of “valid” outside of the context where it really applies — valid in the sense of “validating” a type’s correctness-not-just-safety invariants is a perfectly normal usage), and maybe it makes sense only as part of the next point (“the return value of a function should be truthful”).

I mean: You should not create a ParseError { line: 1, column: 9999 } (that is visible to callers) if the input’s line 1 is not 10000 characters long. Similarly, you should not create a DecryptError if it is defined to mean only “the cyphertext is not authentic” and the problem is not that the cyphertext is not authentic.

Take your own documentation’s promises seriously. Avoid making them if you don’t mean them.

Don’t think of it as “rely on panicking”. Think of it as the opposite “rely on what is promised to be true if we reach this point” (i.e. don't panic and don't otherwise branch elsewhere). Consider:

fn dependent_code() {
    match library_of_interest::func() {
        Enum::A => todo!("point A"),
        Enum::B => todo!("point B"),
    }
}

Clearly, if point A is reached, then that means something to dependent_code, and you shouldn't return Enum::A if you don’t mean those things. Similarly for point B. All I’m saying is that there are implications to reaching A or B, and that’s part of your public API. You can greatly weaken these implications by making Enum be non_exhaustive, and if you do so, your documentation for func should also reflect your intent.


All that said, I think there’s a lot that is under-studied about how to design APIs (and languages) for future evolution, and how to write reasonable (non-)guarantees — in Rust and everywhere else too.

1 Like

While I agree with that in general, I think that if the panic is as specifically documented as that then it's definitely a breaking change to stop panicking in those cases.

If it happened to panic on those cases, but it wasn't documented as such, then you could probably say it's compatible in the "that was a bug and we're allowed to fix bugs" loophole (after all, every bug fix is a breaking change under the strictest possible definition), but arguably when it's documented it's promised behaviour, not a bug.

1 Like

I had to think about your question for a while.

Semver - and versioning in general - is a set of rules about intentions, even though we have a lot of practice around it as a set of rules about software. One of the key intentions expressed through semver is that if two releases of a piece of software share a major version number, then other software that behaves correctly when using the earlier release, which uses that software in its intended ways, will continue to behave correctly when using the later release.

Which brings me to this section of your example.

As documented, one thing it might be reasonable for a downstream developer to rely on is that the presence of an error means only and exactly that the ciphertext is authentic. I might argue as a matter of style that the code in question should examine the DecryptError value to verify that, but unless DecryptError is #[non_exhaustive], it would be valid to assume that the only possible value it has is DecryptError::InauthenticCiphertext (however it's named).

This change is not compatible with that assumption. Code written - in questionable style - that makes the assumption that the error can only mean one thing will not behave correctly when it encounters an error that means one of the two new things it can now mean.

This isn't just an abstract concern. These kinds of assumptions can propagate outwards, potentially so far as to tell the end user what the error "is" based on that assumption. An error message that shows an "invalid ciphertext" message may now be incorrect if it is triggered by a programming error that leads to incorrect lengths, for example, and the user may not have the tools to troubleshoot it.

I don't think this necessarily requires a major version change, but it could. A lot depends on whether the hypothetical client code I described above is or is not using this library "as intended" - or, put another way, whether this is a kind of backwards compatibility the authors intend to support. A lot could be inferred from the presence or absence of a #[non_exhaustive] marker on the error type, for example, as that would communicate the intentions of the developers with respect to adding values to the error enumeration later on.

However, I would err on the side of caution in most cases, and reserve this change for the next major version, unless there is a strong user community request for it in the current version and little risk of the technical compatibility risk turning into an actual problem.

1 Like

Thanks for the answers! I think this solves my problem. Let me summarize the conclusion.

The Panics section documents the function behavior. It doesn't document conditions under which the function should not be called, but would still have a reasonable behavior if it was called. This is opposed to the Safety section which documents when the function should not be called, and would not have a reasonable behavior if it was called. Because of this distinction, the Safety section can be weakened, but the Panics section cannot.

If one wants a safe function with conditions under which it should not be called, it should explicitly mention that it is a user error to call the function under those conditions, but that if by accident it were to happen, the behavior is safe but otherwise unspecified (for example a panic). Only then the behavior can be changed (as long as it remains safe), because it was documented so.

One can see those conditions as safe UBs and they probably deserve a separate section between Panics and Safety (maybe Unspecified, Undefined, Unstable, Invalid, ... I'll need to think about it).

1 Like

I think this is too weak — in an error case, it is desirable to guarantee that the function will not return Ok. There are cases where misbehavior-without-error (“safe UB”) is unavoidable without compromising desirable properties — for example, sorting or hashing with an inconsistent comparison or hash function — but in the kind of cases you started the discussion with, I think we can and should say something stronger and more useful to rely on. Exactly what would be best, I’m not sure. I can see how having a dedicated section would be elegant, but I think it’s also reasonable to just say “Currently panics if X; this may change.”

My expectations would be more in line with the classic xkcd comic:

Within this context, I agree with @derspiny:

With the caveat that if you have a significant number of users, telling them they are holding the tool wrong will be cold comfort if their code has broken.

That's actually the whole point. You're doing the same mistake as I did by trying to be helpful and provide a strong specification. What if in the future, you realize that users sometimes only want to decrypt a prefix of the cyphertext, but they have to provide the whole cyphertext because the tag is at the end, and they don't want to allocate a too big buffer for the plaintext. Then you'll be happy you can actually return Ok and just decrypt a prefix.

The idea behind the Unspecified section is exactly the same as Safety, except that it's safe. The idea being to specify conditions that are outside the function contract. The function should not be called under those conditions. If it is, you either get UB (with Safety) or something safe (with Unspecified). This brings to your next point: it is safe to delete files.

That's indeed a good alternative. Probably something along those lines would match what I want: "This function should not be called if X. To be safe, this function will currently panic if that's the case. This behavior may change with a minor bump." That way, users who can't be sure they won't accidentally call the function outside its contract, would restrict their dependency version to the next minor version instead of the next major one.

That's Hyrum's Law. I don't have any issue breaking users that don't read the documentation. This is also why I won't break users that read the documentation. That's a contract. If you write unsafe code, this kind of things is extremely important.