I've been working on some bindings from C to rustls, to be used by curl and Apache. I'm interested in feedback on the library design generally: https://github.com/abetterinternet/crustls.
Specifically I have to catch panics and return something (because panics crossing the FFI boundary is UB). For many functions, I return a result enum and can return an error. For functions that are infallible (except for panics), I'm returning default values after recovering from a panic. Is that sensible? Should I make every function fallible in order to account for the possibility of panicking? That seems like bad ergonomics for the users of the library.
Returning some sort of default value or "useless" value is probably the way to go.
For things like rustls_client_config_builder_new I'd return a null pointer. When calling a function which returns a newly allocated object it's common to do a null check to detect OOM, so the caller should already be handling the possibility of failure.
It seems reasonable to return false for rustls_client_session_wants_read and friends.
How much do you want to depend on rustls internals? For a lot of these, you can look at the source code and figure out whether a particular function may panic. For example, rustls_client_session_wants_read will probably just be reading a field on a struct, in which case you know it'll never panic and can skip the catch_unwind altogether.
I know curl is known for its reliability, so you may also want to look into some sort of poisoning mechanism (e.g. by setting a has_panicked flag in catch_unwind and starting functions with if self.has_panicked { /* bail */ }). The idea is if you encounter a panic while calling a method on an object, there is a possibility that some logical invariants have been broken and this may cause logic bugs or trigger further panics later on. By marking the object as poisoned you take the conservative approach and neuter it to prevent any further issues.
For example, imagine you have a Vec<T> that must be sorted, and when adding a new element you append it then calling sort()... if a comparison panics midway through the sort() call, the next time you try to use it you'll find an unsorted list when you expect it to be sorted.
I think any instance in which you can get Rustls to panic would be considered a bug in Rustls and would be fixed promptly once reported. Maybe there are some situations where misuse of the Rustls API (programming mistakes within crustls) might cause panics, but even then I wouldn't be surprised if Rustls would consider those panics to be bugs.
So, basically, the problem you are worried about "should" never happen, and IMO it would be better to spend effort ensuring it never happens within Rustls than to worry about what to do when it happens.
I know that the policy for untrusted, webpki, and ring, dependencies of Rustls, is to never panic.
If Rustls can guarantee that it will never panic, then it would nicely side-step the problem. @jsha will probably be fuzzing the bindings quite thoroughly anyway, so that would give Curl more confidence in Rustls.
This is a good point, @briansmith. I am indeed assuming it's the policy of rustls to never panic. But I think it's worth covering the possibility of a bug, given the cost is fairly low. And the consequences of failing to catch the unwind would be UB.
Also, to meet curl's needs, I need to prevent aborts on OOM. My understanding is that one planned way to avoid aborting on OOM is to turn OOMs into unwind (though I still need to do some more research there).
I'm fine depending on knowledge of rustls internals. I'm a little dubious that I can prove to myself from the source code that it will never panic, but I am very open to learning that it's more tractable that I thought.
There are also a handful of unwrap() throughout the code. I checked out a few and they all seem truly infallible (e.g. time::SystemTime::now().duration_since(time::UNIX_EPOCH).unwrap()), but some require nonlocal reasoning to confirm that.
The ring randomness API avoid panicking/aborting on failure, and since it is often/usually implemented in terms of fallible operating system APIs, it returns a Result that has to be handled. I do think this is a bug in Rustls, and I think it is worth a PR to fix. (I believe the Rustls thinking is similar to the Google thinking. In BoringSSL and Fucshia, the process will be aborted on CSRNG failure.)
Somebody should just make some PRs to Rustls change the unwraps to more conservative error handling.
I agree with the idea of wanting to be conservative about catching aborts before they cross the FFI boundary. However, you should also then audit all your dependencies to ensure they are actually unwind-safe, and be conservative in your usage of your dependencies to minimize the extent to which you require them to be unwind-safe. For example, if you were to catch a panic in your C FFI layer around Rustls, I would try to throw away as much Rustls-related state (sessions, connections, etc.) as possible.