Feedback on C-to-Rustls bindings

Hi all,

I've been working on some bindings from C to rustls, to be used by curl and Apache. I'm interested in feedback on the library design generally: https://github.com/abetterinternet/crustls.

I'm also particularly interested in recommendations on how to handle panics: https://github.com/abetterinternet/crustls/pull/25.

Specifically I have to catch panics and return something (because panics crossing the FFI boundary is UB). For many functions, I return a result enum and can return an error. For functions that are infallible (except for panics), I'm returning default values after recovering from a panic. Is that sensible? Should I make every function fallible in order to account for the possibility of panicking? That seems like bad ergonomics for the users of the library.

You could abort instead of returning a default value, but that seems bad for reliability.

What sort of default values are you talking about? What functions can't return an error?

1 Like

Aborting is off the table. Curl in particular has a philosophy that libraries should never cause an abort, which makes sense to me.

What sort of default values are you talking about? What functions can't return an error?

rustls_client_session_wants_read -> bool
rustls_client_session_wants_write -> bool
rustls_client_session_wants_is_handshaking -> bool

rustls_version -> size_t (the length of the bytes written)

rustls_client_config_builder_new -> *mut rustls_client_config_builder
rustls_client_config_builder_build -> *rustls_client_config

Hmm, do you know when the functions can panic and handle each of those cases? Or do you call a dependency?

You might also be interested in https://github.com/dtolnay/no-panic.

1 Like

Returning some sort of default value or "useless" value is probably the way to go.

For things like rustls_client_config_builder_new I'd return a null pointer. When calling a function which returns a newly allocated object it's common to do a null check to detect OOM, so the caller should already be handling the possibility of failure.

It seems reasonable to return false for rustls_client_session_wants_read and friends.

How much do you want to depend on rustls internals? For a lot of these, you can look at the source code and figure out whether a particular function may panic. For example, rustls_client_session_wants_read will probably just be reading a field on a struct, in which case you know it'll never panic and can skip the catch_unwind altogether.

I know curl is known for its reliability, so you may also want to look into some sort of poisoning mechanism (e.g. by setting a has_panicked flag in catch_unwind and starting functions with if self.has_panicked { /* bail */ }). The idea is if you encounter a panic while calling a method on an object, there is a possibility that some logical invariants have been broken and this may cause logic bugs or trigger further panics later on. By marking the object as poisoned you take the conservative approach and neuter it to prevent any further issues.

For example, imagine you have a Vec<T> that must be sorted, and when adding a new element you append it then calling sort()... if a comparison panics midway through the sort() call, the next time you try to use it you'll find an unsorted list when you expect it to be sorted.

1 Like

I think any instance in which you can get Rustls to panic would be considered a bug in Rustls and would be fixed promptly once reported. Maybe there are some situations where misuse of the Rustls API (programming mistakes within crustls) might cause panics, but even then I wouldn't be surprised if Rustls would consider those panics to be bugs.

So, basically, the problem you are worried about "should" never happen, and IMO it would be better to spend effort ensuring it never happens within Rustls than to worry about what to do when it happens.

I know that the policy for untrusted, webpki, and ring, dependencies of Rustls, is to never panic.

1 Like

If Rustls can guarantee that it will never panic, then it would nicely side-step the problem. @jsha will probably be fuzzing the bindings quite thoroughly anyway, so that would give Curl more confidence in Rustls.

1 Like

This is a good point, @briansmith. I am indeed assuming it's the policy of rustls to never panic. But I think it's worth covering the possibility of a bug, given the cost is fairly low. And the consequences of failing to catch the unwind would be UB.

Also, to meet curl's needs, I need to prevent aborts on OOM. My understanding is that one planned way to avoid aborting on OOM is to turn OOMs into unwind (though I still need to do some more research there).

I'm fine depending on knowledge of rustls internals. I'm a little dubious that I can prove to myself from the source code that it will never panic, but I am very open to learning that it's more tractable that I thought.

Actually, I did find one spot where rustls might panic, on failure to get random: https://github.com/ctz/rustls/blob/a49f1117cad9082f1d0ccfd2ff45d3a336462f76/rustls/src/rand.rs#L2-L4. But that could probably be changed.

There are also a handful of unwrap() throughout the code. I checked out a few and they all seem truly infallible (e.g. time::SystemTime::now().duration_since(time::UNIX_EPOCH).unwrap()), but some require nonlocal reasoning to confirm that.

The ring randomness API avoid panicking/aborting on failure, and since it is often/usually implemented in terms of fallible operating system APIs, it returns a Result that has to be handled. I do think this is a bug in Rustls, and I think it is worth a PR to fix. (I believe the Rustls thinking is similar to the Google thinking. In BoringSSL and Fucshia, the process will be aborted on CSRNG failure.)

Somebody should just make some PRs to Rustls change the unwraps to more conservative error handling.

I agree with the idea of wanting to be conservative about catching aborts before they cross the FFI boundary. However, you should also then audit all your dependencies to ensure they are actually unwind-safe, and be conservative in your usage of your dependencies to minimize the extent to which you require them to be unwind-safe. For example, if you were to catch a panic in your C FFI layer around Rustls, I would try to throw away as much Rustls-related state (sessions, connections, etc.) as possible.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.