How to recover future chain from panic! (need to close security vulnerability for http server)?


#1

I am trying to find a way to recover future chain from panic! and return positive future result (but with negative http status code).

Let’s consider the example below. The process is crashed if a client submits Json structure not deserializable to the Point (done intentionally this way). Note: unwrap is used intentionally for the purpose of the question (real code would handle the json parsing result accordingly). My goal is to report InternalServerError HTTP status code back to clients causing panics (for whatever reason) instead of crashing the whole process, which affects all good behaving clients.

What is the rust way of doing this?

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

struct AgentService;

impl AgentService {
    fn get_body(&self, body: hyper::Body) -> Box<Future<Item=String, Error=<Self as hyper::client::Service>::Error>> {
        /// skip not-related details
    }
}

impl hyper::server::Service for AgentService {
    type Request = Request;
    type Response = Response;
    type Error = hyper::Error;
    type Future = Box<Future<Item=Self::Response, Error=Self::Error>>;

    fn call(&self, _req: Request) -> Box<Future<Item=Self::Response, Error=Self::Error>> {
            println!("Received {:?}", _req);
            let resp: Box<Future<Item=Self::Response, Error=Self::Error>> = match _req.path() {
                "/avl" if _req.method() == &hyper::Method::Post => {
                    Box::new(self.get_body(_req.body())
                        .map(|r| {
                            // panics if json can not be unpacked to the Point
                            // (let's consider it is oversight in the program that the parsing result is not handled explicitly)
                            let maybe_point: Point = serde_json::from_str(&r).unwrap();
                            Response::new().with_status(hyper::StatusCode::Created)
                        }))
                }
                _ => {
                    Box::new(futures::future::ok(Response::new()
                        .with_status(hyper::StatusCode::NotFound)))
                }
            };

            Box::new(resp.or_else(|r| {
                // recover from an explicit error returned from the future chain
                // How can I achieve the following?
                // TODO if panic happened, return hyper::StatusCode::InternalServerError instead
                Ok(Response::new().with_status(hyper::StatusCode::BadRequest))
            }))
    }
}

What does Rust need today for server workloads?
What does Rust need today for server workloads?
#2

https://docs.rs/futures/0.1.17/futures/future/trait.Future.html#method.catch_unwind. But to be clear - this should be used sparingly and you should try your hardest to avoid panics in Rust for cases where you intend to handle them.

Note also that panics can be turned into aborts via compilation flag in which case this code wouldn’t execute.


#3

I have seen catch_unwind, but I can not figure out how to use it to implement my call function: the contract for the call function is to return future of response. And I want my response to be with status InternalServerError if my code panics somewhere. Can I do it in the contract of the call function? Note: this contract is defined by the hyper framework.

of course I should, but I can not 100% guarantee. Can somebody do 100% guarantee? But I would like to guarantee, at least, that the process is not crashed where it was an oversight and panic still happened (although I tried very hard to avoid it). Minority of clients causing panics, will be isolated by InternalServerError response. And good clients will be served as before. Otherwise, the crash of the process becomes huge security vulnerability! Please, help me to close this security vulnerability :slight_smile:


#4

CatchUnwind is itself a Future: https://docs.rs/futures/0.1.17/src/futures/future/catch_unwind.rs.html#24-42. So you wrap your resp future with it, ie resp.catch_unwind().map_err(|_| ...). The Error of this future is the panic error - the normal Item/Error duo of the underlying future is returned as the Item of this future.

I don’t know about 100% but with enough care and diligence you should be able to get close in Rust. Panics are usually callled our prominently in APIs and if you avoid unwraps/expects and so on, there shouldn’t be too many panic points in well behaved and written Rust code/libs.

I can’t imagine crashing being a security problem. Is it a possible DoS vector? Yes, but not sure about security. Quite contrary, if you decide to stay up after a panic then you risk running into security issues or data corruption. Rust checks for UnwindSafe but not sure if you really want to trust that fully. Read up on “unwind safety” to get more color on the topic.


#5

Also, I believe you’re coming from a Java/Scala background. Not sure about Scala, but most Java code is not robust in the presence of rare error events. Part of that is culture (I think) and part of that is the way exceptions are used there (checked and unchecked). It’s also very common to have a “top level” exception handler that just logs and continues on its merry way. This unprincipled approach to error handling leads to all sorts of problems: instability, puzzling behavior, errors at a distance, state/data corruption and so on. In Rust, I think there’s better discipline and tools to make error handling more precise, targeted, and principled. Try to follow suit :slight_smile:

Should also say that error handling in general is hard - Rust makes it better but it’s not a cakewalk. But proper and carefully planned error handling is what separates first class products from the rest.


#6

Well, if a process can be crashed, it is an attack on availability (which can be in the scope of security depending on the scope definition for your company / service). Yes, I am aware about high-availability, distribution, load balancing, etc. but knowing how to crash 1 instance, can be used to crash all distributed instances simultaneously. And as a result, an attacker can put the whole service down, which compromises Service-Level-Agreements, which have got financial penalties. It is much easier situation, when you can recover from panics, allowing to keep the process alive, so majority of the traffic is not affected, while you are investigating and fixing the cause of the panic (which you were not aware before).


#7

Yes, I am coming from Scala and also C++ worlds. Yes, global try/catch is not the best pattern and you should design you code so that it does not hit catch of the global try/catch block in any conditions. I agree with it. However, as nobody can guarantee global catch does not happen, it is a good fallback mechanism, which allows to win time for observing the cases of unexpected traps and fixing it. It allows to sustain the process running in case of unexpected / unknown crashes instead of having everything down due to someone single found a way to crash it.

This is great that there is better discipline. I 100% support it. But the safety bed (as a last resort to sustain the service running) is paramount for high-availability mission critical publicly exposed services. If rust does not provide this safety bed or does not provide 100% guarantee that panics are impossible within software flow (i.e. all errors are handled), it is unlikely it can be a choice or alternative for Java/Scala, despite of all Rust community intent “to follow suit”.


#8

I thought that catch_unwind should transform propagation of a panic to a completed with an error future. This is what I put (I assume catch_unwind does nothing if there is no panic):

            ....
            // redefine resp with catching panic
            let resp = resp.catch_unwind();

            Box::new(resp.or_else(|r| {
                // recover from an explicit error from the future chain
                Ok(Response::new().with_status(hyper::StatusCode::BadRequest))
            }))

I would expect it to compile transparently, but it does not. Could you please let me know what I am doing wrong with the catch_unwind?

error:

error[E0277]: the trait bound `futures::Future<Item=hyper::Response, Error=hyper::Error>: std::panic::UnwindSafe` is not satisfied
  --> src\server.rs:94:29
   |
94 |             let resp = resp.catch_unwind();
   |                             ^^^^^^^^^^^^ the type futures::Future<Item=hyper::Response, Error=hyper::Error> may not be safely transferred across an unwind boundary
   |
   = help: the trait `std::panic::UnwindSafe` is not implemented for `futures::Future<Item=hyper::Response, Error=hyper::Error>`
   = note: required because of the requirements on the impl of `std::panic::UnwindSafe` for `std::ptr::Unique<futures::Future<Item=hyper::Response, Error=hyper::Error>>`
   = note: required because it appears within the type `std::boxed::Box<futures::Future<Item=hyper::Response, Error=hyper::Error>>`


#9

Ah, you may need to catch_unwind the individual concrete futures that are used to form the Box for the resp binding. The way this works in Rust is you can call catch_unwind only if the closure is marked UnwindSafe - this is a marker trait (like Send and Sync) that is auto implemented on types that meet certain criteria (check the docs for UnwindSafe for specifics). This is basically a lint to make sure you’re using types that don’t potentially end up with broken invariants if a panic starts unwinding. You can also manually assert this by using the AssertUnwindSafe wrapper struct, if need be, but it’s not transparent - this struct is visible as a type (although it can be deref’d to the underlying value).