Advice for HTTP Lib

I'm writing an HTTP client library for a no_std environment (can't use hyper there). I have one question about how to handle incoming data. Let's say the user of my lib makes a request and we get N bytes of body in the response (assume that the response headers are already parsed). The user does not read the body but makes a second request. At this point we have those N stray bytes sitting on the TCP connection. For the second request, I would want to clear those bytes so they don't interfere with the second request's response. One way to do so is to drain them before writing the second request to the connection, by reading until the end of the body. But that may not be a good idea because the body in some cases could be a few GBs. If you start draining, you'll end up pulling all those GBs of data and sitting there for a long time. The other technique I can think of is to drain a few KB. If that empties the socket, proceed with the second request. If not, close the connection and open a new one.

I have never written an HTTP implementation. So just wanted to check with the experts here if this is the right way or if there's a better technique.

@seanmonstar Can probably provide a quick answer to this.

Thanks :slight_smile:

As a quick check, I'd look at the Content-Length header of the response. If it's over your limit, close the connection. Otherwise, start draining and if you exceed the length then close the connection at that point.
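A minimal sketch of that check-then-drain idea, assuming the body is framed by Content-Length. The `ByteSource` trait, `ReadError`, `DrainOutcome` and `drain_bounded` are hypothetical names made up for illustration (a no_std library would have its own transport abstraction), and the 512-byte scratch buffer is an arbitrary choice.

```rust
/// Hypothetical error type for the transport.
#[derive(Debug)]
pub struct ReadError;

/// Hypothetical minimal read trait, standing in for whatever transport
/// abstraction the library uses (std::io::Read is not available in no_std).
pub trait ByteSource {
    /// Reads up to `buf.len()` bytes. `Ok(0)` means the peer closed the connection.
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ReadError>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum DrainOutcome {
    /// All leftover body bytes were discarded; the connection can be reused.
    Drained,
    /// Too much data left, early EOF, or a read error: close and reconnect.
    CloseConnection,
}

/// Discards `remaining` unread body bytes (Content-Length minus what the
/// user already read), unless that is more than `budget` bytes.
pub fn drain_bounded<S: ByteSource>(src: &mut S, mut remaining: u64, budget: u64) -> DrainOutcome {
    // Quick check first: do not touch the socket if the leftover is over the limit.
    if remaining > budget {
        return DrainOutcome::CloseConnection;
    }
    let mut scratch = [0u8; 512];
    while remaining > 0 {
        // Never read past the end of this body, or we would eat the next response.
        let want = core::cmp::min(remaining, scratch.len() as u64) as usize;
        match src.read(&mut scratch[..want]) {
            Ok(0) | Err(_) => return DrainOutcome::CloseConnection,
            Ok(n) => remaining -= n as u64,
        }
    }
    DrainOutcome::Drained
}

/// In-memory source, useful for trying the function out without a socket.
impl<'a> ByteSource for &'a [u8] {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize, ReadError> {
        let n = core::cmp::min(buf.len(), self.len());
        buf[..n].copy_from_slice(&self[..n]);
        *self = &self[n..];
        Ok(n)
    }
}
```

The caller would write the next request only on `Drained`; on `CloseConnection` it drops the socket and opens a fresh one.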

I'm curious though - what's the scenario where a server responds to your request with a body that's larger than you're willing to receive?

Thanks. That confirms what I was thinking. Content-Length check is a good tip :).

I have no real upper bound on the size of the body I'm willing to receive. This was only about handling a scenario that can arise if you're the author of an HTTP client lib: the user of the library makes a request, does not read the incoming body (or reads it only partially), and then calls the client lib with a second request. In that case, I was wondering how the author of the library can satisfy the second request without corrupting its response with the residual bytes from the previous response.

FWIW, a response with Transfer-Encoding: chunked will not have a Content-Length to check. So you’ll still be vulnerable to the same issue if you don’t close the connection on chunked responses as well.

It is actually the application’s responsibility to read the body, even if it needs to be discarded. Having a library do this would probably be bad news for multiple reasons. One use case off the top of my head is HTTP/1.1 pipelining with async requests. It suffers head-of-line blocking as you alluded to, but sometimes it’s more efficient than sequential request/response.

What you describe sounds reasonable. There are some edge cases in RFC 7230 about how to determine response body lengths.
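For reference, a rough sketch of the response-side rules from RFC 7230 section 3.3.3, simplified and not a complete implementation. The `BodyLength` enum and `response_body_length` function are hypothetical names. The sketch assumes header values arrive as already-extracted strings, skips the special case for successful CONNECT responses, and treats a comma-separated Content-Length list as invalid even though the RFC allows collapsing identical duplicates.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BodyLength {
    /// No body at all (HEAD response, 1xx, 204, 304).
    Empty,
    /// Body is chunked; length is found by decoding the chunks.
    Chunked,
    /// Body is exactly this many bytes.
    Fixed(u64),
    /// Body runs until the server closes the connection (no reuse possible).
    UntilClose,
    /// Framing is invalid; the connection should be closed.
    Invalid,
}

pub fn response_body_length(
    is_head_request: bool,
    status: u16,
    transfer_encoding: Option<&str>,
    content_length: Option<&str>,
) -> BodyLength {
    // Responses that never carry a body, whatever the headers say.
    if is_head_request || (100..200).contains(&status) || status == 204 || status == 304 {
        return BodyLength::Empty;
    }
    // Transfer-Encoding overrides Content-Length when both are present.
    if let Some(te) = transfer_encoding {
        let last = te.rsplit(',').next().unwrap_or("").trim();
        return if last.eq_ignore_ascii_case("chunked") {
            BodyLength::Chunked
        } else {
            // In a response, a non-chunked final coding means read until close.
            BodyLength::UntilClose
        };
    }
    match content_length {
        Some(raw) => {
            let v = raw.trim();
            // Digits only: reject signs, lists, and empty values.
            if v.is_empty() || !v.bytes().all(|b| b.is_ascii_digit()) {
                return BodyLength::Invalid;
            }
            match v.parse::<u64>() {
                Ok(n) => BodyLength::Fixed(n),
                Err(_) => BodyLength::Invalid, // value overflows u64
            }
        }
        // Neither header present: the body ends when the connection closes.
        None => BodyLength::UntilClose,
    }
}
```

Only `Empty`, `Fixed` and `Chunked` leave the connection in a state where a later request could share it; `UntilClose` and `Invalid` both end with the connection being closed.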

Knowing whether to drain unread bytes from the connection can be tricky, and should probably be left to the user. For that reason, hyper has this stance: if the user reads the full body, then a new request can re-use the connection. If the user hasn't read the body, then the connection is discarded and a new one is used.
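A rough sketch of that policy, to make it concrete. This is not hyper's actual code; `BodyState` and `ConnState` are hypothetical names, and the sketch assumes only Content-Length framing, with a missing Content-Length meaning "read until close".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BodyState {
    /// This many body bytes are still unread on the connection.
    Remaining(u64),
    /// The user consumed the whole body.
    Complete,
    /// Length unknown (body ends at connection close), so never reusable.
    UntilClose,
}

/// Per-connection bookkeeping for the most recent response.
pub struct ConnState {
    body: BodyState,
}

impl ConnState {
    /// Call after the response headers are parsed.
    pub fn new_response(content_length: Option<u64>) -> Self {
        let body = match content_length {
            Some(0) => BodyState::Complete,
            Some(n) => BodyState::Remaining(n),
            None => BodyState::UntilClose,
        };
        ConnState { body }
    }

    /// Call each time the user reads `n` body bytes.
    pub fn on_body_read(&mut self, n: u64) {
        if let BodyState::Remaining(left) = self.body {
            let left = left.saturating_sub(n);
            self.body = if left == 0 {
                BodyState::Complete
            } else {
                BodyState::Remaining(left)
            };
        }
    }

    /// True only if the previous body was fully read; otherwise the caller
    /// should close this connection and open a new one for the next request.
    pub fn can_reuse(&self) -> bool {
        self.body == BodyState::Complete
    }
}
```

Before sending a request, the library would check `can_reuse()` and reconnect when it returns false, without attempting any drain.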

@seanmonstar Thanks for the RFC link. For now I do not plan to support Transfer-Encoding, so currently only Content-Length will be considered as the marker of length (that plus reading until connection close if Content-Length is missing). Will add support for it later.

Coming back to the immediate question, I agree with you. Perhaps the best option is to always close and re-open the connection under this scenario without any attempt to drain at all. It's just a whole lot simpler (I assume that's what you meant).

@parasyte The above-mentioned approach of always closing the connection should obviate the need to check the length markers. So I don't need to worry even with Transfer-Encoding or Content-Encoding headers (although I said I don't plan to support them, the server could still send them). Pipelining is a very good argument against draining as well as against closing the connection. So one option is to skip remedial measures like this when pipelining is enabled on the library, but use them otherwise. That said, pipelining is kinda dead these days, so I guess I should not even be supporting it. I can't think of any other argument against closing the connection (and not draining) in this situation.

Thanks for the help everyone :slight_smile: