How to determine with mio::TcpStream that all data has been read?

How to determine with mio::net::TcpStream that all data has been read?

I understand that I have to register the mio::net::TcpStream with my Poll and then poll() for "read" events. Whenever a "read" event is received, I have to call TcpStream::read() in a loop until it fails with a WouldBlock error. Then, once again, I can poll() for more "read" events...
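In code, my understanding looks roughly like this. This is only a minimal sketch with mio 0.8; the address, token, and buffer size are placeholders, and in a server you would use the stream returned by TcpListener::accept() instead of connect():

```rust
use std::io::{self, Read};

use mio::net::TcpStream;
use mio::{Events, Interest, Poll, Token};

const STREAM: Token = Token(0);

fn main() -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    // Hypothetical peer address.
    let addr = "127.0.0.1:8080".parse().unwrap();
    let mut stream = TcpStream::connect(addr)?;
    poll.registry()
        .register(&mut stream, STREAM, Interest::READABLE)?;

    let mut received = Vec::new();
    let mut buf = [0u8; 4096];

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == STREAM && event.is_readable() {
                // Drain the socket until it would block.
                loop {
                    match stream.read(&mut buf) {
                        // 0 bytes = EOF: the peer closed (or shut down)
                        // its sending direction.
                        Ok(0) => return Ok(()),
                        Ok(n) => received.extend_from_slice(&buf[..n]),
                        Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                        Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                        Err(e) => return Err(e),
                    }
                }
                // At this point `received` may still hold only part of
                // the data; poll() again for further readable events.
            }
        }
    }
}
```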

Now here is the thing: how do I know when all data has been read? Specifically, how do I know that the peer has transmitted all data, e.g. that the complete HTTP request has been received?

The problem is: after looping read() until it fails with a WouldBlock error, I do not know whether the last read failed because the peer has actually finished sending data, or because we have read all the data that was currently available in the receive buffers and more data may become available later...

In other words: if, after the WouldBlock error, I go back to calling poll(), then I don't know whether more "read" events are going to pop up, or whether poll() will wait forever because there won't be any more events.

Thanks for any suggestions!


I know that I can add a timeout to the poll() call. But it is unclear which timeout value would be big enough to be sure that we are not missing any input data that may arrive "delayed". Furthermore, this adds an unnecessary delay in the case where the data is already complete when we enter poll().


Also, I have looked at some mio example code. They seem to expect that there will be a successful zero-size read() to indicate the end of the data. But this is not the case in the real world, as I have tested!

In my test, after I get the very first "read" event for my TcpStream, I can do a single successful read(), which returns a size of (for example) 78 bytes. The next read() then immediately fails with a WouldBlock error. If, at this point, I go back to calling poll(), then no further events are ever received for the stream...

Never ever was there a read() that succeeded with size zero.

Meanwhile, assuming that the first successful read() after the first "read" event already gave us all the data may be wrong in the general case. So I'm a bit lost as to how this is supposed to be solved...


How are you defining "all data"? Read will return 0 when the end of the stream has been reached (i.e. the peer has closed the connection). If the peer does not close the connection, you won't see the end of stream indicator.

How are you defining "all data"? Read will return 0 when the end of the stream has been reached (i.e. the peer has closed the connection). If the peer does not close the connection, you won't see the end of stream indicator.

Well, an HTTP client (e.g. cURL) connects to the server, sends the HTTP request, and then waits for the server's response. Somehow the server must be able to determine when it has read all of the request data from the stream, so that it can start parsing the request and then create the suitable response.

With the standard blocking TcpStream this is easy, as read() simply blocks until we have all the data.

However, with mio::net::TcpStream the read() can fail with a WouldBlock error at any time. But, in this case, it is unclear whether or not more data will become available for reading at some point in the future. I can use Poll::poll() to wait for an "event" that signals more data. But it may never happen...

Unfortunately, I never see a successful zero-size read() result...

It's probably because, at some point, the client (e.g. cURL) has fully sent the request but does not yet close the connection, as it is obviously still awaiting the server's response...

The server parses the HTTP request as it reads it to determine when it has read all of it.

No, it is not. You will see exactly the same behavior with a blocking socket except the read will just block instead of returning WouldBlock.

Well, that's not true. Blocking and non-blocking read will return the same data when called while the input buffer contains data.

You may, in both cases, receive multiple frames ending with a potentially partial frame.

It is up to the upper layer above TCP to actually parse incoming bytes to determine if a full frame has been received or not.

No, it is not. You will see exactly the same behavior with a blocking socket except the read will just block instead of returning WouldBlock.

Thing is, because the blocking TcpStream never fails "prematurely" with a WouldBlock error, it either returns the data (if already available) or it blocks until the data becomes available. So, the blocking read() returns no later than when the data has been fully received. That's much different from Poll::poll(), which may wait forever in case no more "read" events are going to arrive...

The server parses the HTTP request as it reads it to determine when it has read all of it.

But how do you know? Since the non-blocking read() from mio::net::TcpStream always returns immediately, it may return "partial" data, at least in the general case. Future read()s may (or may not) return more data, which we need to append to the data that we have received before.

Just by looking at the (possibly incomplete) data we already have, we cannot know whether or not more data is going to come. HTTP does not start with a "length" field, or anything in that vein...

Read would also wait forever when no more read events arrive. Both of these scenarios are exactly the same with respect to incoming data.

You absolutely can. Every HTTP implementation written in the last 25 years has successfully done this. You parse the request. The header section ends with \r\n\r\n, and either includes a Content-Length header indicating the exact byte length of the body, or includes a Transfer-Encoding: chunked header indicating that the body is chunked (which can be parsed chunk by chunk), or has no body.
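As a rough sketch of that completeness check, assuming the whole request is accumulated in one buffer. This is a hypothetical helper, not a real parser: it ignores Transfer-Encoding: chunked and the many header corner cases a production implementation must handle:

```rust
/// Decide whether the bytes received so far form a complete HTTP
/// request (header section plus a Content-Length body, if any).
fn request_is_complete(buf: &[u8]) -> bool {
    // The header section ends with "\r\n\r\n".
    let Some(pos) = buf.windows(4).position(|w| w == b"\r\n\r\n") else {
        return false; // headers not complete yet
    };
    let body_start = pos + 4;
    let headers = &buf[..pos];

    // Naive, ASCII-only scan for a Content-Length header.
    let content_length = std::str::from_utf8(headers)
        .ok()
        .and_then(|h| {
            h.lines()
                .find(|l| l.to_ascii_lowercase().starts_with("content-length:"))
                .and_then(|l| l.split(':').nth(1))
                .and_then(|v| v.trim().parse::<usize>().ok())
        })
        .unwrap_or(0); // no Content-Length (and not chunked) => no body

    buf.len() - body_start >= content_length
}
```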


TcpStream knows nothing about HTTP or whatever protocol is used above it.

Blocking read will actually block until some data is available. If the buffer contains a partial frame at the time you call read, it won't block and will just return the received bytes.

Also, think about the fact that you must provide a buffer of limited size where read can write bytes to. How would you know what size to allocate for this buffer if an HTTP frame can be arbitrarily long?

TcpStream knows nothing about HTTP or whatever protocol is used above it.

Blocking read will actually block until some data is available. If the buffer contains a partial frame at the time you call read, it won't block and will just return the received bytes.

I understand that TCP only transmits a stream of bytes and has no idea about HTTP. But TCP is also bi-directional. Each "direction" can be closed independently. So, the client will transmit the HTTP request to the server and then close the "client → server" channel. At this point, the server can definitely know that there is no more "incoming" data pending, and that the HTTP request (or whatever else it is!) was received completely. I suppose that the blocking read() of std::net::TcpStream returns when the other side closes the "client → server" channel and all data up to that point has been received.
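For illustration, a minimal sketch of what such a client-side half-close looks like with the standard library (the address is a placeholder). shutdown(Shutdown::Write) sends a FIN on the "client → server" direction, so the server's read() returns Ok(0) after draining the data, while the other direction stays open for the response:

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpStream};

fn main() -> std::io::Result<()> {
    // Hypothetical server address.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;
    stream.write_all(b"GET / HTTP/1.0\r\n\r\n")?;

    // Half-close: send FIN on the client -> server direction only.
    // After draining the data, the server's read() returns Ok(0).
    stream.shutdown(Shutdown::Write)?;

    // The server -> client direction stays open:
    // read the response until EOF.
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    println!("{}", String::from_utf8_lossy(&response));
    Ok(())
}
```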

I created a "demo" HTTP server, based on the blocking read() of std::net::TcpStream. It was tested extensively with Apache ab: 10,000 requests with a concurrency level of 10. Never, not a single time, has the blocking read() returned an incomplete request, or blocked for longer than "end of request".

Also, think about the fact that you must provide a buffer of limited size where read can write bytes to. How would you know what size to allocate for this buffer if an HTTP frame can be arbitrarily long?

I guess, in practice, we have to define a buffer of "reasonable" size and reject requests that exceed this size. I think servers rejecting HTTP requests that are too big is quite normal, even for Apache/Nginx.

You absolutely can. Every HTTP implementation written in the last 25 years has successfully done this. You parse the request. The header section ends with \r\n\r\n, and either includes a Content-Length header indicating the exact byte length of the body, or includes a Transfer-Encoding: chunked header indicating that the body is chunked (which can be parsed chunk by chunk), or has no body.

I see! But this makes things a lot more complex. Especially since even the \r\n\r\n end marker could, in theory, be split across two "chunks" of data from separate non-blocking read()s...

I don't want to sound like a broken record, but reading the whole request in one go, which has always worked for me with the blocking read() method, simplifies these things quite a lot...
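For what it's worth, handling the split-marker case is a few lines: accumulate the bytes from each read() and re-scan the whole buffer for the terminator, with a size cap as mentioned above. A minimal sketch (the names and the cap value are made up):

```rust
/// Hypothetical cap on total request size, as discussed above.
const MAX_REQUEST: usize = 64 * 1024;

enum PushOutcome {
    NeedMore, // no "\r\n\r\n" seen yet; keep polling
    Complete, // header terminator found
    TooLarge, // reject the request
}

struct RequestBuffer {
    data: Vec<u8>,
}

impl RequestBuffer {
    fn new() -> Self {
        RequestBuffer { data: Vec::new() }
    }

    /// Append one chunk from a non-blocking read() and re-scan the
    /// whole buffer, so a terminator split across two reads is found.
    fn push(&mut self, chunk: &[u8]) -> PushOutcome {
        self.data.extend_from_slice(chunk);
        if self.data.len() > MAX_REQUEST {
            PushOutcome::TooLarge
        } else if self.data.windows(4).any(|w| w == b"\r\n\r\n") {
            PushOutcome::Complete
        } else {
            PushOutcome::NeedMore
        }
    }
}
```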

That is how HTTP worked in the 0.9/1.0 days, over 25 years ago. That is not the case in the modern world, where connections are reused for multiple requests.

Fortunately, people have already written libraries that correctly implement HTTP semantics and hide its complexity from you.

If that ever worked, it is either because your client code was written as if it was 1995 or because you got lucky. If your client is closing the sending channel and you're not seeing mio unblocking, then I would assume you are misconfiguring mio in some way.


You should then test it with a client you implement yourself. Just split the request into multiple writes with sleeps in between. You will see that your blocking read will return with partial requests.
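A minimal sketch of such a test client (the address and split points are arbitrary; the sleeps make it very likely that the server's first read() sees only a partial request):

```rust
use std::io::Write;
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // Hypothetical server address.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;

    // Send the request in pieces with pauses in between, so the
    // server almost certainly receives it in separate segments.
    for part in ["GET / HT", "TP/1.1\r\nHost: loc", "alhost\r\n\r\n"] {
        stream.write_all(part.as_bytes())?;
        stream.flush()?;
        sleep(Duration::from_millis(500));
    }
    Ok(())
}
```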

@dEajL3kA, if blocking read worked the way you describe, it would be impossible to implement streaming protocols above it (like telnet or parts of SSH), as it would block indefinitely until some sort of "end of request" happens, which may never occur.
