How to determine with mio::TcpStream that all data has been read?

How to determine with mio::net::TcpStream that all data has been read?

I understand that I have to register the mio::net::TcpStream with my Poll and then poll() for "read" events. Whenever a "read" event is received, I have to call TcpStream::read() in a loop until it fails with a WouldBlock error. Then, once again, I can poll() for more "read" events...
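In code, my understanding looks roughly like this. This is only a minimal sketch with mio 0.8; the address, token, and buffer size are placeholders, and in a server you would use the stream returned by TcpListener::accept() instead of connect():

```rust
use std::io::{self, Read};

use mio::net::TcpStream;
use mio::{Events, Interest, Poll, Token};

const STREAM: Token = Token(0);

fn main() -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    // Hypothetical peer address.
    let addr = "127.0.0.1:8080".parse().unwrap();
    let mut stream = TcpStream::connect(addr)?;
    poll.registry()
        .register(&mut stream, STREAM, Interest::READABLE)?;

    let mut received = Vec::new();
    let mut buf = [0u8; 4096];

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == STREAM && event.is_readable() {
                // Drain the socket until it would block.
                loop {
                    match stream.read(&mut buf) {
                        // 0 bytes = EOF: the peer closed (or shut down)
                        // its sending direction.
                        Ok(0) => return Ok(()),
                        Ok(n) => received.extend_from_slice(&buf[..n]),
                        Err(e) if e.kind() == io::ErrorKind::WouldBlock => break,
                        Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                        Err(e) => return Err(e),
                    }
                }
                // At this point `received` may still hold only part of
                // the data; poll() again for further readable events.
            }
        }
    }
}
```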

Now here is the thing: how do I know when all data has been read? Specifically, how do I know that the peer has transmitted all data, e.g. that the complete HTTP request has been received?

The problem is: after looping read() until it fails with a WouldBlock error, I do not know whether the last read failed because the peer has actually finished sending data, or because we have read all the data that was currently available in the receive buffers and more data may become available later...

In other words: if, after the WouldBlock error, I go back to calling poll(), then I don't know whether more "read" events are going to pop up, or whether poll() will wait forever because there won't be any more events.

Thanks for any suggestions!


I know that I can add a timeout to the poll() call. But it is unclear which timeout value would be big enough to be sure that we are not missing any input data that may arrive "delayed". Furthermore, this adds an unnecessary delay in the case where the data is already complete when we enter poll().


Also, I have looked at some mio example code. They seem to expect that there will be a successful zero-size read() to indicate the end of the data. But this is not the case in the real world, as I have tested!

In my test, after I get the very first "read" event for my TcpStream, I can do a single successful read(), which returns a size of (for example) 78 bytes. The next read() then immediately fails with a WouldBlock error. If, at this point, I go back to calling poll(), then no further events are ever received for the stream...

Never ever was there a read() that succeeded with size zero.

Meanwhile, assuming that the first successful read() after the first "read" event already gave us all the data may be wrong in the general case. So I'm a bit lost as to how this is supposed to be solved...


How are you defining "all data"? Read will return 0 when the end of the stream has been reached (i.e. the peer has closed the connection). If the peer does not close the connection, you won't see the end of stream indicator.

How are you defining "all data"? Read will return 0 when the end of the stream has been reached (i.e. the peer has closed the connection). If the peer does not close the connection, you won't see the end of stream indicator.

Well, an HTTP client (e.g. cURL) connects to the server, sends the HTTP request, and then waits for the server's response. Somehow the server must be able to determine when it has read all of the request data from the stream, so that it can start parsing the request and then create the suitable response.

With the standard blocking TcpStream this is easy, as read() simply blocks until we have all the data.

However, with mio::net::TcpStream the read() can fail with a WouldBlock error at any time. But, in this case, it is unclear whether or not more data will become available for reading at some point in the future. I can use Poll::poll() to wait for an "event" that signals more data. But it may never happen...

Unfortunately, I never see a successful zero-size read() result...

It's probably because, at some point, the client (e.g. cURL) has fully sent the request but does not yet close the connection, as it is obviously still awaiting the server's response...

The server parses the HTTP request as it reads it to determine when it has read all of it.

No, it is not. You will see exactly the same behavior with a blocking socket except the read will just block instead of returning WouldBlock.

Well, that's not true. Blocking and non-blocking read will return the same data when called while the input buffer contains data.

You may, in both cases, receive multiple frames ending with a potentially partial frame.

It is up to the upper layer above TCP to actually parse incoming bytes to determine if a full frame has been received or not.

No, it is not. You will see exactly the same behavior with a blocking socket except the read will just block instead of returning WouldBlock.

Thing is, because the blocking TcpStream never fails "prematurely" with a WouldBlock error, it either returns the data (if already available) or it blocks until the data becomes available. So, the blocking read() returns no later than when the data has been fully received. That's much different from Poll::poll(), which may wait forever in case no more "read" events are going to arrive...

The server parses the HTTP request as it reads it to determine when it has read all of it.

But how do you know? Since the non-blocking read() from mio::net::TcpStream always returns immediately, it may return "partial" data, at least in the general case. Future read()s may (or may not) return more data, which we need to append to the data that we have received before.

Just by looking at the (possibly incomplete) data we already have, we cannot know whether or not more data is going to come. HTTP does not start with a "length" field, or anything in that vein...

Read would also wait forever when no more read events arrive. Both of these scenarios are exactly the same with respect to incoming data.

You absolutely can. Every HTTP implementation written in the last 25 years has successfully done this. You parse the request. The header section ends with \r\n\r\n, and either includes a Content-Length header indicating the exact byte length of the body, or includes a Transfer-Encoding: chunked header indicating that the body is chunked (which can be parsed chunk by chunk), or has no body.
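As a rough sketch of that completeness check, assuming the whole request is accumulated in one buffer. This is a hypothetical helper, not a real parser: it ignores Transfer-Encoding: chunked and the many header corner cases a production implementation must handle:

```rust
/// Decide whether the bytes received so far form a complete HTTP
/// request (header section plus a Content-Length body, if any).
fn request_is_complete(buf: &[u8]) -> bool {
    // The header section ends with "\r\n\r\n".
    let Some(pos) = buf.windows(4).position(|w| w == b"\r\n\r\n") else {
        return false; // headers not complete yet
    };
    let body_start = pos + 4;
    let headers = &buf[..pos];

    // Naive, ASCII-only scan for a Content-Length header.
    let content_length = std::str::from_utf8(headers)
        .ok()
        .and_then(|h| {
            h.lines()
                .find(|l| l.to_ascii_lowercase().starts_with("content-length:"))
                .and_then(|l| l.split(':').nth(1))
                .and_then(|v| v.trim().parse::<usize>().ok())
        })
        .unwrap_or(0); // no Content-Length (and not chunked) => no body

    buf.len() - body_start >= content_length
}
```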


TcpStream knows nothing about HTTP or whatever protocol is used above it.

Blocking read will actually block until some data is available. If the buffer contains a partial frame at the time you call read, it won't block and will just return the received bytes.

Also, think about the fact that you must provide a buffer of limited size where read can write bytes to. How would you know what size to allocate for this buffer if an HTTP frame can be arbitrarily long?

TcpStream knows nothing about HTTP or whatever protocol is used above it.

Blocking read will actually block until some data is available. If the buffer contains a partial frame at the time you call read, it won't block and will just return the received bytes.

I understand that TCP only transmits a stream of bytes and has no idea about HTTP. But TCP is also bi-directional. Each "direction" can be closed independently. So, the client will transmit the HTTP request to the server and then close the "client → server" channel. At this point, the server can definitely know that there is no more "incoming" data pending, and that the HTTP request (or whatever else it is!) was received completely. I suppose that the blocking read() of std::net::TcpStream returns when the other side closes the "client → server" channel and all data up to that point has been received.
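For illustration, a minimal sketch of what such a client-side half-close looks like with the standard library (the address is a placeholder). shutdown(Shutdown::Write) sends a FIN on the "client → server" direction, so the server's read() returns Ok(0) after draining the data, while the other direction stays open for the response:

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpStream};

fn main() -> std::io::Result<()> {
    // Hypothetical server address.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;
    stream.write_all(b"GET / HTTP/1.0\r\n\r\n")?;

    // Half-close: send FIN on the client -> server direction only.
    // After draining the data, the server's read() returns Ok(0).
    stream.shutdown(Shutdown::Write)?;

    // The server -> client direction stays open:
    // read the response until EOF.
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    println!("{}", String::from_utf8_lossy(&response));
    Ok(())
}
```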

I created a "demo" HTTP server, based on the blocking read() of std::net::TcpStream. It was tested extensively with Apache ab: 10,000 requests with a concurrency level of 10. Never, not a single time, has the blocking read() returned an incomplete request, or blocked for longer than "end of request".

Also, think about the fact that you must provide a buffer of limited size where read can write bytes to. How would you know what size to allocate for this buffer if an HTTP frame can be arbitrarily long?

I guess, in practice, we have to define a buffer of "reasonable" size and reject requests that exceed this size. I think servers rejecting HTTP requests that are too big is quite normal, even for Apache/Nginx.

You absolutely can. Every HTTP implementation written in the last 25 years has successfully done this. You parse the request. The header section ends with \r\n\r\n, and either includes a Content-Length header indicating the exact byte length of the body, or includes a Transfer-Encoding: chunked header indicating that the body is chunked (which can be parsed chunk by chunk), or has no body.

I see! But this makes things a lot more complex. Especially since even the \r\n\r\n end marker could, in theory, be split across two "chunks" of data from separate non-blocking read()s...

I don't want to sound like a broken record, but reading the whole request in one go, which has always worked for me with the blocking read() method, simplifies these things quite a lot...
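For what it's worth, handling the split-marker case is a few lines: accumulate the bytes from each read() and re-scan the whole buffer for the terminator, with a size cap as mentioned above. A minimal sketch (the names and the cap value are made up):

```rust
/// Hypothetical cap on total request size, as discussed above.
const MAX_REQUEST: usize = 64 * 1024;

enum PushOutcome {
    NeedMore, // no "\r\n\r\n" seen yet; keep polling
    Complete, // header terminator found
    TooLarge, // reject the request
}

struct RequestBuffer {
    data: Vec<u8>,
}

impl RequestBuffer {
    fn new() -> Self {
        RequestBuffer { data: Vec::new() }
    }

    /// Append one chunk from a non-blocking read() and re-scan the
    /// whole buffer, so a terminator split across two reads is found.
    fn push(&mut self, chunk: &[u8]) -> PushOutcome {
        self.data.extend_from_slice(chunk);
        if self.data.len() > MAX_REQUEST {
            PushOutcome::TooLarge
        } else if self.data.windows(4).any(|w| w == b"\r\n\r\n") {
            PushOutcome::Complete
        } else {
            PushOutcome::NeedMore
        }
    }
}
```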

That is how HTTP worked in the 0.9/1.0 days, over 25 years ago. That is not the case in the modern world, where connections are reused for multiple requests.

Fortunately, people have already written libraries that correctly implement HTTP semantics and hide its complexity from you.

If that ever worked, it is either because your client code was written as if it was 1995 or because you got lucky. If your client is closing the sending channel and you're not seeing mio unblocking, then I would assume you are misconfiguring mio in some way.


You should then test it with a client you implement yourself. Just split the request into multiple writes with sleeps in between. You will see that your blocking read will return with partial requests.
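A minimal sketch of such a test client (the address and split points are arbitrary; the sleeps make it very likely that the server's first read() sees only a partial request):

```rust
use std::io::Write;
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // Hypothetical server address.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;

    // Send the request in pieces with pauses in between, so the
    // server almost certainly receives it in separate segments.
    for part in ["GET / HT", "TP/1.1\r\nHost: loc", "alhost\r\n\r\n"] {
        stream.write_all(part.as_bytes())?;
        stream.flush()?;
        sleep(Duration::from_millis(500));
    }
    Ok(())
}
```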

@dEajL3kA, if blocking read worked the way you describe, it would be impossible to implement streaming protocols above it (like telnet or parts of SSH), as it would block indefinitely until some sort of "end of request" happens, which may never occur.
