Strategies for dealing with data of unknown length coming through TcpStream

I have the following code:

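async fn read_from_socket(socket: tokio::net::TcpStream) {
    let mut buf = Vec::with_capacity(256);

    loop {
        // try_read_buf does not wait, so first wait until the socket is readable.
        if socket.readable().await.is_err() {
            break;
        }

        match socket.try_read_buf(&mut buf) {
            Ok(bytes_read) => {
                // bytes_read indicates how many bytes were read, but how do I know
                // how many are still to be received from that socket, if any?
            }
            // Readiness can be reported spuriously; wait again.
            Err(ref error) if error.kind() == std::io::ErrorKind::WouldBlock => continue,
            Err(error) => {
                // Some other error, break
                break;
            }
        }
    }
}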

What is the standard way of dealing with the following scenario:
We don't know whether we have read all the data from the socket, and we don't know how much more, if any, will still arrive on it?

Depends on the protocol you're sending over TCP. If it conveys or implies length information, or has message boundaries, then use that. If you're just shoveling a byte stream over a socket (such as with netcat), then you need to read until EOF (end of file) is reached. Depending on the intended use, setting some limits to avoid resource exhaustion attacks may also be necessary.

From the documentation of std::io::Read::read: If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:

  1. This reader has reached its “end of file” and will likely no longer be able to produce bytes. Note that this does not mean that the reader will always no longer be able to produce bytes. As an example, on Linux, this method will call the recv syscall for a TcpStream, where returning zero indicates the connection was shut down correctly. While for File, it is possible to reach the end of file and get zero as result, but if more data is appended to the file, future calls to read will return more data.
  2. The buffer specified was 0 bytes in length.
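
As a rough illustration of the "read until EOF, with a limit" case, here is a minimal sketch. It is written against the blocking std::io::Read trait so that it stands alone; a tokio version would have the same shape with AsyncReadExt and .await. The function name read_until_eof and the max_len parameter are made up for this example.

```rust
use std::io::{self, Read};

/// Reads from `reader` until EOF, refusing to buffer more than `max_len` bytes.
/// (`read_until_eof` and `max_len` are hypothetical names for this sketch.)
fn read_until_eof<R: Read>(reader: &mut R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    let mut chunk = [0u8; 256];
    loop {
        match reader.read(&mut chunk) {
            // Zero bytes into a non-empty buffer means EOF: for a TCP stream,
            // the peer has shut down its side of the connection.
            Ok(0) => return Ok(data),
            Ok(n) => {
                // Enforce the limit before growing the buffer.
                if data.len() + n > max_len {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "input exceeds size limit",
                    ));
                }
                data.extend_from_slice(&chunk[..n]);
            }
            // The call was interrupted before reading anything; retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

The only way this loop learns that "no more data is coming" is the Ok(0) result; until then it cannot know how many bytes are still in flight.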

A TCP stream is a potentially infinitely long stream of bytes. Within that stream one might be sending lots of discrete messages. It's necessary to have some protocol in place to detect where each message begins and ends.

Simple schemes are:

  1. Send a message length before sending the actual data of the message, then read as many bytes as that length specifies.
  2. Send some special marker at the end of a message. (Be sure that whatever that marker is, it is never a valid symbol in the message, or arrange some escape sequence for it in the message.)
  3. Good old HTTP (in its original 1.0 form) solves this by just closing the connection after sending an HTML page.
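
Scheme 1 could be sketched roughly like this. The 4-byte big-endian length header, the MAX_FRAME_LEN cap, and the names write_frame and read_frame are all assumptions made for the example, not part of any particular protocol. It uses the blocking std::io traits to stay self-contained; tokio's AsyncReadExt and AsyncWriteExt offer read_exact and write_all with the same meaning.

```rust
use std::io::{self, Read, Write};

// Hypothetical upper bound on a single message, to avoid huge allocations.
const MAX_FRAME_LEN: usize = 1024 * 1024;

/// Writes a 4-byte big-endian length followed by the payload.
fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    // The cast is lossless because MAX_FRAME_LEN fits in a u32.
    writer.write_all(&(payload.len() as u32).to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one frame: first the length header, then exactly that many bytes.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    // read_exact keeps reading until the buffer is full or the stream ends.
    reader.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    // Never trust a length from the network without checking it.
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

With this in place the receiver always knows how many bytes are still to come for the current message: the header says so.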

Just now I'm trying to interface to a system that sends messages in JSON format over TCP. To indicate the end of a message, they send a form feed character (0x0C). Simple but effective.
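
A minimal sketch of reading such form-feed-terminated messages, assuming a buffered blocking reader (for a socket, something like std::io::BufReader wrapped around the stream). The name read_delimited is made up for this example, and the payload is returned as raw bytes rather than parsed as JSON.

```rust
use std::io::{self, BufRead};

// Form feed, the end-of-message marker described above.
const FORM_FEED: u8 = 0x0C;

/// Reads one message terminated by a form feed.
/// Returns Ok(None) if the stream ended cleanly between messages.
fn read_delimited<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut message = Vec::new();
    // read_until appends bytes up to and including the delimiter, or up to EOF.
    let n = reader.read_until(FORM_FEED, &mut message)?;
    if n == 0 {
        return Ok(None); // EOF before any byte of a new message
    }
    if message.last() == Some(&FORM_FEED) {
        message.pop(); // strip the marker, keep the payload
        Ok(Some(message))
    } else {
        // The stream ended before the marker arrived: a truncated message.
        Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "stream ended mid-message",
        ))
    }
}
```

Note that read_until itself puts no bound on message size, so a real reader would probably want to cap it, for example by limiting the underlying reader with Read::take.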


Hi, and thank you for your answer. Yes, everything you've said makes sense.