Reading length & payload from a TcpStream

Hi all,

I'm sending JSON data through a TCP socket, with the length of the JSON data as the first 8 bytes, in network byte order using the to_be_bytes() method.

Is there a faster way to achieve this:

// read size first
let mut size_buffer = [0; std::mem::size_of::<usize>()];
socket.read(&mut size_buffer).unwrap();
let json_size = usize::from_be_bytes(size_buffer);

// read JSON data
let mut json_buffer = vec![0; json_size];
socket.read(&mut json_buffer).unwrap();

Thanks for your help.

That should be fine, except for the following points:

  1. 4 bytes (u32) should be more than enough to represent the payload's size. You are using usize, whose width depends on the platform - imagine what would happen if one end was compiled for a 32-bit architecture and the other for 64-bit.

  2. You have two calls to read. read doesn't guarantee that it fills the whole buffer. Use read_exact instead (unless you want to check the number of bytes read and call read again in a loop...).

  3. You should probably enforce an upper limit on json_size. Your program may crash (OOM) if the received json_size is too big.

  4. If network speed is the bottleneck - JSON tends to compress well.
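Points 1 to 3 can be combined into one small helper. This is only a sketch under a few assumptions: the function name read_frame and the MAX_JSON_SIZE limit are made up for illustration, and the sender is assumed to write the length as a u32 with to_be_bytes().

```rust
use std::io::{self, Read};

// Hypothetical cap on a single message; pick a value that suits your protocol.
const MAX_JSON_SIZE: u32 = 16 * 1024 * 1024;

/// Reads one frame: a 4-byte big-endian length followed by that many payload bytes.
fn read_frame<R: Read>(socket: &mut R) -> io::Result<Vec<u8>> {
    // Fixed-width prefix, so 32-bit and 64-bit peers agree on the layout.
    let mut size_buffer = [0u8; 4];
    socket.read_exact(&mut size_buffer)?;
    let json_size = u32::from_be_bytes(size_buffer);

    // Reject oversized lengths before allocating anything.
    if json_size > MAX_JSON_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared payload size exceeds limit",
        ));
    }

    // read_exact keeps reading until the buffer is full or returns an error.
    let mut json_buffer = vec![0u8; json_size as usize];
    socket.read_exact(&mut json_buffer)?;
    Ok(json_buffer)
}
```

Since the helper is generic over Read, it should work with a &mut TcpStream as well as with an in-memory std::io::Cursor for testing. Errors are returned to the caller instead of being unwrapped.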

@naim Thanks for your tips, these make sense.

As for item 2, I don't see how to make just one call to read, since the payload size is not known beforehand.

It's not about making a single call. It's about calling read_exact instead of read in both places.

Example:
You want to read the payload size (4 bytes), so you call read. But the OS only does a partial 1-byte read (read returns Ok(1)). You will need to call read more times to get the full payload size. The same goes for the payload data, which is where this is more likely to actually occur.
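The partial-read scenario can be reproduced without a network. The sketch below uses a made-up reader, OneByteAtATime, that returns at most one byte per read call; the two helper functions are also hypothetical and exist only to contrast read with read_exact.

```rust
use std::io::{self, Read};

/// Hypothetical reader that hands out at most one byte per read call,
/// imitating a socket that delivers data in small pieces.
struct OneByteAtATime<'a> {
    data: &'a [u8],
}

impl<'a> Read for OneByteAtATime<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.data.is_empty() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.data[0];
        self.data = &self.data[1..];
        Ok(1)
    }
}

/// A single read call: returns how many bytes arrived and the buffer contents.
fn size_via_read(bytes: &[u8]) -> io::Result<(usize, [u8; 4])> {
    let mut reader = OneByteAtATime { data: bytes };
    let mut size_buffer = [0u8; 4];
    let n = reader.read(&mut size_buffer)?;
    Ok((n, size_buffer))
}

/// read_exact calls read repeatedly until all four bytes have arrived.
fn size_via_read_exact(bytes: &[u8]) -> io::Result<u32> {
    let mut reader = OneByteAtATime { data: bytes };
    let mut size_buffer = [0u8; 4];
    reader.read_exact(&mut size_buffer)?;
    Ok(u32::from_be_bytes(size_buffer))
}
```

With this reader, the single read fills only the first byte of the size buffer, so decoding it as a length gives a wrong value, while read_exact recovers the full length.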

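read's documentation says the same: it is not an error if read returns fewer bytes than the buffer can hold, even when the reader is not yet at the end of the stream.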

@naim Thanks for clarifying :slight_smile:

If you want to make fewer reads, use a BufReader.

In essence, you read a big chunk all at once (maybe 4k?). Then you check the length, and read out the rest that you need, but it's already in memory, so you aren't making more syscalls. If it was too much, the buffer holds the rest until you ask for it. If it wasn't enough, you get what's in the buffer then make another call to get the rest.
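A rough sketch of that approach, assuming the 4-byte big-endian length prefix suggested earlier in the thread. The function name read_frames and the 4096-byte capacity are illustrative choices, and the upper limit on the payload size is left out to keep the example short.

```rust
use std::io::{self, BufReader, Read};

/// Reads `count` length-prefixed frames through a single BufReader.
fn read_frames<R: Read>(socket: R, count: usize) -> io::Result<Vec<Vec<u8>>> {
    // Create the BufReader once and keep using it for the whole connection:
    // dropping it would discard bytes it has buffered but not yet handed out.
    let mut reader = BufReader::with_capacity(4096, socket);
    let mut frames = Vec::new();
    for _ in 0..count {
        // Usually served from the in-memory buffer rather than the socket.
        let mut size_buffer = [0u8; 4];
        reader.read_exact(&mut size_buffer)?;
        let json_size = u32::from_be_bytes(size_buffer) as usize;

        // If the payload is not fully buffered yet, BufReader reads more.
        let mut json_buffer = vec![0u8; json_size];
        reader.read_exact(&mut json_buffer)?;
        frames.push(json_buffer);
    }
    Ok(frames)
}
```

The calling code looks the same as in the unbuffered version. Only the reader it talks to changes, which is why wrapping the TcpStream once, right after the connection is made, tends to be the simplest way to adopt this.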

Yes! There's no way you can guarantee one system read per item of unknown size, but with buffered input and items of generally modest size, you're very likely to get most items in one system read. With separate reads for size and content, that will be two system reads for most items.
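One way to see the difference is to count how often the underlying reader is asked for data. In this sketch, CountingReader and count_reads are hypothetical helpers, an in-memory Cursor stands in for the socket, and each call to the inner read is treated as a stand-in for one system read. A real socket may deliver data in smaller pieces, so actual counts can differ.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Hypothetical wrapper that counts calls to the underlying read.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads one frame: 4-byte big-endian length, then the payload.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut size_buffer = [0u8; 4];
    reader.read_exact(&mut size_buffer)?;
    let mut json_buffer = vec![0u8; u32::from_be_bytes(size_buffer) as usize];
    reader.read_exact(&mut json_buffer)?;
    Ok(json_buffer)
}

/// Builds `n` small frames and returns (unbuffered calls, buffered calls).
fn count_reads(n: usize) -> io::Result<(usize, usize)> {
    let payload = br#"{"ok":true}"#;
    let mut wire = Vec::new();
    for _ in 0..n {
        wire.extend_from_slice(&(payload.len() as u32).to_be_bytes());
        wire.extend_from_slice(payload);
    }

    // Unbuffered: every read_exact goes straight to the underlying reader.
    let mut plain = CountingReader { inner: Cursor::new(wire.clone()), calls: 0 };
    for _ in 0..n {
        read_frame(&mut plain)?;
    }

    // Buffered: BufReader pulls a large chunk and serves frames from memory.
    let mut buffered = BufReader::new(CountingReader { inner: Cursor::new(wire), calls: 0 });
    for _ in 0..n {
        read_frame(&mut buffered)?;
    }

    Ok((plain.calls, buffered.get_ref().calls))
}
```

With small frames that all fit in BufReader's buffer, the unbuffered count grows by two per frame (one read for the size, one for the content), while the buffered count stays far lower.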