Understanding memory usage in the context of socket I/O

I want to understand how to optimize memory usage when exchanging large files between a client and a server using standard sockets.

Could someone please explain where the bytes are stored after writing them to std::net::TcpStream?

I'm using chunks, but I'm not sure where my bytes are living immediately after I write them, i.e. while the client hasn't read them yet.

Should I wait for some acknowledgment (read) signal before sending a new chunk in the queue?

  • They are put in a fixed-size buffer managed by the kernel or its drivers.
  • If the buffer has no remaining space, write() will block until it does.
  • Once the remote side has acknowledged receiving them, bytes in that buffer are dropped, making more space available.

Basically, you do not need to do anything extra for flow control, because TCP handles it for you.
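
If you are curious how big that kernel buffer actually is on your machine, you can query the SO_SNDBUF / SO_RCVBUF socket options. A minimal sketch, assuming the socket2 crate as a dependency (std::net::TcpStream itself does not expose these options):

use socket2::{Domain, Protocol, Socket, Type};

fn main() -> std::io::Result<()> {
    // Create an unconnected TCP socket just to inspect the kernel defaults.
    let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
    // SO_SNDBUF: how many outgoing bytes the kernel will queue for this socket.
    println!("default send buffer: {} bytes", socket.send_buffer_size()?);
    // SO_RCVBUF: how many incoming bytes it will queue until the peer reads them.
    println!("default recv buffer: {} bytes", socket.recv_buffer_size()?);
    Ok(())
}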


Thanks for the reply, but I'm still not sure I understand.
It's managed by the system, yes, with a native-tls wrapper on Linux/Fedora:

TlsStream<TcpStream>

For example,

  1. I have a 1 GB file on disk.
  2. I read a chunk of it, e.g. 100 MB; the socket driver probably holds those ~100 MB in memory.
  3. Then I write them into the output stream and the chunk goes out of its iteration scope. I think Rust should free that ~100 MB at this point, since the buffer object goes out of scope or is handed over to the writer function (the system socket backend).
  4. But the recipient has not called read yet, and I don't know who is holding the bytes at this moment and where exactly (in the server's or the client's memory). I think the bytes are still on the server, in its memory, so I'm not sure when exactly to start writing the next chunk so that the whole 1 GB isn't pulled into memory at once.

At this point the bytes are in two places:

  • in your machine's kernel's socket buffer,
  • either moving across the network, or in the other machine's socket buffer waiting for the read() call from the process.

Then, when the other machine sends an acknowledgement, they are dropped from your kernel's buffer, freeing up space for writing more data.

I think you are asking when you should start writing the next chunk. The answer is: you should be calling write_all(), which will block until all of the data is (processed by TLS and) written into the kernel buffer. So, waiting is implicit and automatic; all you need to do is have the loop. (Or let std::io::copy loop for you.)
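
For example, the whole sender can be a read/write_all loop like the sketch below, where the path, the address, and the 1 MiB chunk size are placeholders:

use std::fs::File;
use std::io::{Read, Write};
use std::net::TcpStream;

fn send_file(path: &str, addr: &str) -> std::io::Result<()> {
    let mut file = File::open(path)?;
    let mut stream = TcpStream::connect(addr)?;
    let mut buf = vec![0u8; 1024 * 1024]; // one chunk buffer, reused every iteration
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // EOF: everything has been handed to the kernel
        }
        // Blocks whenever the kernel's send buffer is full, so the receiver's
        // pace automatically throttles this loop (TCP flow control).
        stream.write_all(&buf[..n])?;
    }
    stream.flush()
}

No separate acknowledgment handling is needed; the blocking write_all is the flow control.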


But every method requires a sized buffer,
or can I just use &[u8] as a pointer to the file bytes?

You do have to pick a size to read. I assumed you had already done that when you spoke of a 100 MB chunk. 100 MB is probably unnecessarily large; 1 MB would be more reasonable. The ideal size depends on your kernel but it doesn’t matter a whole lot.
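
If you would rather not pick a size at all, std::io::copy does the read/write loop with its own internal buffer. A minimal sketch with a placeholder path and address:

use std::fs::File;
use std::io;
use std::net::TcpStream;

fn copy_file_to_socket(path: &str, addr: &str) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut stream = TcpStream::connect(addr)?;
    // Loops internally until EOF and returns the number of bytes copied.
    io::copy(&mut file, &mut stream)
}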


I've created this function that uploads data in chunks using a File::read_at offset:

match storage::Item::from_url(gemini.url.as_str()) {
    Ok(item) => {
        let mut read: usize = 0;
        match stream.write_all(
            // some headers data
            .into_bytes(),
        ) {
            Ok(()) => loop {
                let mut data = vec![0; argument.chunk];
                let l = file.read_at(&mut data, read as u64).unwrap();
                // EOF
                if l == 0 {
                    stream.flush().unwrap();
                    stream.shutdown().unwrap();
                    break;
                }
                stream.write_all(&data[..l]).unwrap();
                read += l;
            }
            // ..

I might want to run some tests to understand what is going on with memory in the backend.
I suppose this implementation just pushes all the bytes into the stream in a loop, the same as I would be doing without it.

                let mut data = vec![0; argument.chunk];

This is allocating a new Vec and then dropping it (freeing the memory in the heap) at the end of the block, for every iteration of the loop. This declaration should be moved above the loop, so it is only allocated and dropped once.

A single buffer (the Vec) can be reused for multiple read/write cycles because the write operation makes a copy of the bytes. Note that you're passing a slice to write_all; you're not transferring ownership of the Vec by passing it by value.
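
Sketched as a standalone function, assuming a chunk parameter standing in for argument.chunk and a generic writer standing in for your TlsStream, that looks like this:

use std::fs::File;
use std::io::Write;
use std::os::unix::fs::FileExt; // for read_at (Unix-only, as in your snippet)

fn send_chunks<W: Write>(file: &File, stream: &mut W, chunk: usize) -> std::io::Result<()> {
    // Allocated once, reused for every iteration.
    let mut data = vec![0u8; chunk];
    let mut offset: u64 = 0;
    loop {
        let n = file.read_at(&mut data, offset)?;
        if n == 0 {
            // EOF: write_all has already handed everything to the kernel.
            stream.flush()?;
            return Ok(());
        }
        // write_all copies out of `data`, so the same buffer can be reused.
        stream.write_all(&data[..n])?;
        offset += n as u64;
    }
}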


I noticed that you are interested in sending very large files through TCP. I've been looking into this topic lately and have learned quite a few things that I would like to share with you:

  • TCP window size tuning: Each side of a TCP connection advertises a receive window, which is the number of bytes that can be in flight before an ACK is needed (the window scale is negotiated during the handshake). The kernel buffers behind it can be tuned through the SO_SNDBUF and SO_RCVBUF socket options (setsockopt in C); see the sketch after this list.
  • Implementing a throttling mechanism: Adjust the sending rate to avoid overloading the server (especially if you have multiple clients).
  • Chunking and compression: Chunk your file into blocks, because you usually don't want to load the whole 1 GB into memory at once. Chunking into blocks also makes it easier to compress.
  • Integrity and failure: Hashing your blocks lets the receiver verify each block it gets and detect which blocks need to be re-sent after a failure. Note that a plain hash does not by itself stop a MITM attacker (who can simply recompute it); for that you need an authenticated channel such as TLS, or a MAC/signature. Weak hash algorithms also expose you to collision attacks.
    Note: Transferring large data over SSL/TLS can introduce additional overhead due to encryption and decryption processes, potentially impacting performance.
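
For the first point, here is a minimal sketch of requesting larger kernel socket buffers before connecting, assuming the socket2 crate; the 4 MiB sizes are arbitrary placeholders, and the conversion back to std::net::TcpStream uses socket2's From impls:

use std::net::{SocketAddr, TcpStream};
use socket2::{Domain, Protocol, Socket, Type};

fn connect_with_larger_buffers(addr: SocketAddr) -> std::io::Result<TcpStream> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
    // SO_SNDBUF / SO_RCVBUF: ask the kernel for bigger buffers; it may clamp
    // the values to its configured limits (net.core.wmem_max / rmem_max on Linux).
    socket.set_send_buffer_size(4 * 1024 * 1024)?;
    socket.set_recv_buffer_size(4 * 1024 * 1024)?;
    socket.connect(&addr.into())?;
    // Hand the configured socket over to the std type.
    Ok(TcpStream::from(socket))
}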

About the SSL/TLS overhead: only if your CPU was made before 2010 or is a tiny microcontroller, like the ones inside a microwave or a radiator. On modern CPUs most cryptographic workloads run on dedicated hardware instructions (e.g. AES-NI), so they have minimal impact on performance.