Simulating network delay - Tokio TcpStream recv takes longer than expected

(Not a hundred percent sure if this is a misunderstanding of tokio::net::TcpStream, of TCP in general, or of the tc utility)

I'm implementing a two-party protocol using tokio::net::TcpStream and tokio_serde::SymmetricallyFramed. These are used to build a Transport struct that implements Sink and Stream (so SinkExt/StreamExt are available) for some serde item. I've set nodelay() on the TcpStream.

On a high level the protocol looks like this:

let mut response = initial_value;
loop {
    let msg = compute_message(&response); // fast, and messages are small, usually less than 100 bytes
    transport.send(msg).await;
    response = transport.next().await; // assign (not shadow), so the next iteration sees the new value

    if done_condition {
        break;
    }
}

where transport is the struct which can send and receive msg structs by serializing/deserializing them with bincode and sending them via a TcpStream.
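To make the framing concrete: a transport like this typically prefixes each serialized message with its byte length so the receiver knows where one message ends and the next begins. The sketch below illustrates that idea using only std (a u32 big-endian length prefix is an assumption for illustration; the actual codec used by tokio_serde/LengthDelimitedCodec may differ in detail):

```rust
// Encode a payload as [4-byte BE length][payload bytes].
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = (payload.len() as u32).to_be_bytes().to_vec();
    frame.extend_from_slice(payload);
    frame
}

// Decode one frame from a buffer; None if the buffer is incomplete.
fn decode_frame(buf: &[u8]) -> Option<&[u8]> {
    let len = u32::from_be_bytes(buf.get(..4)?.try_into().ok()?) as usize;
    buf.get(4..4 + len)
}

fn main() {
    let frame = encode_frame(b"hello");
    assert_eq!(decode_frame(&frame), Some(&b"hello"[..]));
    println!("round-tripped {} payload bytes", frame.len() - 4);
}
```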

Now I want to benchmark this protocol using the Linux traffic control (tc) utility. I have two hosts and execute the following on each:

#!/bin/sh
/sbin/tc qdisc del dev enp23s0 root
/sbin/tc qdisc add dev enp23s0 root handle 1:0 htb default 10
/sbin/tc class add dev enp23s0 parent 1:0 classid 1:10 htb rate 100Mbit
/sbin/tc qdisc add dev enp23s0 parent 1:10 handle 10:0 netem delay 50ms 3ms 25% distribution normal

which adds a delay of 50 ms to outgoing traffic on both hosts.
As expected, the ping output shows a round-trip time of approx. 100 ms:

> ping host_b
64 bytes from <omitted>: icmp_seq=1 ttl=64 time=98.9 ms
64 bytes from <omitted>: icmp_seq=2 ttl=64 time=101 ms
64 bytes from <omitted>: icmp_seq=3 ttl=64 time=96.1 ms
64 bytes from <omitted>: icmp_seq=4 ttl=64 time=98.6 ms
64 bytes from <omitted>: icmp_seq=5 ttl=64 time=94.8 ms

So now to the confusing part:

I would expect my protocol to finish in roughly (number of loop iterations) × RTT/2. Both parties should be able to send their messages at the same time and receive the other party's message approx. 50 ms later, so each loop iteration should take RTT/2.
However, this is not what I'm observing. When measuring the time of the send and next operations, send completes almost immediately, while next takes around 100 ms. So each loop iteration seems to take a full round trip.
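The arithmetic behind that expectation, as a quick sketch (the 100 ms RTT is taken from the ping output above; the iteration count is assumed for illustration):

```rust
// If both parties' sends overlap in time, each iteration costs one
// one-way delay, i.e. RTT / 2.
fn expected_ms(iterations: u32, rtt_ms: u32) -> u32 {
    iterations * rtt_ms / 2
}

fn main() {
    let rtt = 100; // measured with ping
    // Overlapping sends: 100 iterations should take ~5000 ms total.
    println!("overlapping sends: {} ms", expected_ms(100, rtt));
    // If each iteration instead waits a full round trip, it doubles.
    println!("serialized sends:  {} ms", 100 * rtt);
}
```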

I guess this is not an issue with my usage of tokio's TcpStream, but rather with my understanding of TCP or my usage of tc, but I can't figure it out :grimacing:
I hope someone can help me :sweat_smile:

Edit:
Looking at the logs, I noticed something weird:

Logs - Party 0
{"timestamp":"2022-03-24T19:16:05.811818Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:05.811890Z","fields":{"message":"Layer recv","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:05.811967Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:05.910809Z","fields":{"message":"Layer recv","layer_time":"98"}}
{"timestamp":"2022-03-24T19:16:05.910968Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:06.008920Z","fields":{"message":"Layer recv","layer_time":"97"}}

it continues in the 0 and ~100 pattern.

Logs - Party 1
{"timestamp":"2022-03-24T19:16:05.760583Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:05.858636Z","fields":{"message":"Layer recv","layer_time":"98"}}
{"timestamp":"2022-03-24T19:16:05.858804Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:05.952837Z","fields":{"message":"Layer recv","layer_time":"94"}}
{"timestamp":"2022-03-24T19:16:05.952967Z","fields":{"message":"Layer send","layer_time":"0"}}
{"timestamp":"2022-03-24T19:16:06.052081Z","fields":{"message":"Layer recv","layer_time":"99"}}

it continues in the 0 and ~100 pattern.

It seems like P1 sends the first message significantly before P0. At the start, P0 sends its message, immediately receives P1's message, and directly sends the next message of the second loop iteration. What confuses me is that P1 receives that second message nearly 100 ms after the first, although the two were sent at virtually the same time!
This would maybe make sense if message reordering were involved, but from what I can tell, that should not be the case here.

I've simplified the code to the following:

use anyhow::Result;
use clap::Parser;
use std::net::SocketAddr;
use std::time::Instant;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

#[derive(Parser, Debug)]
struct Args {
    /// Id of this party
    #[clap(long)]
    id: usize,

    #[clap(long)]
    server: SocketAddr,
}

#[tokio::main]
async fn main() -> Result<()> {
    let args = Args::parse();
    let mut socket = if args.id == 0 {
        let listener = TcpListener::bind(args.server).await?;
        let (socket, _) = listener.accept().await?;
        // send data ASAP
        socket.nodelay().unwrap();
        socket
    } else {
        let socket = TcpStream::connect(args.server).await?;
        // send data ASAP
        socket.nodelay().unwrap();
        socket
    };
    let small_buf = [0; 100];
    for _i in 0..100 {
        let mut read_buf = [0; 100];
        let now = Instant::now();
        socket.write_all(&small_buf).await.unwrap();
        println!("Write time: {}", now.elapsed().as_millis());
        let now = Instant::now();
        socket.read_exact(&mut read_buf).await.unwrap();
        println!("Read time: {}", now.elapsed().as_millis());
    }
    Ok(())
}

Playground

Running this in my testing environment as described, results in the same behavior.

I'd also be happy about any ideas on how to debug this issue. Using netstat, I was able to confirm that no retransmissions are taking place.

Try moving your code into a tokio::spawn, or change it to use the current-thread runtime.

Thanks for your answer! Unfortunately, using #[tokio::main(flavor = "current_thread")] didn't change anything.
I've also tried to split the socket and use try_join! on the futures for read and write, but again, no change.

Using tokio::spawn was a little trickier, but I got it working like so:

    let (read_half, write_half) = socket.into_split();
    let read_half = Arc::new(Mutex::new(read_half));
    let write_half = Arc::new(Mutex::new(write_half));
    for _i in 0..100 {
        let now = Instant::now();
        let (tx, rx) = oneshot::channel();
        let wh = Arc::clone(&write_half);
        tokio::spawn(async move {
            let small_buf = [0; 100];
            let mut writer = wh.lock().await;
            writer.write_all(&small_buf).await.unwrap();
        });
        let rh = Arc::clone(&read_half);
        tokio::spawn(async move {
            let mut read_buf = [0; 100];
            let mut reader = rh.lock().await;
            reader.read_exact(&mut read_buf).await.unwrap();
            tx.send(read_buf).unwrap();
        });
        let received = rx.await.unwrap();
        println!("Iter time: {}", now.elapsed().as_millis());
    }

However, each loop iteration still takes 100 ms (the full RTT) instead of 50 ms. I somehow feel like I'm missing something simple and this is the expected behavior, but from my understanding, each loop iteration should effectively cost half a round trip...

It's probably fine here, but be careful using read_exact with sockets. It frequently leads to data getting stuck in an inaccessible buffer when there isn't quite enough data to fill it.

Reading docs is hard... :man_facepalming: After some discussions with my flatmate, we came to the conclusion that the issue is likely related to some weird TCP behaviour. I actually showed him the docs of set_nodelay to demonstrate that I had tried to work around some of these issues. Unfortunately it didn't solve the problem at the time...

After messing with QUIC and the socket2 crate, I noticed that I'm not actually calling set_nodelay but nodelay, which just returns the current value of the flag.
After that fix, the behavior is as intended and I get the expected runtimes :sweat_smile:
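For reference, the getter/setter distinction is easy to see with std's TcpStream, which has the same pair of methods as tokio's (a minimal local sketch; the loopback listener exists only so we have a connected socket to inspect):

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

/// Returns the TCP_NODELAY flag before and after calling set_nodelay(true).
fn toggle_nodelay() -> io::Result<(bool, bool)> {
    // Loopback pair purely to obtain a connected socket.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = TcpStream::connect(listener.local_addr()?)?;

    // nodelay() only READS the flag; it does not disable Nagle's algorithm.
    let before = stream.nodelay()?;
    // set_nodelay(true) is the call that actually disables Nagle.
    stream.set_nodelay(true)?;
    let after = stream.nodelay()?;
    Ok((before, after))
}

fn main() -> io::Result<()> {
    let (before, after) = toggle_nodelay()?;
    println!("TCP_NODELAY before: {before}, after: {after}");
    Ok(())
}
```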

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.