Tokio TcpListener - detecting network issues

Hi folks,

With tokio's TcpListener, while awaiting a read on an accepted connection, what is the proper way to detect network issues (say, a cable or network card being unplugged)?
I was surprised to find that read() doesn't return an error, but simply keeps awaiting after I pull the cable.
I could only find this thread, but it deals with issues on the client side.
https://users.rust-lang.org/t/tokio-detect-connection-loss/116217/3

There is no difference between client and server here. Once established, a TCP connection is completely symmetric. The answer is the same in both cases: the only way to know the connection is working is to try to communicate through it.

For a server, this often means that the server has a timeout for inactive connections, and the client has the obligation to either get its business done quickly, or send some kind of ping message.

Strangely, though, the following example doesn't detect the disconnected network cable, even though it both reads and writes. So the only reliable way seems to be sending a heartbeat and waiting for a reply.
For some reason keepalive doesn't work either (it is the same heartbeat idea, just invisible to the application), perhaps because it's Friday evening and I am missing something.

https://stackoverflow.com/questions/74341915/how-to-set-tcp-keepalive-in-tokio
https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpSocket};
use tokio::time::Duration;

#[tokio::main]
async fn main() {
    println!("hello");

    // for those kinds of tests binding to 0.0.0.0 might lead to misleading results I think.
    // Testing in a VirtualBox with bridged network adapter. Breaking the connection by unchecking "Virtual Cable connected"
    // let addr: std::net::SocketAddr = "0.0.0.0:4567".parse().unwrap();  
    let addr: std::net::SocketAddr = "192.168.1.2:4567".parse().unwrap();
    let socket = TcpSocket::new_v4().unwrap();

    socket.set_reuseaddr(true).unwrap();
    // socket.bind_device(interface)
    // println!("keepalive is: {:?}", socket.keepalive());
    socket.set_keepalive(true).unwrap();


    socket.bind(addr).unwrap();
    let listener = socket.listen(1).unwrap();
    println!("bind to {:?} successful", addr);
    tokio::select! {
        _ = worker(listener) => {},
        // _ = alive_check() => {},

    };
}

async fn worker(listener: TcpListener) {
    loop {
        // accept() yields the connected stream plus the peer's address
        let (stream, _peer_addr) = match listener.accept().await {
            Ok(x) => x,
            Err(e) => {
                println!("listener accept error: {e}");
                continue;
            }
        };

        //=== https://stackoverflow.com/questions/74341915/how-to-set-tcp-keepalive-in-tokio
        // Only the idle time before the first probe is set here; the probe
        // interval and retry count keep the OS defaults.
        let ka = socket2::TcpKeepalive::new().with_time(std::time::Duration::from_secs(5));
        let sf = socket2::SockRef::from(&stream);
        sf.set_tcp_keepalive(&ka).unwrap();

        let (mut stream_r, stream_w) = stream.into_split();

        // write a heartbeat byte every second from a separate task
        tokio::spawn(alive_check(stream_w));
        println!("listener accepted");
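        loop {
            // Report a read error (or EOF) and go back to accepting,
            // instead of panicking on unwrap().
            match stream_r.read_u8().await {
                // println! so the line-buffered stdout is flushed per byte
                Ok(byte) => println!("> {} <", byte as char),
                Err(e) => {
                    println!("read error: {e}");
                    break;
                }
            }
        }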
    }
}

async fn alive_check(mut stream_w: tokio::net::tcp::OwnedWriteHalf) {
    loop {
        println!("alive_check");
        tokio::time::sleep(Duration::from_secs(1)).await;
        stream_w.write_u8(b'a').await.unwrap();
    }
}

Have you tried plugging the network cable back in? It works this way because TCP was designed so that a stream can survive temporarily losing connectivity.

Yes, plugging the cable back in results in all the pending messages being delivered.

When you try to send data, and the data can't actually be sent down the cable (or more precisely, when the other end doesn’t respond with an ACK), a timer starts (inside the network stack, not your process) and the stack keeps retransmitting. If the cable isn’t plugged back in before that timer runs out, the connection is closed and further reads and writes on the socket fail with an error.

Can I set the value of the timer in my process, or is there a single global timeout value?