How to handle errors when accepting connections?

TcpListener::accept returns a Result:

Reading the man page for accept, it seems there might be two fundamentally different cases for error here:

https://man7.org/linux/man-pages/man2/accept.2.html

Linux accept() (and accept4()) passes already-pending network errors on the new socket as an error code from accept(). This behavior differs from other BSD socket implementations. For reliable operation the application should detect the network errors defined for the protocol after accept() and treat them like EAGAIN by retrying. In the case of TCP/IP, these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH.

That is, some errors mean that accept itself failed, while others report a problem with one specific connection.

It seems to indicate that the following is the wrong way to use accept:

for stream in listener.incoming() {
    let stream = stream?;
    handle_connection(stream);
}

Here, if one connection faults early, this brings down the whole server, preventing unrelated clients from connecting.

At the same time, code like the following also seems erroneous:

for stream in listener.incoming() {
    match stream {
        Ok(stream) => handle_connection(stream),
        Err(err) => log::error!("failed to accept: {}", err)
    }
}

If the error is actually a problem with accept itself (ENFILE, EPERM), then this will enter a busy loop, where each call fails with the same error.

So far, it seems the correct way to handle this is something like the following:

for stream in listener.incoming() {
    match stream {
        Ok(stream) => handle_connection(stream),
        Err(err) if is_fatal(&err) => return Err(err),
        Err(err) => log::error!("failed to accept: {}", err)
    }
}

However, writing the is_fatal function seems far from trivial: it looks like it necessarily needs to be system-dependent, and it probably requires arcane knowledge of various implementation details. How do folks solve this problem in practice in Rust? What would be the right pieces of software to look at for production solutions?
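
To make the question concrete, here is a rough, Linux-only sketch of what such an is_fatal might look like. It treats the errors listed in the man page quote above as per-connection errors, plus a few portable std::io::ErrorKind values, and everything else as fatal. The errno numbers are hard-coded assumptions (the values used on x86-64 and most other Linux architectures) to keep the sketch dependency-free; the choice of which ErrorKind values count as non-fatal is also my own guess, not an established rule.

```rust
use std::io;

// Assumption: Linux errno values as found on x86-64 and most other
// architectures. The libc crate exports these constants by name.
const ENONET: i32 = 64;
const EPROTO: i32 = 71;
const ENOPROTOOPT: i32 = 92;
const EOPNOTSUPP: i32 = 95;
const ENETDOWN: i32 = 100;
const ENETUNREACH: i32 = 101;
const EHOSTDOWN: i32 = 112;
const EHOSTUNREACH: i32 = 113;

/// Returns true when the error appears to concern accept itself rather
/// than a single incoming connection.
fn is_fatal(err: &io::Error) -> bool {
    // Portable kinds that describe one failed connection or a spurious
    // wakeup (assumed to be safe to retry).
    if matches!(
        err.kind(),
        io::ErrorKind::ConnectionAborted
            | io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionRefused
            | io::ErrorKind::Interrupted
            | io::ErrorKind::WouldBlock
    ) {
        return false;
    }
    // Pending network errors that accept(2) says to treat like EAGAIN.
    !matches!(
        err.raw_os_error(),
        Some(
            ENONET | EPROTO | ENOPROTOOPT | EOPNOTSUPP | ENETDOWN
                | ENETUNREACH | EHOSTDOWN | EHOSTUNREACH
        )
    )
}
```

With this definition ENFILE and EPERM fall through to the fatal branch, which matches the split described above, but other platforms would need their own table.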

I recall someone pointing out to me that you should always use a semaphore to keep the number of concurrent connections below the file descriptor limit, so you don't run into a busy error loop by exceeding it.
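
A minimal sketch of that idea with a thread per connection, assuming a hand-rolled semaphore (std does not ship one). Semaphore, Permit, serve, and max_connections are hypothetical names, and handle_connection is only a placeholder for the function from the question. A permit is taken before each accept and returned when the connection's thread finishes.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// A small counting semaphore built from a Mutex and a Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Gives its permit back to the semaphore when dropped.
struct Permit {
    sem: Arc<Semaphore>,
}

impl Semaphore {
    fn new(permits: usize) -> Arc<Semaphore> {
        Arc::new(Semaphore { permits: Mutex::new(permits), available: Condvar::new() })
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(sem: &Arc<Semaphore>) -> Permit {
        let mut permits = sem.permits.lock().unwrap();
        while *permits == 0 {
            permits = sem.available.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit { sem: Arc::clone(sem) }
    }

    fn free_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}

/// Placeholder for the handler from the question.
fn handle_connection(_stream: TcpStream) {}

/// Accepts at most `max_connections` connections at a time.
/// Error policy is left out here: any accept error is returned.
fn serve(listener: TcpListener, max_connections: usize) -> io::Result<()> {
    let sem = Semaphore::new(max_connections);
    loop {
        // Wait for a free slot before accepting, so the process never
        // holds more than `max_connections` client sockets.
        let permit = Semaphore::acquire(&sem);
        let (stream, _addr) = listener.accept()?;
        thread::spawn(move || {
            let _permit = permit; // released when this thread finishes
            handle_connection(stream);
        });
    }
}
```

The limit would have to be chosen somewhat below the process's file descriptor limit, since the listener and any files the handlers open count against it too.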

In practice, how often does such a busy loop happen? Wouldn't putting a short sleep after an error solve the problem of hogging the CPU? And at least in industrial settings, some external monitor would be responsible for killing an unresponsive service one way or another, though I guess one could build one in as a separate thread as well.
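
Something like the following is what I have in mind, as a rough sketch. The names MIN_DELAY, MAX_DELAY, next_delay, and serve are made up for illustration, the delay values are arbitrary, and eprintln! stands in for the log::error! call from the question. The delay doubles after each consecutive failure and resets after a successful accept.

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

// Arbitrary bounds chosen for illustration.
const MIN_DELAY: Duration = Duration::from_millis(10);
const MAX_DELAY: Duration = Duration::from_secs(1);

/// Doubles the delay, capped at MAX_DELAY.
fn next_delay(current: Duration) -> Duration {
    (current * 2).min(MAX_DELAY)
}

/// Accept loop that sleeps after every error instead of spinning.
fn serve(listener: TcpListener, handle_connection: impl Fn(TcpStream)) {
    let mut delay = MIN_DELAY;
    loop {
        match listener.accept() {
            Ok((stream, _addr)) => {
                delay = MIN_DELAY; // things work again, reset the backoff
                handle_connection(stream);
            }
            Err(err) => {
                eprintln!("failed to accept: {err}; retrying in {delay:?}");
                thread::sleep(delay);
                delay = next_delay(delay);
            }
        }
    }
}
```

This never gives up on its own, so a persistent error such as EPERM would still need the external monitor to notice and restart the service.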
