Multithreaded server quits serving after 16374 requests

Hi!

I'm following the tutorial for the multi-threaded server in the Rust book. When I run ApacheBench (ab) against the server, it always hangs after 16374 requests are completed.

ab -n25000  127.0.0.1:7878/

Does anyone know why this would be? I know that the server isn't meant to be a production server, but I can't figure out what's causing it to hang (there is no panic).

I'll include code here in case I've followed the steps incorrectly:
// lib.rs
use std::thread;
use std::sync::{mpsc, Arc, Mutex};

type Job = Box<dyn FnOnce() + Send + 'static>;

pub struct Worker {
    id: usize,
    handle: thread::JoinHandle<()>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let handle = thread::spawn(move || loop {
            let job = receiver.lock().unwrap().recv().unwrap();

            println!("Job starting in worker {}", id);

            job();
        });

        Worker { id, handle }
    }
}

pub struct ThreadPool {
    threads: Vec::<Worker>,
    sender: mpsc::Sender<Job>,
}

impl ThreadPool {
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        let mut threads = Vec::<Worker>::with_capacity(size);

        let (tx, rx) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(rx));

        for id in 0..size {
            threads.push(Worker::new(id, Arc::clone(&receiver)));
        }

        ThreadPool { threads, sender: tx }
    }

    pub fn execute<F>(self: &Self, job: F)
    where 
        F: FnOnce() + Send + 'static,
    {
        println!("ThreadPool::execute -- sending a job");
        self.sender.send(Box::new(job)).unwrap();
    }
}
// main.rs
use std::io::prelude::*;
use std::net::{TcpListener, TcpStream};
use std::time::Duration;
use std::thread;
use threaded_server::ThreadPool;

fn main() {
    let listener = TcpListener::bind("0.0.0.0:7878").unwrap();
    let pool = ThreadPool::new(4);

    for conn in listener.incoming() {
        pool.execute(Box::new(move || {
            handle_connection(conn.unwrap())
        }));
        // handle_connection_2(conn.unwrap());
    }
}

fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 2048];
    stream.read(&mut buffer).unwrap();

    let get = b"GET / HTTP/1.0\r\n";
    let sleep = b"GET /sleep HTTP/1.0\r\n";

    let (status_line, filename) = if buffer.starts_with(get) {
        ("HTTP/1.1 200 OK\r\n\r\n", "hello.html")
    } else if buffer.starts_with(sleep) {
        thread::sleep(Duration::from_secs(5));
        ("HTTP/1.1 200 OK\r\n\r\n", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND\r\n\r\n", "404.html")
    };

    let contents = std::fs::read_to_string(filename).unwrap();
    stream.write(&format!("{} {}", status_line, contents).as_bytes()).unwrap();
}

Your system might have exhausted all file descriptors. Maybe check whether sockets are still open (in a TIME_WAIT or CLOSE_WAIT state), and what the fd limit on your system is.

Btw: There are also issues in the code. E.g. read and write are not guaranteed to read/write all bytes - only some of them. In order to be correct you need to use write_all and read until you are sure you have parsed a proper request.
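To illustrate the reading side of this advice, here is a minimal sketch that keeps calling read until the end of the HTTP request head (the blank line, "\r\n\r\n") has arrived. The function name read_request_head and the 512-byte chunk size are made up for this example, and it assumes the request has no body, as with the simple GET requests in this thread.

```rust
use std::io::{self, Read};

/// Reads from `stream` until the end of the request head ("\r\n\r\n")
/// has been seen or the peer closes the connection.
/// Hypothetical helper; assumes a request without a body.
fn read_request_head<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut request = Vec::new();
    let mut chunk = [0u8; 512];
    loop {
        // A single `read` may return only part of the request.
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            // The peer closed the connection before finishing the head.
            break;
        }
        request.extend_from_slice(&chunk[..n]);
        // Stop once the blank line that terminates the headers is present.
        if request.windows(4).any(|w| w == b"\r\n\r\n") {
            break;
        }
    }
    Ok(request)
}
```

In the handler from the question, this would take the place of the single stream.read(&mut buffer) call, and starts_with could then be applied to the returned Vec.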

It's unlikely to be an issue on the reading side, since the data is small. But it might be on the writing side. You also don't set a content-length header, which might lead the client to hang around and wait for more data which will get the socket into an abnormal closure / WAIT state.
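A minimal sketch of the writing side, assuming the status line is passed without the trailing "\r\n\r\n" that the code in the question appends. The helper names build_response and send_response are invented for this example. It adds a Content-Length header and uses write_all, which keeps writing until every byte has been sent or an error occurs.

```rust
use std::io::{self, Write};

/// Builds a full response with a Content-Length header so the client
/// knows exactly where the body ends. Hypothetical helper.
fn build_response(status_line: &str, body: &str) -> String {
    format!(
        "{}\r\nContent-Length: {}\r\n\r\n{}",
        status_line,
        body.len(), // byte length, which is what the header counts
        body
    )
}

/// Sends the whole response. Unlike a single `write`, `write_all`
/// retries until the entire buffer has been written.
fn send_response<W: Write>(stream: &mut W, status_line: &str, body: &str) -> io::Result<()> {
    stream.write_all(build_response(status_line, body).as_bytes())?;
    stream.flush()
}
```

With a TcpStream this would be called as send_response(&mut stream, "HTTP/1.1 200 OK", &contents).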

Thanks for the reply! I ran netstat while running ab, and many sockets are in the TIME_WAIT state. I increased the descriptor limit using OSX's kern.maxfiles property, but that still didn't get it past 16374. I'll send a Content-Length header and see if ab is waiting for data.

Still no luck with those. For comparison I ran ab against actix-web serving the same document. The same issue happens with actix so it seems that it must be the file descriptors.

I would try changing it to use stream.read_to_end. Maybe there's still data left to read and the OS isn't closing the connection. I'm not sure if that's actually possible, but it may be worth a shot.

Another thing to try is to manually shut down the TcpStream with shutdown(...) after writing. This should happen automatically when the value is dropped, so I doubt it would make a difference.
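For reference, an explicit shutdown could look roughly like this. The function name respond_and_close is made up for the example; TcpStream::shutdown and Shutdown::Both are the standard library API.

```rust
use std::io::{self, Write};
use std::net::{Shutdown, TcpStream};

/// Writes the response and then closes both directions of the connection.
/// Hypothetical helper for illustration.
fn respond_and_close(mut stream: TcpStream, response: &[u8]) -> io::Result<()> {
    stream.write_all(response)?;
    // Explicit shutdown; the socket would also be closed when `stream`
    // is dropped at the end of this function.
    stream.shutdown(Shutdown::Both)
}
```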

A final thought: if you run apache-bench with 15000 requests, let it finish, and then run it again with the same setting, does it hang at 1374 requests?

Thanks, I tried your stream.read_to_end suggestion, but got the same limit. I also tried your final suggestion, and the second run does stop after 1374 requests, but only if it is started quickly. If I wait ~10 seconds, then it will run all 15000 again.

I think I've found that it's an OSX-specific behavior around socket port numbers. This answer on Stack Exchange seems to describe the same behavior.

It's not particularly OSX-specific. In TCP, a socket has a specified wait time after it has been closed (the TIME_WAIT state), during which it essentially ensures that the other end of the connection received the close message. Where OSes might differ is in how exactly the timeouts after connection close are implemented. It seems that OSX keeps the port fully reserved by default.
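As a rough sanity check on the number 16374 from the original post: assuming the client side (ab) draws its local ports from the IANA dynamic port range 49152 to 65535, which is believed to be the OSX default but is an assumption here, the size of that range is very close to the point where the benchmark stalls. A small sketch of the arithmetic, with a made-up helper name:

```rust
/// Number of ports in an inclusive port range.
/// Hypothetical helper; the range passed in is an assumption about the OS defaults.
fn ports_in_range(first: u16, last: u16) -> u32 {
    u32::from(last) - u32::from(first) + 1
}
```

ports_in_range(49152, 65535) gives 16384, only ten more than 16374. That would be consistent with every local port in the range sitting in TIME_WAIT, with a handful already taken by other connections, although the thread does not confirm the exact range.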

Theoretically, the port would need to appear closed only for that particular connection (the tuple of your address and port and the remote address and port). However, reuse of the local port is not quite trivial. Since you can't reasonably open the same connection during that wait time, if the OS allowed arbitrary reuse, your program might be affected by state for which another program that previously held the same port is responsible. This is generally regarded as bad or unsafe design. Linux specifically provides SO_REUSEPORT, which permits such bindings regardless, but only for sockets controlled by programs under the same user, and each program needs to set the flag to signal its awareness of this influence. This should be available on BSD and OSX as well. You'll likely want to consult manuals, man pages, and other developer resources, and use the raw, unsafe OS interfaces if this limit is critical to your operation.

This was the short version off the top of my head. Searching Stack Overflow brought up this incredibly detailed and well-written answer on SO_REUSEPORT, which goes into more detail.

If you send the Content-Length header, you can simply handle multiple requests on the same connection. This would probably reduce the number of connections/sockets to a reasonable amount.

It's really easy to do, just wrap the content of handle_connection in a loop and break if the return value of read is 0.
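A minimal sketch of that loop, written generically over Read + Write so it works with a TcpStream. It simplifies heavily: it assumes every read delivers exactly one complete request, always answers 200 with the given body, and the extra body parameter and the request counter are additions for this example rather than part of the code in the question.

```rust
use std::io::{self, Read, Write};

/// Serves requests on one connection until the client closes it.
/// Returns how many requests were answered.
fn handle_connection<S: Read + Write>(stream: &mut S, body: &str) -> io::Result<usize> {
    let mut buffer = [0u8; 2048];
    let mut served = 0;
    loop {
        // A read of 0 bytes means the client has closed the connection.
        let n = stream.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        // Simplification: treat each successful read as one whole request.
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: keep-alive\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes())?;
        served += 1;
    }
    Ok(served)
}
```

Note that ab only reuses connections when run with its -k (keep-alive) option, and that with the thread pool from the question each open connection occupies one worker until the client disconnects.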