[RustBook] 20.3 Multithreaded WebServer sometimes goes into deadlock when shutting down

I followed the code from 20.3, Graceful Shutdown and Cleanup. The project compiles and runs, but when shutting down it usually deadlocks and only occasionally cleans up successfully.

I printed out some calling order information:

Sending a job
Worker 0 just got a job!
Sending a job
Shutting down.
Trying to shut down all workers
terminate sent
terminate sent
Worker 1 just got a job!
terminate sent
terminate sent
worker 0 is shutting down
Worker 2 is told to terminate!
Worker 3 is told to terminate!
Worker 0 is told to terminate!
worker 1 is shutting down

It seems one of the terminate messages is lost somehow, and I can’t figure out why, or why it only happens occasionally.
I’ve separated the two loops just as the tutorial does:

for _ in &mut self.workers {
    println!("terminate sent");
    self.sender.send(Message::Terminate).unwrap();
}

for worker in &mut self.workers {
    println!("worker {} is shutting down", worker.id);

    if let Some(thread) = worker.thread.take() {
        thread.join().unwrap();
    }
}

Shouldn’t that send a message to each worker individually? If you’re just blindly sending messages, I suspect that they are not evenly distributed between workers, so some workers get terminate twice or more, and some workers don’t get any.

The book uses a single channel, whose receiver end is shared between worker threads via Arc<Mutex<Receiver<T>>>. So there is no way to distinguish between workers. It is supposed to work since each worker doesn’t receive any more after getting a Terminate message.
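
To illustrate why one Terminate per worker is enough with a shared receiver, here is a minimal sketch. It is not the book's code: the job is reduced to a plain `u32`, and `run_pool` is a hypothetical helper made up for this example. Each worker breaks out of its loop on Terminate and never calls `recv` again, so no worker can consume two of them.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Simplified stand-in for the book's Message enum (assumption: the job is
// just a number here instead of a boxed closure).
enum Message {
    NewJob(u32),
    Terminate,
}

/// Hypothetical helper: spawns `n` workers that share one receiver, sends
/// `jobs` jobs followed by one Terminate per worker, joins every worker and
/// returns how many of them exited because of a Terminate message.
fn run_pool(n: usize, jobs: u32) -> usize {
    let (sender, receiver) = mpsc::channel::<Message>();
    let receiver = Arc::new(Mutex::new(receiver));

    let handles: Vec<_> = (0..n)
        .map(|id| {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || loop {
                // The mutex guard is a temporary, so the lock is released
                // at the end of this statement, before the job is handled.
                let message = receiver.lock().unwrap().recv().unwrap();
                match message {
                    Message::NewJob(job) => println!("Worker {} got job {}", id, job),
                    Message::Terminate => {
                        println!("Worker {} was told to terminate.", id);
                        // Leaving the loop means this worker never receives again.
                        break true;
                    }
                }
            })
        })
        .collect();

    for job in 0..jobs {
        sender.send(Message::NewJob(job)).unwrap();
    }
    // One Terminate per worker, with no way to address a specific worker.
    for _ in 0..n {
        sender.send(Message::Terminate).unwrap();
    }

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|&terminated| terminated)
        .count()
}

fn main() {
    println!("{} workers terminated", run_pool(4, 10));
}
```

In this sketch the jobs finish immediately, so every join returns. If a job never returned, that worker would never get back to `recv`, and joining it would block forever.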

@chbdetta from just looking at the book’s code, it should be working fine. Can you put your whole code e.g. into a gist, to see if it’s really the exact same code?

I’m getting the exact same issue, and I copied the source provided at the end of The Book. When using cURL the web server behaves as expected, but when using Chrome, two TCP connections seem to be opened simultaneously. This consistently causes the deadlock.

@migueloller Could you perhaps replace the channel with Arc<Mutex<VecDeque<Message>>> and see if the deadlock still occurs?

@stjepang That would need a Condvar to make functional.
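
For reference, a rough sketch of what that could look like. The `Queue` type and the `sum_via_queue` helper are made up for this example and are not from the book. Without the `Condvar`, a worker that finds the deque empty would have to spin or sleep; with it, the worker sleeps until a push wakes it up.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical blocking queue: a VecDeque guarded by a Mutex, plus a
// Condvar so that consumers can wait for an item to arrive.
struct Queue<T> {
    items: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> Queue<T> {
    fn new() -> Self {
        Queue {
            items: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
        }
    }

    fn push(&self, item: T) {
        // The guard is a temporary, so the lock is released before notifying.
        self.items.lock().unwrap().push_back(item);
        self.ready.notify_one();
    }

    fn pop(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop_front() {
                return item;
            }
            // wait() releases the lock while sleeping and re-acquires it on
            // wake-up; the loop re-checks because wake-ups can be spurious.
            items = self.ready.wait(items).unwrap();
        }
    }
}

/// Hypothetical demo: one consumer thread sums values until it sees `None`,
/// which plays the role of the Terminate message.
fn sum_via_queue(values: Vec<u32>) -> u32 {
    let queue = Arc::new(Queue::new());

    let consumer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || {
            let mut total = 0;
            while let Some(value) = queue.pop() {
                total += value;
            }
            total
        })
    };

    for value in values {
        queue.push(Some(value));
    }
    queue.push(None);

    consumer.join().unwrap()
}

fn main() {
    println!("sum = {}", sum_via_queue(vec![1, 2, 3]));
}
```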

I suspect the code is blocked in the job rather than in terminate. Printing when the job completes would be a possible first step.

job.call_box();
println!("Worker {} job complete.", id);

@jonh, That seems to be what’s happening. Here’s what’s being logged:

Worker 0 got a job; executing.
Worker 0 job complete.
Shutting down.
Sending terminate message to all workers.
Shutting down all workers.
Shutting down worker 0
Worker 2 got a job; executing.
Worker 3 was told to terminate.
Worker 1 was told to terminate.
Worker 0 was told to terminate.
Shutting down worker 1
Shutting down worker 2
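
In the log above, Worker 2 picks up a job and never reports completion, which would fit a job that blocks while reading from a connection that never sends a request. That is an assumption, not a confirmed diagnosis. The sketch below only shows the general mechanism: a read on an idle `TcpStream` blocks indefinitely unless a read timeout is set. The `idle_read_times_out` helper and the idle client are made up for this example.

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical helper: accepts a connection from a client that never sends
/// anything and returns true if the read gave up with a timeout error
/// instead of blocking forever.
fn idle_read_times_out(timeout: Duration) -> bool {
    // Port 0 lets the OS pick a free port on the loopback interface.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Idle client: connects but never writes a request. The binding keeps
    // the connection open until the function returns.
    let _client = TcpStream::connect(addr).unwrap();

    let (mut stream, _) = listener.accept().unwrap();
    // Without this call the read below would block until the client sends
    // data or closes the connection.
    stream.set_read_timeout(Some(timeout)).unwrap();

    let mut buffer = [0u8; 512];
    match stream.read(&mut buffer) {
        // The reported error kind differs between platforms.
        Err(e) => matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut),
        Ok(_) => false,
    }
}

fn main() {
    println!(
        "read timed out: {}",
        idle_read_times_out(Duration::from_millis(100))
    );
}
```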

@stjepang, I’m quite new to Rust so I wasn’t sure how to achieve this without using a channel, but I was successful in getting the program to compile without channels and using the deque. Unfortunately, I was getting poisoned mutexes so I wasn’t able to get the web server to run.

I’ll happily keep trying other ways if you can help :)