How can a loop cause a deadlock on some machines but not locally?

Hello,
I am new to Rust. Sorry for the stupid question.
I've been trying to write multithreaded code in which worker threads wait for tasks to appear in a queue and execute any tasks they find.
My real code is a more complex version of the following:

use std::thread;
pub async fn do_something() {
    println!("Started!");
    loop { // <- If I comment this out, I see 4 "Started!" lines
        thread::sleep(std::time::Duration::from_secs(1));
    }
}
#[tokio::main]
async fn main() {
    let mut workers = vec![];
    for _ in 0..4 {
        println!("Go!");
        workers.push(tokio::task::spawn(do_something()));
    }
    thread::sleep(std::time::Duration::from_secs(2));
    println!("Finished!");
}

Locally, it works on every PC I've tried, but it fails in environments such as an Azure pipeline.
There I see only 2 "Started!" lines instead of 4. The Rust Playground times out, while the DevOps servers hang for about an hour first.
There is also no problem when the number of threads is 1, so the issue only occurs with more than one thread.

I'm lost, especially given how short the code is. Can anyone see what's wrong?

Check out this blog post:

tl;dr: don't use thread::sleep in async code.

The answer to the literal question is that Tokio starts a number of OS worker threads based on the number of CPU cores, so you're probably running on a one- or two-core machine on Azure, with two allocated threads.
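
To see why the core count matters, here is a toy model in plain `std`. It is not Tokio's actual scheduler, so treat it only as an illustration: a fixed pool of `workers` threads pulls jobs from one shared queue, and each job blocks its thread until a gate opens, much like `thread::sleep` inside an endless `loop` never hands its thread back. The function name `jobs_started_while_blocked` and the gate are hypothetical. Per Tokio's documentation, the multi-threaded runtime defaults to one worker thread per available core; `std::thread::available_parallelism` reports the core count the standard library sees.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::Duration;

/// Toy worker pool (NOT Tokio's scheduler): `workers` threads share one job
/// queue, and every job blocks its thread until the gate opens.
/// Returns how many jobs started while the gate was still closed.
fn jobs_started_while_blocked(workers: usize, jobs: usize) -> usize {
    let (job_tx, job_rx) = mpsc::channel::<usize>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (started_tx, started_rx) = mpsc::channel::<usize>();

    // Holding the write lock keeps the gate closed; jobs block on `read()`.
    let gate = Arc::new(RwLock::new(()));
    let closed = gate.write().unwrap();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let started_tx = started_tx.clone();
            let gate = Arc::clone(&gate);
            thread::spawn(move || loop {
                // The queue lock is released at the end of this statement.
                let next = job_rx.lock().unwrap().recv();
                let Ok(id) = next else { break }; // queue closed and drained
                let _ = started_tx.send(id); // the "Started!" moment
                // Blocking wait: this thread cannot run any other job meanwhile.
                let _open = gate.read().unwrap();
            })
        })
        .collect();

    for id in 0..jobs {
        job_tx.send(id).unwrap();
    }
    drop(job_tx); // workers exit once the remaining jobs are drained

    // Wait for the jobs that can start, then give any others a moment to appear.
    let mut started = 0;
    for _ in 0..workers.min(jobs) {
        started_rx.recv().unwrap();
        started += 1;
    }
    thread::sleep(Duration::from_millis(100));
    started += started_rx.try_iter().count();

    drop(closed); // open the gate so every thread can finish its jobs
    for h in handles {
        h.join().unwrap();
    }
    started
}

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("available parallelism: {cores}");
    for workers in [1, 2, 4] {
        let n = jobs_started_while_blocked(workers, 4);
        println!("{workers} worker(s): {n} of 4 jobs started");
    }
}
```

In this model, 2 workers and 4 blocking jobs give exactly 2 started jobs, the same pattern as the 2 "Started!" lines on Azure, while 4 workers let all 4 start.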

I do find it interesting that it doesn't hang with one thread, though. That implies Tokio is co-opting the main thread to run tasks after returning from main and silently killing the worker threads after some time, but it gets starved once there are (number of worker threads + 1) blocking tasks. Tricky!

I also discovered this, probably 4-5 hours after writing this post. Hope it will be useful.

It's really tricky that it hijacks the main thread to run the tasks!