I'm a beginner in Rust, trying to understand how the Rust runtime works and how Rust deals with tasks and threads. While doing so, I noticed that a Rust app can get stuck or become hard to terminate, which can be tricky to debug. I have a simple example below.
This example does not terminate on Ctrl+C, even though the Ctrl+C is received and "Received Ctrl+C, exiting" is printed on screen.
I later learned that this is because the task (an infinite loop) doesn't yield. My guess is that the worker thread keeps spending CPU on running the loop, which prevents the Tokio runtime from killing the task, because the task isn't being cooperative and never gives up the thread.
After adding yield_now().await, the app terminates on Ctrl+C.
This is very different from Go, which will terminate the goroutine even if it runs forever.
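package main

import "fmt"

func main() {
	// Unbuffered channel with no receiver: main blocks on the send below.
	c := make(chan bool, 0)
	go func() {
		// Goroutine that never returns.
		i := 0
		for {
			fmt.Println("starve", i)
			i += 1
		}
	}()
	c <- true
}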
The difference is fascinating. How does Go achieve this, and why doesn't Rust take the same approach? What is the rationale behind that, and how should this be handled the Rust way?
I want to understand the proper way to avoid dead loops inside Rust apps. In other words, how can I make a task cancellable? Also, what's the best practice to prevent such a task from running forever? Imagine a developer accidentally adding loop {} inside some deep function. It will starve and lock up the application.
What's the proper way to avoid this? Is it by adding yield_now() inside the loop, and what's the overhead of yielding? In Go, there are WaitGroup and Context, which let developers exit goroutines gracefully. And even without WaitGroup or Context, Go's scheduler can preemptively kill goroutines and exit, as in the above example. I'm new to Rust and want to understand more about this area. Is Rust purely cooperative, so that developers must yield, or are there idiomatic Rust ways to keep such tasks from running forever? Thanks
Tokio - in fact most async executors - will, by intentional design, run until all tasks conclude or abort. If you submit a task that will never end, then the executor will never shut down on its own.
However, if you have a reference to the Tokio runtime, you can shut it down even while tasks are still running. Whether this is a good idea or not depends on your needs - shutting it down will cause running tasks to be cancelled, which will in turn cause any work they were doing to be dropped on the floor.
While your program should be prepared for this anyways (power failures and the oomkiller exist), in practice it's usually worth designing your program to shut down gracefully unless you are prepared to make sure every task can be either aborted or recovered even if interrupted in the middle.
In your specific case, even shutdown won't completely terminate the stuck job, since it never reaches an await. Async tasks can't be terminated before an await point any more than a thread can be, but unlike threads they can be terminated once they reach one. The only way to definitively terminate a job that will never reach an await point on its own is by exiting the whole process. Returning from main (the underlying one, in the case of #[tokio::main]) will terminate the process as well. If you want to avoid infinite loops, the only way to do that is to avoid writing them, unfortunately.
Hi @derspiny , thanks for the reply.
I don't mean a simple infinite loop per se. It could be a long-polling non-async function that reads from a message queue like Kafka.
For example:
(i.e. kafka-rust/examples/example-consume.rs at 69c29ec16c9e2292e6bab6680fbdc722e612d30b in kafka-rust/kafka-rust on GitHub) This function doesn't seem to yield, so it could run forever if spawned into a task...
Or it could be a function that suddenly goes rogue and doesn't return. In Go, I don't seem to need to worry about this, because terminating main terminates all goroutines.
It is not correct that Tokio waits for all tasks to conclude; the Tokio runtime's shutdown process includes aborting tasks, never waiting for them to complete unless they do so by coincidence:
Tasks spawned through Runtime::spawn keep running until they yield. Then they are dropped. They are not guaranteed to run to completion, but might do so if they do not yield until completion.
…
The thread initiating the shutdown blocks until all spawned work has been stopped.
So, Tokio shutdown will abort (drop) tasks before they complete, but it will also block until yield points are reached in all tasks. The problem here is that the task async move { loop {} } never yields. This not only prevents Tokio shutdown in particular, it is also incorrect async code: async code should always yield in finite time. A yield_now().await inside the loop suffices, if there isn't something else it awaits.
Blocking non-async functions should not be called from within async tasks, because that reduces the available capacity in the executor thread pool and that might eventually cause a deadlock; you should use spawn_blocking() for those, instead.
That won't help with shutdown, but Tokio also has a solution for that:
The thread initiating the shutdown blocks until all spawned work has been stopped. This can take an indefinite amount of time. The Drop implementation waits forever for this.
The shutdown_background and shutdown_timeout methods can be used if waiting forever is undesired. When the timeout is reached, spawned work that did not stop in time and threads running it are leaked. The work continues to run until one of the stopping conditions is fulfilled, but the thread initiating the shutdown is unblocked.
So, for your application, shutdown_timeout() might be appropriate: take a moment to let the async work (probably) finish, and then terminate the process and the Kafka reader along with it. This will also terminate any other tasks with runaway code.