Synchronisation of long-running computations in tokio

From within an axum handler, I'd like to run a very expensive computational routine on global app state; the routine also performs some async IO on a multithreaded tokio runtime.

Is the following a correct way of doing this?

type GlobalState = Arc<std::sync::RwLock<App>>;

pub async fn handler(State(app_state): State<GlobalState>) {
    let mut lock_guard = app_state.write().unwrap();

    tokio::task::block_in_place(move || Handle::current().block_on(lock_guard.long_running_future()));
}

The docs state:

Runs the provided blocking function on the current thread without blocking the executor.

In general, issuing a blocking call or performing a lot of compute in a future without yielding is problematic, as it may prevent the executor from driving other tasks forward. Calling this function informs the executor that the currently executing task is about to block the thread, so the executor is able to hand off any other tasks it has to a new worker thread before that happens.

Runs a future to completion on this Handle’s associated Runtime.

This runs the given future on the current thread, blocking until it is complete, and yielding its resolved result.

I have a tough time understanding the documentation here. My interpretation is that block_in_place signals to the currently active runtime that the worker thread is about to become uncooperative (a long time until the next yield). Once I've notified the runtime, I should be able to block the thread by running my very long computation.

The computation is async because it saves data to the DB, and I assumed that explicitly blocking on each of these calls inside the function would be unnecessary if I just block on the whole async function instead.

My understanding is that, essentially, the future should run uninterrupted on the same thread. All awaits within the routine will be blocking, so I shouldn't have to worry about my lock not being async-aware, as the future holding the lock will never be moved to another worker thread.

Besides my general lack of confidence that this approach is sound, there's the issue of cancellation. Docs state:

Code running behind block_in_place cannot be cancelled. When you shut down the executor, it will wait indefinitely for all blocking operations to finish. You can use shutdown_timeout to stop waiting for them after a certain timeout. Be aware that this will still not cancel the tasks — they are simply allowed to keep running after the method returns.

I wouldn't think that cancellation matters, since I'm not yielding to the executor until I'm done anyway. As for executor shutdown, that also isn't something I should worry about, since executor shutdown would only occur on application shutdown (?).

Having said all of the above, the problem is that I know for a fact that this deadlocks (the future never resolves, the lock is never released, and the application hangs) under certain circumstances. I'm not sure why, but it happens when I close the TCP connection while the computation is running.

I know that message passing is the preferred way of doing task synchronisation in tokio, but I'd like to understand what went wrong here to better understand the runtime's behaviour.

You're creating a code smell (mixing async inside blocking), so I would hope anyone reviewing your code would reject it.
(I can't comment on whether tokio supports this.)

Your exclusive lock is blocking (almost) every other request, so perhaps you could just run that long-running future inline?

I don't think you should be blocking other requests, though. (For availability, that requires carefully writing try_read everywhere else and responding that the server is busy.)

This approach is indeed very crude; it comes from a POC implementation and definitely needs refactoring. Since my original post I've determined that the deadlock occurs when a new contender for the lock appears while the long-running future awaits. This somehow prevents the future from progressing, as if the executor or maybe the IO driver got suspended as well? Something like this must be the case, since changing the lock to an async-aware one solves the problem.


Since I'm currently developing a complex program with a similar requirement (web servers + shared IO), I recommend you have a look at the actor model. I use it extensively and it immensely simplifies communication between different async stakeholders (a.k.a. actors).

Calling block_on inside block_in_place doesn't make sense; you should just await your long-running future. If it can become uncooperative, then either it shouldn't be a future, or it should use block_in_place internally.


That's exactly what I've been doing these last few days. Do you use any framework for writing the actors, by any chance?

Can you explain why it doesn't make sense, given the documentation I've cited? I signal the runtime that the current thread is about to block, so I should be able to block as and when I see fit. Essentially I'd like a way to stop worrying about async runtimes and just run sequentially.

No, I've actually written my own traits that implement the actor pattern, but I wouldn't call it a framework. It's more of an API that abstracts across smart home protocols, such as Zigbee and Matter/Thread, so that you can implement the necessary traits for your NCP driver or whatever and have an appropriate actor provided for you.
That being said, I found tokio_actors, but judging from only a few hundred downloads it's not widely used, and I've also never used it.


Great, thanks for the feedback!

From my understanding, the block_in_place function is meant to hand the runtime a task that is not async (or not cooperative), to be run off the async task threads.
By spawning a future inside that "blocking" task, you are re-involving that supposedly separate task with the main runtime in a way that tokio doesn't expect.
It's supposed to be a standalone "deal with a driver or IO that isn't async-aware" escape hatch, but by using the RwLock it seems you're forcing the main runtime to wait on what should be a separate, isolated blocking task.

2 Likes

It doesn't make sense because inside the block_in_place you immediately re-enter cooperative runtime scheduling by calling Handle::block_on. lock_guard.long_running_future() is executed like any other future and is expected to behave cooperatively. If it doesn't, your code won't fix that; if it does, you could just .await it directly.

2 Likes

This makes sense. The problem is that I'd assumed Handle::current would affect only the calling thread, but as you've pointed out, it refers to the current thread's runtime instead, which in turn can be blocked on the standard lock.

I'm not really well adjusted to the global scope of things in async Rust, I suppose. I wish I could pass the runtime handle down explicitly, although I'm not sure that even makes sense in Rust's async model.

None of this is “Rust”. It’s Tokio that decided that accessing the Tokio Runtime through thread-local state was a good API design, for better or for worse. They could have omitted it and, as you say, passed the handle explicitly.

It's not a good idea to write block_in_place(|| block_on(foo)) when you could have written foo.await.
