The delicate dance of the sync->async bridge

I strongly suggest that you do not use the spawn/channel method you shared. That method is a sort of "manual block_on implementation", which can easily lead to hard-to-debug deadlocks if used in scenarios where Tokio's real block_on would panic, and provides no advantage over Tokio's block_on when Tokio's block_on doesn't panic.

To avoid this scenario, calling block_in_place makes Tokio spawn a new runtime thread to replace the one you took over with the block_in_place call. Still, it can't avoid the problem entirely, because there is a limit to how many times you can do this before it has to wait for previous block_in_place calls to exit. So yes, the deadlock is possible.

If A is async and B is blocking, then you have already erred by calling B from A without spawn_blocking() or block_in_place(). The Tokio panic you get from layer D is merely Tokio detecting the problem (blocking in an async context), not where the problem is actually occurring. You have a bug in your A→B relationship which you must fix, separately from what D is or is not doing.

Something was lost in translation here: there are no panics, and no observed deadlocks either. The deadlock is a theoretical concern.

Thanks. That was a detail I'd missed.

I've come up with a plan to handle this without seriously refactoring the bulk of the app. So we have these layers:

A->B->C->D

where A & D are async. A is the gRPC entrypoint into the server. D is the layer making async calls to the 3rd-party API - and it turns out that the volume of those calls is linearly proportional to the number of active requests in A.

Here's what I'm going to do:

  • Use a semaphore at Layer A to throttle incoming requests. The standard lib doesn't have a semaphore anymore, but Tokio does. I've used it before, and it is perfect for this.
  • Increase the Tokio runtime's thread count to well above the default (one worker thread per core).

These two things in concert should make deadlocks impossible in practice.

(Note: This system doesn't need to handle very high numbers of concurrent requests. It doesn't really need Tokio's thread:task multiplexing. Tokio is being used only because Tonic, the only real Rust gRPC library, is all async.)

Ah, sorry, I misremembered the situation. The first part of my previous reply still stands: If A is async and B is blocking — or more precisely, if B or C blocks in any way that is not “calling D”, then you must use spawn_blocking() or block_in_place() when calling B from A, not later. It isn’t sufficient to only put the block_in_place() around the async parts of D; that's just the part Tokio checks for you.

If I may ask, why is adding a spawn_blocking call in the Tonic layer such a big refactor?

Ok - it's not.

I really wanted to keep this contained in Layer D.

But then I'm already proposing a change to A (the semaphore).

You're right: This is probably better.

Until today I missed the critical point that running within a spawn_blocking context makes block_on safe.

Last word on this. I won't trouble the forum with more details, but spawn_blocking is not a drop-in solution, again because of the type bounds:

        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,

This makes it a bit cumbersome to call.

Wrapping 'self' references in Arc is a tolerable solution.

Yes, adding an Arc is the solution I would recommend for satisfying the 'static bound in this scenario.
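
A rough sketch of the Arc approach. To keep it dependency-free it uses std::thread::spawn as a stand-in, since its closure has the same `Send + 'static` bounds as spawn_blocking; `Service`, `process`, `process_on_other_thread`, and `run_demo` are hypothetical names.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical service type standing in for a struct whose methods need `self`.
struct Service {
    prefix: String,
}

impl Service {
    // Blocking method (layers B/C in this thread's terms).
    fn process(&self, input: &str) -> String {
        format!("{}:{}", self.prefix, input)
    }

    // Takes `self` as an `Arc` so the closure owns a handle to the service
    // instead of borrowing it, which is what satisfies `'static`.
    fn process_on_other_thread(self: Arc<Self>, input: String) -> String {
        let handle = thread::spawn(move || self.process(&input));
        handle.join().expect("worker thread panicked")
    }
}

fn run_demo() -> String {
    let svc = Arc::new(Service {
        prefix: "svc".to_string(),
    });
    // Cloning the Arc only bumps a reference count; `svc` stays usable.
    Arc::clone(&svc).process_on_other_thread("req".to_string())
}
```

With a plain `&self` the closure would borrow the service and fail the `'static` bound; moving an `Arc<Self>` into the closure sidesteps that at the cost of one reference-count increment per call.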

Off Topic: Maybe we should have a thread with links to great topics? It would be a pity to let all of that information get lost with time.

The search engine works pretty well.

Yes, some great information collected here. Thanks to my thickness, it even got spelled out in painstaking detail.

But seriously, what other language of similar popularity to Rust has a forum where the maintainer of one of its most important libs is going to give support like this, just because? Not to mention the rest of the community.

(I came from Scala-land, and it's a much different world, although ZIO and the people behind it rock.)

A follow-up on this.

I wrote this utility to be used at the top-level, async layer (gRPC server entrypoints):

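use tonic::{Code, Status};

// Runs a blocking closure on Tokio's blocking thread pool and maps a failed
// join (for example, a panic in the closure) to a gRPC Internal status.
pub async fn run_blocking<F, R>(f: F) -> Result<R, Status>
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    tokio::task::spawn_blocking(f)
        .await
        .map_err(|e| Status::new(Code::Internal, e.to_string()))
}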

Then, layers down, when I need to call async from a sync fn, I use this:

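    use std::future::Future;
    use tokio::{runtime, task};

    // Blocks the current thread on `f`, using the runtime this thread
    // already belongs to.
    pub fn run<F, R>(f: F) -> R
    where
        F: Future<Output = R>,
    {
        task::block_in_place(move || runtime::Handle::current().block_on(f))
    }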

My question: I think that using block_in_place is not necessary here, since this will always be used in a spawn_blocking context. In other words, I think this can be simplified to just runtime::Handle::current().block_on(f).

Agree?

Yes, for that use-case you can get rid of block_in_place.
