How to implement a future for a long-running function I cannot modify?

Hello there,
I am currently trying to build an asynchronous server with a database connection (Google Firestore). I am using the actix-web framework to do the HTTP routing etc., which uses a tokio executor under the hood.

As far as my understanding goes, each incoming HTTP request is delegated to a request handler function, which then returns a future, which is then polled by the underlying tokio executor, finally resolving it to a HTTP response.

My current task is to make the database queries asynchronous. There is a function in the firestore1 API, which executes a query and returns the query's results in a Result. However, this function is currently blocking, and I would appreciate some help with figuring out how to implement a future which will check whether the function has finished yet or not.

I thought of calling the blocking function on a separate thread and letting it put the result in a Box<Option<T>>, so that the future could check whether the box contains Some(T) and otherwise return Poll::Pending. However, this would be inefficient, since the handler itself is already a tokio task issued by actix, and issuing another task inside that task seems redundant.

What would be the appropriate way to create a future for a long-running function that I cannot modify?

I recommend looking into tokio_threadpool::blocking, which allows you to run blocking code in Tokio. Of course it's not as efficient as if the library were rewritten to use futures, but it's a start. If you wish to start a thread and run the call there, you should use a one-shot channel instead of putting stuff in an Option (note that the Receiver is a Future).
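To make the thread-plus-one-shot idea concrete, here is a minimal sketch that uses only the standard library. It hand-rolls the one-shot mechanism rather than using a real channel crate: the worker thread stores the result and then wakes the task, so the executor never has to busy-poll the Option. The names `spawn_blocking`, `BlockingFuture`, `block_on` and `slow_query` are hypothetical and exist only for this illustration; they are not the Tokio, actix-web or Firestore APIs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the worker thread and the future.
struct Shared<T> {
    value: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical helper type: resolves once the worker thread has finished.
struct BlockingFuture<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for BlockingFuture<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.shared.lock().unwrap();
        match shared.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Remember who to wake, so the executor polls again only
                // when the result is actually there.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Runs `f` on its own OS thread and returns a future for its result.
fn spawn_blocking<T, F>(f: F) -> BlockingFuture<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { value: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f(); // blocks only this dedicated thread
        let waker = {
            let mut shared = worker_side.lock().unwrap();
            shared.value = Some(value);
            shared.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake();
        }
    });
    BlockingFuture { shared }
}

/// Tiny stand-in executor so the sketch runs without Tokio.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // sleep until the waker unparks us
        }
    }
}

/// Hypothetical stand-in for the blocking database call.
fn slow_query(id: u32) -> Result<String, String> {
    thread::sleep(Duration::from_millis(50));
    Ok(format!("document-{}", id))
}
```

With these definitions, `block_on(spawn_blocking(|| slow_query(7)))` evaluates to `Ok("document-7")`. In a real handler you would return the future to the framework's executor instead of calling `block_on`, and you would likely reuse a pool of threads rather than spawning one per call.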

futures-cpupool will run any function for you, and give you a Future for it.

Thanks a lot for the directions! It took me some time to figure things out, and in the end I actually found actix_web::web::block, which seems to do the same thing as tokio_threadpool::blocking, only that it is part of the actix_web framework.

But this would manage its own threadpool, wouldn't it? actix_web already runs on tokio, could futures-cpupool tap into that?

It's important that this threadpool is separate, so that Tokio performance is not negatively impacted, and it's not starved of threads for making progress on other tasks (which may be indirectly required for your long-running task).

Is it not a flaw to have more than one threadpool? What is the point of having two thread pools as opposed to having one thread pool with twice the workers? Two threadpools would only mean you'd have to run two managers, wouldn't it?

It's a minor overhead, not a flaw. Shared threadpools lead to deadlocks if there are dependencies between threads.

For example, if you have 10 threads, and launch 10 tasks blocking on a request, tokio won't have any thread free to perform this request. It'll wait for you to put the threads back in the pool, and you'll get a deadlock.

Yes, I think I see your point. However, I thought that the deadlock issue was already taken care of by the future mechanics. When defining a task as a sequence of dependent operations, they will all be executed by the same thread pool, and required operations will be executed before the ones that depend on them. So an operation could not be reached by the executor unless all of its required operations have been resolved already. But hm, I guess there is the possibility that all workers try to execute the consumers of some shared state, and the writing task does not get executed. To prevent this, you would actually need separate executors/threadpools for the producers and consumers.

The reason for a separate threadpool is not the deadlock, but the blocking itself. For example, if your web server has 16 threads in a shared threadpool and receives 16 requests that each require a 10-second blocking operation (IO or computation), the web server looks completely down for 10 seconds. If we use a separate threadpool for the blocking tasks, everything gets slightly slower due to OS-level context switching, but at least the server still responds in time.

Hmm, what I understand from what you are saying is that the reason for multiple threadpools is basically to be able to use the OS level context switching, which will interrupt the blocking tasks on the OS level and allow for balanced execution of any number of tasks. And a single threadpool which would run on an OS thread itself, would not benefit from this, because all of its tasks are all interrupted and continued together (is that correct?).
Couldn't tokio perform the regular context switching by itself? Why do we have to depend on the underlying OS to do this? It seems to make the system more complicated and less self contained.

Tokio uses cooperative multitasking based on the Futures model. But your goal was to avoid the cooperation and be able to perform blocking operations that can only be pre-empted by the OS. Therefore, you have to use OS-preempted threads instead of cooperatively scheduled futures.
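A toy single-threaded executor shows what "cooperative" means here. This is only an illustration, not Tokio's scheduler; `Steps`, `run_round_robin` and `schedule` are hypothetical names. The executor can switch tasks only when `poll` returns, so a task that does all of its work inside one `poll` call (which is what a blocking call amounts to) keeps the thread until it is done.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the toy executor below simply re-polls everything.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Log = Rc<RefCell<Vec<String>>>;
type Task = Pin<Box<dyn Future<Output = ()>>>;

/// A task doing `steps` units of work. With `yield_between` it returns
/// `Poll::Pending` after each unit; otherwise it does them all in one poll.
struct Steps {
    name: &'static str,
    done: u32,
    steps: u32,
    yield_between: bool,
    log: Log,
}

impl Future for Steps {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        while self.done < self.steps {
            self.done += 1;
            let entry = format!("{}{}", self.name, self.done);
            self.log.borrow_mut().push(entry);
            if self.yield_between && self.done < self.steps {
                // Cooperate: ask to be polled again, then give the thread back.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
        }
        Poll::Ready(())
    }
}

/// Polls every task once per round until all of them are finished.
fn run_round_robin(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending after this poll.
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
}

/// Runs tasks "a" and "b"; "b" always yields, "a" only if `a_yields` is true.
fn schedule(a_yields: bool) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let a = Steps { name: "a", done: 0, steps: 3, yield_between: a_yields, log: Rc::clone(&log) };
    let b = Steps { name: "b", done: 0, steps: 3, yield_between: true, log: Rc::clone(&log) };
    let mut tasks: Vec<Task> = Vec::new();
    tasks.push(Box::pin(a));
    tasks.push(Box::pin(b));
    run_round_robin(tasks);
    let entries = log.borrow().clone();
    entries
}
```

`schedule(true)` interleaves the two tasks as a1, b1, a2, b2, a3, b3. `schedule(false)` yields a1, a2, a3, b1, b2, b3: task "b" cannot start until "a" has finished, and nothing in the executor can interrupt "a". An OS thread, by contrast, can be pre-empted by the kernel at any point.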

Even though Tokio has its own threadpool that could theoretically do blocking operations, it's explicitly not designed for that, and doing so would disrupt Tokio's operation.

If you want to block threads, get your own threads.

Ah okay, I think I am starting to see the big picture a bit more clearly. Thanks everybody for your help!
