Use of async - where to draw the line

I am just trying to get my head around the issue of where to use async. Would it be correct to say that there is a certain amount of extra overhead when calling an async function?

I only just started thinking about the issue. I don't really think having too many threads is likely to be an issue for the applications I have in mind. Or should I make all the functions that may perform IO (typically file reads) async? It seems more sensible to me at this stage to use tokio::task::spawn_blocking rather than doing that.

Well, what sort of stuff are you doing? You mention file IO, and putting file IO in spawn_blocking or entirely outside async code is very reasonable.

It's the web server I posted about recently. My thinking is to use async for the network IO (well, Axum does all that for me anyway), but do the database query evaluation in sync code. My GET handler currently looks like this:

/// Handler for http GET requests.
async fn h_get(
    state: Extension<Arc<SharedState>>,
    path: Path<String>,
    params: Query<HashMap<String, String>>,
    cookies: Cookies,
) -> ServerQuery {
    // Build the ServerQuery.
    let mut sq = ServerQuery::new();
    sq.x.path = path.0;
    sq.x.params = params.0;
    sq.x.cookies = map_cookies(cookies);

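    // Run the synchronous database query on Tokio's blocking thread pool.
    let blocking_task = tokio::task::spawn_blocking(move || {
        // GET requests should be read-only.
        let stg = Box::new(state.stg.open_read());
        let db = Database::new(stg, "");
        db.run_timed("EXEC web.Main()", &mut *sq.x);
        sq
    });
    // Wait for the result; this panics if the blocking task itself panicked.
    blocking_task.await.unwrap()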
}

It seems fine to do it that way.

I might resort to tokio::task::spawn_blocking in a different scenario, where I want to execute (user provided) Lua scripts (which don't perform any blocking I/O) in an async program. Yet they may run for a while and thus keep other code from being executed.

In my case, I had the idea to install a hook that runs every x-thousand VM instructions, which could then yield from Lua regularly.
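To illustrate the instruction-count hook idea, here is a minimal std-only sketch. It is a toy stand-in for an interpreter loop, not the Lua API: `run_with_hook`, its parameters, and the dummy "instructions" are all hypothetical. In a real async program the hook would yield to the executor; here the caller decides what the hook does.

```rust
/// Toy stand-in for an interpreter loop (hypothetical, not the Lua API):
/// executes `instructions` dummy steps and calls `hook` after every
/// `interval` steps. Returns how many times the hook fired.
fn run_with_hook(instructions: u64, interval: u64, mut hook: impl FnMut(u64)) -> u64 {
    assert!(interval > 0, "interval must be non-zero");
    let mut fired = 0;
    for executed in 1..=instructions {
        // ... one VM instruction would be executed here ...
        if executed % interval == 0 {
            // Give other work a chance to run, e.g. by yielding.
            hook(executed);
            fired += 1;
        }
    }
    fired
}
```

For example, `run_with_hook(10_000, 1_000, |_| std::thread::yield_now())` calls the hook 10 times. The per-instruction counter check is the overhead mentioned here: it is paid on every step, whether or not anything else is waiting to run.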

Apart from having to invest extra effort in making my Lua execution yield regularly, I'm not sure if it's really worth it in my use case, as it would also impose overhead, and the overhead of using spawn_blocking might be way smaller.

But there is one other thing I wonder about: There is a maximum limit on blocking threads. Thus, depending on what the blocking threads do, I wonder if it's possible that this might cause deadlocks (e.g. if 512 threads wait on a result of a 513th task which will never get executed until one of the 512 threads finishes its work). But maybe that's not an issue in 99% of all application cases.

I wonder what the rationale for the default limit of 512 is? I believe 64-bit Windows can have in excess of 50,000 threads, although I doubt that would be reasonable or healthy.

We do need to have some limit, because if we don't then any program that spawns at a higher rate than the tasks can finish will run out of resources rather quickly and crash. It's a form of backpressure.
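As a general illustration of a cap acting as backpressure, here is a std-only sketch using a bounded channel. It is not how Tokio's blocking pool works (Tokio queues extra blocking tasks once the thread limit is reached rather than rejecting them); `submit_all` and its parameters are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to queue `jobs` work items into a queue that holds at most
/// `capacity` items while nothing is draining it.
/// Returns (accepted, rejected).
fn submit_all(capacity: usize, jobs: usize) -> (usize, usize) {
    // `_rx` keeps the receiving end alive so the queue stays open.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    let mut rejected = 0;
    for job in 0..jobs {
        match tx.try_send(job) {
            Ok(()) => accepted += 1,
            // The queue is full: the producer is told to slow down
            // instead of consuming ever more memory.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, rejected)
}
```

With a capacity of 4 and 10 submitted jobs, 4 are accepted and 6 are rejected. A producer that outpaces its consumers finds out immediately, rather than when the process runs out of resources.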

Why exactly we went with 512, well, it's a decently large number and seems reasonable :woman_shrugging:

If you don't plan to support tens of thousands of simultaneous connections, then purely synchronous code should be simpler to write and reason about, especially considering that your DB API is synchronous as well.

Presumably async code can actually block for a short period of time due to virtual memory page misses. Processing a database query is similar really: it may block because the data is not in memory. I don't think it makes sense to process more than a moderate number of queries at the same time; it would probably reduce throughput rather than increase it, since there are inherent limits to how much a computer can do in parallel.
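One way to cap the number of queries running at once is a counting semaphore. The following is a minimal std-only sketch, assuming each query runs on its own thread and using a short sleep as a stand-in for the query; `Semaphore` and `peak_concurrency` are hypothetical names, not part of any library mentioned in this thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from a Mutex and a Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Runs `queries` simulated queries on separate threads, allowing at most
/// `limit` (must be > 0) to run at once. Returns the peak concurrency seen.
fn peak_concurrency(queries: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..queries)
        .map(|_| {
            let (sem, running, peak) =
                (Arc::clone(&sem), Arc::clone(&running), Arc::clone(&peak));
            thread::spawn(move || {
                sem.acquire();
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // simulated query
                running.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling `peak_concurrency(16, 3)` never reports more than 3 queries in flight, however many threads are started. What the right limit is depends on the workload and hardware, so it is best found by measurement.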

So I am now convinced making the query processing code synchronous is the correct approach. Equally, I think it makes good sense to make the network IO async. What happened is I had a sudden panic: do I have to rewrite all my code as async? I think the answer is a clear no; it would actually be counter-productive.

That kind of deadlock wouldn't happen in my case, since the threads are independent (albeit they are reading from a common pool of data). What could happen in principle is that a large number of long-running read-only transactions could prevent a short-running read-only transaction from starting. So it is more of a "fairness" issue. However, I don't think it would be a problem in reality. I doubt there would reasonably be more than a handful of long-running read-only transactions running at the same time, let alone 512.
