Tokio: Is it possible to do concurrent write on a File when using spawn_blocking

I've been using Tokio and async for a few months, but I only recently started getting interested in I/O interactions. As such, I am still getting confused with bridging async and non-async code.

I saw that tokio::fs uses std::fs inside a tokio::task::spawn_blocking call. The docs say:

Tokio’s file uses spawn_blocking behind the scenes, and this has serious performance consequences. To get good performance with file IO on Tokio, it is recommended to batch your operations into as few spawn_blocking calls as possible.

So I have a few questions about the inner workings of spawn_blocking that I couldn't find answers to:

  • Can multiple threads spawned with spawn_blocking execute concurrently, or do they run sequentially?
  • If concurrent blocking operations are possible, can a file be opened by multiple async tasks that each use spawn_blocking for their I/O operations?
  • If so, what are the ways to prevent concurrent writes to the file?

Another question has more to do with my understanding of what awaiting a spawn_blocking handle does to the task inside the blocking thread.

If I await a handle, I yield back to the scheduler, which then chooses another future to poll. Does that mean the task inside the blocking thread stops? From my understanding it shouldn't. I read in a post from Alice Ryhl (Async: what is blocking) that there are about 500 threads in the thread pool.
Given that modern CPUs don't have 500 cores (maybe one day :laughing:), if I spawn more blocking threads than there are cores, will some blocking tasks have to stop? Or do they all run to completion?

My actual application has a file that is written to often, but which I sometimes need to read. I already prevent concurrent writes with an mpsc channel: write requests are handled by a single task, which is the only one that writes to the file. But because of the way the application is built, I can't put the read in the same place, so I can't guarantee that I won't open the file for reading while another task is writing to it.

I considered using an Arc<AtomicBool> to signify when the file is ok to be read or not (or other types of solution with Mutex), but I was wondering if there is a more idiomatic way to do it with Tokio, since it seems to me that the problem of concurrent file access should come pretty quickly once one uses an async runtime and bridges with the filesystem.

Thank you for taking the time to read!

From spawn_blocking docs:

Tokio will spawn more blocking threads when they are requested through this function until the upper limit configured on the Builder is reached.

No. Awaiting the handle only makes the current task wait for the result. The closure keeps running on its blocking thread whether or not the handle is being polled.

Tokio limits the number of blocking threads to 512 by default. Spawning more blocking threads doesn't mean that previous threads will be stopped, but there's something to bear in mind (taken from Builder in tokio::runtime - Rust):

Unlike the worker_threads, they are not always active and will exit if left idle for too long. You can change this timeout duration with thread_keep_alive.

Reading a file while writing to it from elsewhere will only result in reading partial data. If this is a problem in your domain logic, then awaiting the writing task is what you should do. You don't need anything else.

As I understand it, the short version is that spawn_blocking calls can overlap with each other because Tokio spawns more threads (I don't think there's a limit you're likely to hit), and the OS will serialize writes to the same file. You're likely to run into ordering issues though!

This is mostly what I thought, thank you for confirming!

I thought of another solution for my specific problem, using tokio::select! and channels to ensure I'm not writing and reading at the same time. My initial problem was that the part of the code where I read is not the same as where I write, so a simple await doesn't work out of the box, but channels will.
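For anyone finding this later, here is a rough sketch of that channel idea. To keep it dependency-free it uses only std (a plain thread and std::sync::mpsc) rather than Tokio, so treat it as an analogue: in a Tokio application the same shape would presumably use tokio::sync::mpsc with a oneshot channel for the reply. The names FileRequest, spawn_file_owner and demo_read_after_writes are made up for the example. The point is that one owner handles both writes and reads, one request at a time, so a read can never land in the middle of a write.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Requests understood by the thread that owns the file (hypothetical names).
enum FileRequest {
    /// Append one line to the file.
    Append(String),
    /// Read the whole file and send it back on the enclosed channel.
    ReadAll(mpsc::Sender<std::io::Result<String>>),
}

/// Spawns the owner thread and returns a request sender plus the thread handle.
fn spawn_file_owner(path: PathBuf) -> (mpsc::Sender<FileRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<FileRequest>();
    let handle = thread::spawn(move || {
        // Requests are handled one at a time, so a read never overlaps a write.
        for request in rx {
            match request {
                FileRequest::Append(line) => {
                    let result = OpenOptions::new()
                        .create(true)
                        .append(true)
                        .open(&path)
                        .and_then(|mut file| writeln!(file, "{line}"));
                    if let Err(err) = result {
                        eprintln!("append failed: {err}");
                    }
                }
                FileRequest::ReadAll(reply) => {
                    // Ignore the send error if the requester has gone away.
                    let _ = reply.send(fs::read_to_string(&path));
                }
            }
        }
    });
    (tx, handle)
}

/// Writes two lines, then reads the file back through the same owner thread.
fn demo_read_after_writes() -> String {
    let path = std::env::temp_dir().join(format!("file_owner_demo_{}.log", std::process::id()));
    let _ = fs::remove_file(&path);
    let (tx, handle) = spawn_file_owner(path.clone());

    tx.send(FileRequest::Append("first".to_string())).unwrap();
    tx.send(FileRequest::Append("second".to_string())).unwrap();

    // The read is queued behind the two writes, so it sees both lines.
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(FileRequest::ReadAll(reply_tx)).unwrap();
    let contents = reply_rx.recv().unwrap().unwrap();

    drop(tx); // closing the channel lets the owner thread exit
    handle.join().unwrap();
    let _ = fs::remove_file(&path);
    contents
}

fn main() {
    print!("{}", demo_read_after_writes());
}
```

Because the read request travels through the same queue as the writes, ordering comes for free and no AtomicBool or Mutex is needed around the file.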

I could be misunderstanding, but I think what you said applies to tokio runtime threads, but not to blocking threads, and I understood they were asking about blocking threads. When doing blocking file IO with SSDs, you will need a very large number of threads to saturate the IO device, many more than the number of cores.

Good observation! Yes. Tokio defaults to 512 max blocking threads. Will correct that in my reply above.

Limiting a thread pool to the number of CPU cores is important for CPU-bound work. For blocking IO the situation is different because the threads are just sleeping most of the time.
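A small way to see this for yourself, as a sketch only: it uses plain std threads as a stand-in for Tokio's blocking pool, with thread::sleep standing in for a blocking IO call, and run_sleeping_threads is a name made up for the example. It starts far more threads than a typical machine has cores and measures how long they all take to finish.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `count` threads that each sleep for `pause` (a stand-in for blocking IO)
/// and returns the wall-clock time until all of them have finished.
fn run_sleeping_threads(count: usize, pause: Duration) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..count)
        .map(|_| thread::spawn(move || thread::sleep(pause)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let elapsed = run_sleeping_threads(64, Duration::from_millis(100));
    println!("64 sleeping threads finished in {elapsed:?}");
}
```

If the threads had to take turns, 64 sleeps of 100 ms would need more than six seconds. Since sleeping threads don't occupy a core, the total should stay close to a single sleep, and none of them is stopped to make room for the others.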
