Single-threaded tokio app

Background: I am building a single-threaded application that is meant to be run by a single CPU (exclusively):

#[tokio::main(flavor = "current_thread")]
async fn main() { ... }

Questions:

  1. Is it possible to build a Rust program in such a way that (underneath) it uses a single-threaded CRT? (A single-threaded CRT is simpler and faster than a multi-threaded one.)
  2. Does tokio take advantage of the "this app is single-threaded" guarantee? (As with CRTs, a single-threaded task executor can be designed to be simpler and faster.)
  3. Why (according to Task Manager) does the following program have 1 thread on Windows 7 and 4 on Windows 11? (Pretty sure it is not because 11 - 7 = 4.)
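use std::time::Duration;
use tokio::time::sleep;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Sleep for three seconds, then exit; no other work is scheduled.
    sleep(Duration::from_millis(3000)).await
}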

Tokio's single-threaded runtime will still spawn threads if something uses spawn_blocking:

Note that if you are using the single threaded runtime, this function will still spawn additional threads for blocking operations. The current-thread scheduler’s single thread is only used for asynchronous code.

Various things in tokio might do this:

These threads should be idle almost all of the time. If this is unacceptable, then you should probably use a different executor.

Tokio should take advantage of the runtime being single-threaded.

Not sure why Win7 and Win11 are different, and I have no idea how you'd switch out the C runtime or which one Rust uses by default.

Not sure if you need this, but if you want the program to stay pinned to one logical core, you can use core_affinity (this will only work for tokio's worker thread, not the spawn_blocking ones) or do something at the OS level when you start the process.

Yes, I know this.

Are there any good options available?

What do you need it to do? What's stopping you from staying in sync code?

I am aiming for a thread-per-core architecture where every thread is a separate single-threaded application, with the aim of spawning X of them and assigning each a separate CPU core (which is excluded from use by the OS).
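For illustration only, here is a minimal std-only sketch of the shared-nothing layout described above. The names `run_worker`, `spawn_per_core` and `default_worker_count` are hypothetical, and the actual pinning step is left as a comment because it depends on the OS or on a crate such as core_affinity.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical per-core worker: it owns all of its state and shares nothing.
fn run_worker(core_index: usize, items: u64) -> (usize, u64) {
    // A plain local variable is enough: no Arc, Mutex or atomics are involved.
    let mut processed = 0u64;
    for _ in 0..items {
        processed += 1;
    }
    (core_index, processed)
}

/// Spawns `workers` independent OS threads and collects their results in order.
fn spawn_per_core(workers: usize, items: u64) -> Vec<(usize, u64)> {
    let handles: Vec<_> = (0..workers)
        .map(|core_index| {
            thread::Builder::new()
                .name(format!("worker-{core_index}"))
                .spawn(move || {
                    // Assumption: pinning to a core would happen here, e.g. via
                    // the core_affinity crate or an OS-level mechanism.
                    run_worker(core_index, items)
                })
                .expect("failed to spawn worker thread")
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}

/// One worker per logical core reported by the OS, falling back to 1.
fn default_worker_count() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}
```

Calling `spawn_per_core(default_worker_count(), 1_000)` would start one worker per logical core. In the setup described here each worker would run its own event loop rather than a counting loop.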

So, I am naturally curious if I could take advantage of this setup -- namely if I could use single-threaded CRT and whether my tool of choice (tokio) can do the same.

I know it is a glaring example of premature optimization, but... Consider it a fit of intellectual curiosity. :)

(btw, I was unpleasantly surprised to find that MSVC no longer has a single-threaded CRT).

So much of the Windows API and commercial libraries are multi-threaded internally that the single-threaded CRT probably created a hazard.

This sounds like it should just be sync... well actually, you don't say anything about the actual work so I have no idea. But usually you'd only want one thread per core if those threads are constantly running, in which case async isn't going to do much for you.

That is not true.

You can use the smol ecosystem for that. It’s pretty good.

By the way, there is a blog post dedicated to this specific topic here: Local Async Executors and Why They Should be the Default

Also a blog post answer to it from withoutboats, and, probably less relevant, a Hacker News thread on withoutboats’ blog post. Just mentioning for the sake of completeness, but I’m personally not advocating for anything here.

Thank you, I'll have a look at it. But a cursory glance suggests it is designed for a multi-threaded environment (the docs mention "you are still free to push blocking tasks to a thread pool"), which means there are places where it has to pay for it.

All these points were well discussed about 20 years ago (or more), well before the invention of Rust, all in the context of event loops (see libev, etc.). It is amusing to observe the same points being made, only now in a Rust context.

Bottom line: MT executors give you convenience (built-in work-stealing) at a price. But there is a small class of problems where you just don't need it (for whatever reason). And there is an (even smaller) class of problems where you won't need a thread pool (or any additional threads) -- this class can take advantage of the "single-thread" guarantee (and use a single-threaded CRT).

All I wanted to learn is whether the Rust ecosystem has support for this out of the box. Looks like it doesn't (or I couldn't find it):

  • there is no (mature and stable) async library that is specifically designed for "single thread" case
  • there is no way to force the compiler to use a single-threaded CRT

I’m not sure how the excerpt you quote is providing evidence that it’s designed for multi-threading environment specifically. It would be helpful to add a link to the page of the doc you are quoting for more context. For all I know, LocalExecutor is a single-threaded async executor which is designed to be single-threaded from the ground up. From the quoted text, I understand that you have to decide how to manage blocking tasks yourself. The blocking crate is providing a way to spawn blocking tasks on a separate thread pool so you don’t block the runtime, but you are free to not use it and manage blocking tasks the way you see fit. If you are solving a problem which does not require a thread pool, then surely you don’t have any blocking task to begin with, or maybe just for specific sections / passes.

I’m sorry, my intention was not to open a debate on this. I just thought the links were relevant to the discussion, specifically for Rust.

I’m pretty sure smol is providing what you need for the first bullet. (Tokio arguably applies too, because the single-threaded flavor is actually using a completely different scheduler implementation under the hood.) But then, if you rely on a blocking task which must be offloaded, my understanding is that what you are trying to solve is not part of the class of problems which can fully take advantage of this "single-threaded" property to begin with. I’m sure you know about this, but of course the notable exception is file I/O, which neither smol nor tokio handles in a truly asynchronous way. glommio would be what you are looking for, but it relies on io_uring, which is a Linux thing. I’m not aware of any good Rust library for Windows ioringapi.h. You would have to use windows-sys directly, which is obviously not a great alternative to a high-level async runtime.

I didn’t answer the second point because I simply don’t know. That being said, what you say sounds accurate. I’m not very optimistic, as it’s something that is generally becoming less widely available as far as I know. As you mentioned, Microsoft already phased out their single-threaded CRT and they no longer mention it in their documentation, favoring the multi-threaded ones instead.
I believe that if something like this was available in Rust today, it would be mentioned at this place. However, I am not an expert on Rust linkage and CRT. Actually, I enabled tracking on this thread in the hope that someone more knowledgeable than myself would appear and clarify this specific question.


LocalExecutor is built on top of Executor, which uses Arc, RwLock, Mutex, etc. Unless these types take into account whether or not your app is single-threaded (and compile into something different), I doubt LocalExecutor avoids "paying for multi-threading" (regardless of how small the price is).
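To make the "paying for multi-threading" point concrete, here is a rough std-only sketch of an executor core that uses no Arc, Mutex or atomics at all. It is not how smol or tokio work: `block_on_local`, `YieldOnce` and `count_with_rc` are made-up names, and the loop simply busy-polls instead of waiting on an I/O reactor. One reason general-purpose executors reach for thread-safe types is that `std::task::Waker` is `Send + Sync`; this sketch sidesteps that with a waker that does nothing.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Builds a waker that does nothing; the busy-poll loop below never needs waking.
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Hypothetical single-threaded `block_on`: polls the future until it completes.
/// The future does not have to be `Send`, so it may hold `Rc` and `Cell`.
fn block_on_local<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    // SAFETY: the vtable functions never dereference the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
        // A real executor would park here until an I/O or timer event arrives.
    }
}

/// Future that returns `Pending` once, then `Ready`, to force a second poll.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Increments a counter shared through `Rc<Cell<_>>`, yielding after each step.
fn count_with_rc(steps: u32) -> u32 {
    let counter = Rc::new(Cell::new(0u32));
    let task_counter = Rc::clone(&counter);
    block_on_local(async move {
        for _ in 0..steps {
            task_counter.set(task_counter.get() + 1);
            YieldOnce(false).await;
        }
    });
    counter.get()
}
```

Whether avoiding the atomics is measurable in practice is a separate question; the sketch only shows that the language itself does not force them on a purely local executor.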

No, there will be no blocking tasks. Logging is moved into a separate process, communication with it is async. So, no printlns in the code... and no file i/o (no need for io_uring).

Not trying to debate it, just chuckling about history repeating itself. :)

No worries, I really appreciate your help -- I am pretty new at Rust and you gave me quite a bit of new and useful info. Thank you.

They finally saw the light.

That's precisely why you wouldn't get what you desire. “Single-threading-with-support-for-multithreading, too” (like some OSes including GNU/Linux tried to do) just doesn't work.

To benefit from single-threadedness you need to build everything around it.

In an era of Windows 1.0 or MacOS Classic… this was easy because everyone and their dad was doing that.

Today, in an era of multicore CPUs… you would have to rely on developers with similar needs. And if all three of them would pool their efforts… maybe something interesting would be created.

But don't expect Rust to provide anything “out of the box”. It's too niche and too viral for anyone to develop anything for large computers. Maybe you can take something like Embassy and adapt it for desktop. But this wouldn't be easy.

Yeah. Precisely that. Something which no one (except for the aforementioned three guys) would want to support on desktop.

These things are pretty common in the embedded space (simply because of cost constraints), thus maybe it would be easier for you to create something you need/want as an extension of “Embedded Rust”, not as shrinkage of “Desktop Rust”[1].


  1. It should probably be called Desktop/Mobile/Server Rust, but you get the idea: big CPUs, threads are everywhere and thus there are very few developers who want/need code which doesn't support them.


Right, reading again, I think I am the one who misunderstood the intention :)

I missed this… You are right. Tokio’s current-thread scheduler also uses such primitives internally. I’m guessing it’s to keep the Runtime type both Send and Sync. In smol's case I didn’t expect this because LocalExecutor is !Send and !Sync. They did it using a PhantomData marker, probably to uphold some safety invariants.
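For readers unfamiliar with the marker trick, here is a small sketch of one common way to opt a type out of Send and Sync. `LocalHandle` is a made-up type, and the marker used here (`PhantomData<*const ()>`) is my assumption for illustration, not necessarily the exact one smol uses.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Hypothetical handle that must stay on the thread that created it.
struct LocalHandle {
    ticks: Cell<u64>,
    // Raw pointers are neither Send nor Sync, so this zero-sized marker
    // removes both auto traits from LocalHandle at no runtime cost.
    _not_send_sync: PhantomData<*const ()>,
}

impl LocalHandle {
    fn new() -> Self {
        LocalHandle {
            ticks: Cell::new(0),
            _not_send_sync: PhantomData,
        }
    }

    /// Increments the counter through a shared reference and returns the new value.
    fn tick(&self) -> u64 {
        let next = self.ticks.get() + 1;
        self.ticks.set(next);
        next
    }
}

// Moving a LocalHandle into `std::thread::spawn(move || ...)` would be rejected
// at compile time, because the closure would no longer be Send.
```

The marker only restricts how the type may be used; it does not by itself change which primitives the implementation uses internally, which is the point made above.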

Given this, I think the only other runtime that might actually avoid synchronization primitives altogether is Embassy, but it’s for embedded applications.

EDIT: oops, looks like I had a race condition with @khimru haha