Async Facilities

Hi all,

I have been learning about Rust's async programming facilities for a while, and given that I have some decent experience with multithreaded programming (in C++), I fail to see the core advantage. Yes, I have seen the usual arguments about "thread overhead" and "thread memory usage" everywhere, but I don't find them very convincing.

My main problem is that you will always need some actual threads to do the blocking/notification part and then communicate that through some shared state (most likely a thread-safe queue) with the rest of your code. That queue already lets the rest of the code avoid blocking and potentially do something useful in the meantime, and it also lets you scale the number of consumers up or down: you may have as many consumers as you want, down to a single one handling any number of tasks, without any additional thread overhead.

Let me elaborate with the classic concrete example of "handling many sockets". After reading about it, I find that OS APIs like epoll are what allow user code to handle multiple connections without needing one thread per connection. So in that example one can imagine a single producer thread monitoring multiple connections via that API and putting any available data into a shared queue, with an envelope identifying the source connection; on the consumer side, you can have any number of threads handling the incoming data from that queue.

Compared to a plain producer-consumer pattern with a shared queue, I see a lot of moving pieces in the async pattern, and as I said, I fail to see the main advantage. Perhaps someone can help me out with that?
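Here is roughly the shape I have in mind, as a minimal sketch; `poll_ready` is just a hypothetical stand-in for the real epoll loop, and all the names are illustrative:

```rust
use std::sync::mpsc;
use std::thread;

// Envelope identifying the source connection (names are illustrative).
struct Message {
    conn_id: usize,
    data: Vec<u8>,
}

// Hypothetical stand-in for "ask the OS (via epoll) which connections have data".
fn poll_ready() -> Vec<(usize, Vec<u8>)> {
    Vec::new()
}

fn main() {
    let (tx, rx) = mpsc::channel::<Message>();

    // Producer side: one thread that conceptually blocks in epoll_wait and
    // forwards whatever becomes readable, tagged with its connection id.
    thread::spawn(move || loop {
        for (conn_id, data) in poll_ready() {
            if tx.send(Message { conn_id, data }).is_err() {
                return; // consumer side went away
            }
        }
    });

    // Consumer side: a single thread here; a pool of consumers would need a
    // receiver that can be shared between threads (e.g. crossbeam-channel).
    for msg in rx {
        println!("connection {} sent {} bytes", msg.conn_id, msg.data.len());
    }
}
```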

Thanks

Resources I examined

https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html

This isn’t necessarily the case. OS mechanisms like select, epoll, and kqueue let single-threaded programs handle multiple concurrent connections efficiently, and Rust’s async system is designed to allow an executor to take advantage of them.


Sorry, it's still not clear to me. epoll_wait, for instance, will either immediately give you events that are ready to handle, if there are any, or block until some become ready. So in a single-threaded environment everything can still block whenever nothing epoll is monitoring is ready. What does async make better in that case?

Also, to clarify: I initially compared against multithreading because I saw that comparison made almost everywhere async programming is explained (for instance, in the async book link I mentioned in the main post).

It helps manage the state of the various connections in a single-threaded context, by providing each a dedicated pseudo-thread. These are more efficient than OS threads because the context that needs to be switched out is tailored to the actual work that needs to be done.

This basic setup can only take advantage of a single compute core, though, so the more advanced setups use a small thread pool to take advantage of the entire CPU.

You certainly can achieve similar performance by hand-coding your own epoll-based dispatcher, but that generally involves writing a state machine to describe the behavior of each connection, which is more cumbersome than the equivalent procedural formulation.
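For concreteness, here is a minimal sketch of that procedural formulation. The tokio runtime and the echo logic are my own illustrative choices, not something from the earlier posts:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

// Each connection reads as straight-line code; the compiler turns this
// function into a state machine that suspends at every `.await`.
async fn handle(mut socket: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    loop {
        let n = socket.read(&mut buf).await?;  // suspension point
        if n == 0 {
            return Ok(());                     // connection closed
        }
        socket.write_all(&buf[..n]).await?;    // suspension point
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;
    loop {
        let (socket, _) = listener.accept().await?;
        // Each connection becomes a lightweight task, not an OS thread.
        tokio::spawn(handle(socket));
    }
}
```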

It helps manage the state of the various connections in a single-threaded context, by providing each a dedicated pseudo-thread.

You certainly can achieve similar performance by hand-coding your own epoll-based dispatcher, but that generally involves writing a state machine to describe the behavior of each connection, which is more cumbersome than the equivalent procedural formulation.

My personal original thoughts were also more in the direction of it being a better way to organise the code.

These are more efficient than OS threads because the context that needs to be switched out is tailored to the actual work that needs to be done.

That's again the part that confuses me, because in reality you shouldn't be creating an unbounded number of threads anyway, but rather a thread pool with some task queue, so I don't think the comparison is fair this way 🙂

If you want fewer than one thread per connection, you’ll need to do the procedural->state machine translation at some point so that you can pause and resume the processing of a single connection. Async automates that process so you don’t have to do it manually.
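And here is roughly what the hand-rolled side of that translation tends to look like; the states and field names are purely hypothetical:

```rust
// The per-connection state you end up keeping yourself when a handler is not
// allowed to block; every name here is illustrative only.
#[allow(dead_code)]
enum ConnState {
    ReadingHeader { buf: Vec<u8> },
    ReadingBody { len: usize, buf: Vec<u8> },
    Writing { response: Vec<u8>, written: usize },
    Done,
}

// Called whenever epoll reports this connection as ready: it has to pick up
// exactly where it left off and return as soon as it would otherwise block.
fn advance(state: &mut ConnState) {
    match state {
        ConnState::ReadingHeader { .. } => { /* read what's available; maybe move to ReadingBody */ }
        ConnState::ReadingBody { .. } => { /* read until `len` bytes have arrived, then build the response */ }
        ConnState::Writing { .. } => { /* write as much as the socket will accept right now */ }
        ConnState::Done => {}
    }
}

fn main() {
    let mut state = ConnState::ReadingHeader { buf: Vec::new() };
    advance(&mut state); // in a real dispatcher this runs on every readiness event
}
```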


Yes, I agree with that. I think I was just confused by the performance comparisons to threading.

Async programming was introduced largely as an answer to the famous C10K problem: is it possible to serve 10,000 concurrent connections on a single typical machine? To do so with blocking I/O you need at least one thread per connection, i.e. at least 10,000 threads, and in the 1990s typical machines didn't even have enough RAM for that.

I think it is possible without async programming (if you mean Rust's async). In a single-threaded environment you would cycle through whichever of the 10K connections are ready, and in a multithreaded environment you certainly don't need a 1:1 mapping of connections to threads; rather, each thread can cycle through its own group of n connections. I would say that handling 10K connections concurrently wouldn't be possible if the OS didn't provide APIs like epoll, because then you would have to block on each connection separately in a dedicated thread. So to me async programming provides a structured way of doing this cycling, or at least that's the benefit I have been able to grasp so far.
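The kind of cycling I have in mind looks roughly like this. I'm using the mio crate here simply as a thin epoll/kqueue wrapper (my choice for the sketch); accept handling and per-connection bookkeeping are elided:

```rust
use mio::net::TcpListener;
use mio::{Events, Interest, Poll, Token};

const LISTENER: Token = Token(0);

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    let mut listener = TcpListener::bind("127.0.0.1:9090".parse().unwrap())?;
    poll.registry()
        .register(&mut listener, LISTENER, Interest::READABLE)?;

    loop {
        // Blocks until *something* is ready, then hands back only the ready
        // sources -- this is the single-threaded cycling over connections.
        poll.poll(&mut events, None)?;

        for event in events.iter() {
            match event.token() {
                LISTENER => {
                    // accept() the new connection and register it with its own
                    // Token so later readiness events can be routed back to it.
                }
                _token => {
                    // Look up the connection for this token and read/write
                    // whatever is ready without blocking.
                }
            }
        }
    }
}
```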

Async tasks are intended to consume fewer resources, to be more "lightweight" and therefore more efficient. It is true they can be a bit tricky to use; there is definitely a learning curve. But on the other hand, they work well once you manage to figure them out (in my limited opinion). My code which uses async is here, as an example:
https://github.com/georgebarwood/RustDB/blob/main/examples/axumtest.rs

This is what we call asynchronous programming. The async/await syntax lets you construct async tasks by writing logic as synchronous-looking code with linear control flow, but those async tasks are ultimately executed in exactly that kind of cycle, with some further optimizations.
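To make "executed in a cycle" concrete, here is a toy executor that drives an async block purely by polling it in a loop (it assumes the futures crate for a do-nothing Waker). A real executor parks on epoll/kqueue between polls instead of spinning:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll};

use futures::task::noop_waker; // assumption: the `futures` crate for a no-op Waker

// A toy "cycle": drive a future by polling it over and over. A real executor
// only re-polls a task when its Waker is woken; this spin loop exists only to
// show the polling model, not to be efficient.
fn busy_block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::thread::yield_now(), // normally: wait for a wake-up
        }
    }
}

fn main() {
    // Linear-looking code with `.await`s; the compiler turns it into a state
    // machine that the loop above resumes step by step.
    let task = async {
        let x = async { 20 }.await;
        let y = async { 22 }.await;
        x + y
    };
    println!("{}", busy_block_on(task));
}
```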


Yeah, thanks, I find that more convincing than the unrealistic comparisons with multithreading 🙂

Thanks a lot for showing an example!

Async tasks are intended to consume fewer resources, to be more "lightweight" and therefore more efficient.

I do totally understand that spawning actual threads comes with an overhead. But my point is that if you don't need threading at all, you can also just cycle through the I/O handles and consume any ready input, given that the OS provides support for that (like the epoll API). In that case, whatever you achieve via async you can also achieve with an ordinary sequential piece of code, with the same performance. As I said before, since I'm new to this pattern, I was mostly confused by what I find to be unrealistic performance/efficiency comparisons to multithreading. With that said, I can see that async is a better way to achieve this concurrency effect on a single thread without having to do the transitions between tasks manually.
