All Concurrency Models Comparison [help]

I am fairly new to Rust. I know that async-await has recently been stabilized and that, to stay zero-cost, it does not use the standard event loop found in Node.js and others. If someone could clarify the various concurrency models and expand on when each is useful, that would be very helpful.

Q: Java has thread pools, and I guess Rust does too. Are there any differences?
Q: Go has CSP and goroutines. Does Rust have green threads or something similar?
Q: What's the difference between async-await and something like Rayon? In what situations is each preferred over the other?
Q: Actix uses the actor model. Is it better than async-await, only better in some cases, or do they solve different problems?

Basically, I have general doubts about concurrency models. If there are readings on the pros and cons and use cases of the different models, I hope someone can share them. This seems like a higher-level problem independent of language, and one most software developers should know the basics of.


Rust does not have green threads; async-await is the replacement for that use case.

A quote I hear often is that async is about waiting for a lot of things at once, while Rayon is about doing a lot of things at once. The main idea is that with async, you expect to spend most of your time waiting for I/O, and the executor is optimized for that: it uses various mechanisms to let waiting tasks sleep while only running the tasks that have work to do.

On the other hand, Rayon just calls the tasks and performs the computations directly. There's no mechanism for sleeping tasks; it's all about being CPU-bound.
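The "doing a lot of things at once" side can be sketched with plain std threads. The chunk-splitting below is roughly what Rayon's `par_iter()` automates for you (including work stealing and sizing chunks to the number of cores, which this toy version hard-codes):

```rust
use std::thread;

/// Sum a slice by splitting it into chunks, one scoped thread per chunk.
/// This is a hand-rolled stand-in for `data.par_iter().sum()` in Rayon.
fn parallel_sum(data: &[u64], n_chunks: usize) -> u64 {
    thread::scope(|s| {
        // Spawn one thread per chunk; scoped threads may borrow `data`.
        let handles: Vec<_> = data
            .chunks(data.len() / n_chunks + 1)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join every thread and combine the partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    let total = parallel_sum(&data, 4);
    assert_eq!(total, 500_000_500_000);
    println!("total = {total}");
}
```

Note that there is nothing async here: every thread is busy computing from start to finish, which is exactly the workload shape Rayon targets.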

The thread pool in Java is similar to crates like threadpool. I'm not familiar with the actor model in Actix.


For the purposes of this discussion, you really want to disambiguate between the ability to wait on multiple I/O operations at the same time (asynchronism) and the ability to execute multiple strands of CPU-bound code at the same time (parallelism).

Rayon is a library designed first and foremost for easy and efficient parallel code execution, whereas async-await is a language feature that is designed first and foremost for easy and efficient asynchronous I/O.

I'm less familiar with Actix, but as far as I know the idea is rather to enforce a certain application structure based on communicating actors, a bit like Go does, as this design is apparently very effective for building things like web servers. IIRC, you can use Actix with both async-await and regular blocking I/O.
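To make the actor idea concrete without any framework (this is not Actix's actual API, just the underlying pattern): an actor is a unit that owns its state and is only reachable through its mailbox, here modeled as a thread plus an mpsc channel:

```rust
use std::sync::mpsc;
use std::thread;

// Messages the counter actor understands. Replies travel back on a
// per-request channel, so callers never touch the actor's state directly.
enum Msg {
    Add(u64),
    Get(mpsc::Sender<u64>),
}

// Spawn the actor: a thread that owns its state and processes
// messages one at a time from its mailbox.
fn spawn_counter() -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut count = 0u64;
        for msg in rx {
            match msg {
                Msg::Add(n) => count += n,
                Msg::Get(reply) => {
                    let _ = reply.send(count);
                }
            }
        }
    });
    tx
}

fn main() {
    let actor = spawn_counter();
    actor.send(Msg::Add(2)).unwrap();
    actor.send(Msg::Add(3)).unwrap();

    let (reply_tx, reply_rx) = mpsc::channel();
    actor.send(Msg::Get(reply_tx)).unwrap();
    println!("count = {}", reply_rx.recv().unwrap()); // count = 5
}
```

Because the mailbox serializes messages, the state needs no locks; that freedom from shared-memory synchronization is the main selling point of the model.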

So, why do we need to distinguish between concurrent I/O and CPU work? After all, OS threads can be used for both I/O and CPU work, because the OS fans them out across CPU cores and automatically replaces them with other threads when they block on I/O. However, they exhibit some unintuitive behavior and are somewhat memory- and CPU-inefficient when used in large numbers, which is why performance-conscious people sometimes look at more specialized alternatives that can only do a subset of what OS threads do.

Thread pools allow you to have parallelism without the overhead of one OS thread per concurrent task, by having one thread per CPU core and feeding each of them with a queue of jobs. They should only be used for pure "batch" CPU-bound work which does not block on I/O or otherwise synchronize with other threads, as otherwise the corresponding CPU core will go idle and in the worst case you can get a full thread pool deadlock. As far as I know, they work similarly in all languages.
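A toy version of such a pool using only std (crates like threadpool are a more polished take on the same shape): N workers pull boxed closures off a shared queue, and closing the queue shuts them down.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// A job is any one-shot closure that can be sent to another thread.
type Job = Box<dyn FnOnce() + Send>;

struct Pool {
    tx: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl Pool {
    fn new(n: usize) -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        let workers = (0..n)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // Hold the lock only long enough to take one job.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        Pool { tx: Some(tx), workers }
    }

    fn execute(&self, job: impl FnOnce() + Send + 'static) {
        self.tx.as_ref().unwrap().send(Box::new(job)).unwrap();
    }

    fn shutdown(mut self) {
        drop(self.tx.take()); // closing the channel stops the workers
        for w in self.workers.drain(..) {
            w.join().unwrap();
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicU64, Ordering};
    let sum = Arc::new(AtomicU64::new(0));
    let pool = Pool::new(4);
    for i in 1..=100u64 {
        let sum = Arc::clone(&sum);
        pool.execute(move || {
            sum.fetch_add(i, Ordering::Relaxed);
        });
    }
    pool.shutdown();
    println!("sum = {}", sum.load(Ordering::Relaxed)); // sum = 5050
}
```

Notice how a job that blocked on I/O would pin its worker thread, starving the queue; that is exactly the deadlock risk described above.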

Event loops and reactors allow you to have asynchronism without multi-threading by having a single thread block waiting for many different I/O events to happen. They are rather hard to write by hand, so people tend to combine them with a higher-level abstraction, namely...

...coroutines, which are basically a task that can block and resume. There are two variants of them:

  • Stackful coroutines, like goroutines, are essentially like OS threads but managed by the application. They each get their own stack, and can be easily swapped in and out of the reactor thread when blocking on I/O. However, it's essentially impossible to make them more efficient than OS threads without a heavy language runtime that hampers some low-level operations (embedded work, FFI...), which is why people more recently came up with...
  • Stackless coroutines, like Rust's and C#'s async/await, which are a more clever design based on compiler-generated state machines. Can be efficient without a heavy runtime, but harder to implement and means that coroutines live in a little language world of their own and interactions between coroutines and "normal" code are harder. Depending on the language, may require various code annotations (async, await, etc.).
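To make "compiler-generated state machine" concrete, here is a hand-rolled analogue of what an async fn with two await points could desugar to. All names are illustrative; this is not what rustc actually emits, and the real machinery goes through the `Future` trait and wakers rather than a bare `resume` method:

```rust
// Imagine: async fn double_read() -> u32 {
//     let a = read().await;
//     let b = read().await;
//     a + b
// }

enum Poll<T> {
    Ready(T),
    Pending,
}

// Each `.await` point becomes a variant holding the variables that are
// still live across the suspension, so no separate stack is needed.
enum DoubleRead {
    Start,
    AwaitFirst,
    AwaitSecond { a: u32 },
    Done,
}

impl DoubleRead {
    // The executor calls this repeatedly; `input` stands in for "did the
    // pending read complete, and with what value?". `Pending` means
    // "the I/O isn't ready yet, park me and resume later".
    fn resume(&mut self, input: Option<u32>) -> Poll<u32> {
        match *self {
            DoubleRead::Start | DoubleRead::AwaitFirst => match input {
                Some(a) => {
                    *self = DoubleRead::AwaitSecond { a };
                    Poll::Pending // now suspended at the second .await
                }
                None => {
                    *self = DoubleRead::AwaitFirst;
                    Poll::Pending
                }
            },
            DoubleRead::AwaitSecond { a } => match input {
                Some(b) => {
                    *self = DoubleRead::Done;
                    Poll::Ready(a + b)
                }
                None => Poll::Pending,
            },
            DoubleRead::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    let mut task = DoubleRead::Start;
    assert!(matches!(task.resume(None), Poll::Pending)); // nothing ready yet
    assert!(matches!(task.resume(Some(2)), Poll::Pending)); // first read done
    match task.resume(Some(3)) {
        Poll::Ready(sum) => println!("sum = {sum}"), // sum = 5
        Poll::Pending => unreachable!(),
    }
}
```

The whole task is one small enum rather than a stack, which is where the memory efficiency comes from, and also why such coroutines live in "a little language world of their own": ordinary functions cannot suspend into this shape.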

Depending on the event loop and reactor design, you may either be executing one coroutine at a time or multiple coroutines at the same time. There are pros and cons for each approach, and Rust's async/await doesn't really enforce a specific choice, but most available executors go for the "one thread waiting for events, fanning out CPU work to a thread pool" design.

All kinds of coroutines require you to use specialized I/O primitives that ping the underlying runtime to signal that the coroutine is ready to suspend, instead of immediately blocking the OS thread. That is a general ergonomic weak point of this design.

The reason why we have largely different ecosystems for asynchronism and parallelism is that they have somewhat different requirements.

Asynchronous I/O is mostly used in areas like web servers where you're processing millions of concurrent requests and your figures of merit look like average request latency or number of requests processed per second. In this sense, it's most important to have small tasks (otherwise you'll blow your RAM), super fast task setup and teardown, and efficient primitives for awaiting many I/O events at the same time.

After all, if you only have a small number of large I/O tasks, you could just create an OS thread dedicated to processing them and synchronize with it when you need the I/O results. That's actually a common way to emulate asynchronous I/O on operating systems without good built-in support, e.g. for file I/O on Linux.

While parallelism is sometimes used in this context, it is also used in contexts where there is only a small number of very large tasks to be processed as quickly as possible, like supercomputing and CLI utilities. In that case, what you care most about is the ergonomics of splitting a complex task into smaller chunks that can keep all of your CPU cores busy, and this is where Rayon shines.

Due to latency concerns, the server world always wants to keep the event loop running as fast as possible so that small tasks wait as little as possible. This is what led thread pool-based parallelism to be coupled with event loops, as mentioned above. The event loop waits for I/O events, creates CPU tasks associated with processing those events, offloads them to the thread pool, and goes back to processing input events as quickly as possible. The thread pools used in this context operate under stronger latency constraints than something throughput-oriented like Rayon, so they tend to have slightly different designs that put a higher priority on things like fairness or prioritization over raw task processing speed.

Hope this helps make sense of that big convoluted concurrency mess.


Thank you, your answer has definitely resolved most of my confusion. I still need to dig deeper. Can you expand a little on "one thread waiting for events, fanning out CPU work to a thread pool"? Also, if I have only a single core, is rayon basically useless at that point?

Can you expand a little on "one thread waiting for events, fanning out CPU work to a thread pool"?

I mostly write compute-bound code, so this is not the part that I'm most familiar with, but what I know is that all modern operating systems provide you with a way to wait for multiple network I/O operations with a single system call. There are two flavors:

  • In readiness-based asynchronous I/O, like Linux's epoll, the OS pings you when there is data waiting to be read, or buffer space available for a write, on any of the sockets you registered before blocking.
  • In completion-based asynchronous I/O, like Windows' IOCP, the OS sends or receives a full message (in the sense of a buffer of bytes whose size is set by the application, not by OS or hardware constraints) for you and wakes up your I/O thread when that's done. A single I/O thread can register for as many I/O notifications as it likes.

Using readiness-based I/O (epoll) as an example, what the event loop + thread pool design does with this OS primitive is register all sockets associated with ongoing network connections on a single event loop thread, and have that thread call epoll in a loop.

Once a connection is ready for reading or writing, the event loop thread figures out which coroutine is in charge of processing it and offloads the work of running that coroutine to a thread pool, then immediately resumes waiting for I/O events while the coroutine executes in the background.
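That fan-out shape can be caricatured with channels standing in for the OS primitives (a real reactor would block in epoll_wait on sockets, not on a channel; everything here is a toy sketch):

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// One thread waiting for "events", fanning out CPU work to a worker pool.
fn run_pipeline(events: Vec<u64>, n_workers: usize) -> u64 {
    let (event_tx, event_rx) = mpsc::channel::<u64>(); // "sockets became ready"
    let (work_tx, work_rx) = mpsc::channel::<u64>();   // dispatcher -> pool
    let (done_tx, done_rx) = mpsc::channel::<u64>();   // pool -> results

    // The event-loop thread: wait for an event, hand the CPU work to the
    // pool, and immediately go back to waiting.
    let dispatcher = thread::spawn(move || {
        for event in event_rx {
            work_tx.send(event).unwrap();
        }
        // Event channel closed: dropping work_tx shuts the workers down.
    });

    // A small worker pool doing the "expensive" processing.
    let work_rx = Arc::new(Mutex::new(work_rx));
    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                let job = work_rx.lock().unwrap().recv();
                match job {
                    Ok(n) => done_tx.send(n * n).unwrap(), // pretend CPU work
                    Err(_) => break,
                }
            })
        })
        .collect();
    drop(done_tx); // keep only the workers' clones alive

    for e in events {
        event_tx.send(e).unwrap(); // pretend sockets became readable
    }
    drop(event_tx);

    dispatcher.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    done_rx.iter().sum()
}

fn main() {
    let total = run_pipeline((1..=10).collect(), 4);
    println!("sum of squares = {total}"); // 385
}
```

The key property is that the dispatcher never does heavy work itself, so it is always ready to react to the next event, which is the latency argument made above.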

Also, if I have only a single core, is rayon basically useless at that point?

It will not give you CPU performance benefits, and may actually slow execution down a bit. But it can still be useful for future scalability, as single-core machines are no longer the majority and their share keeps shrinking these days...


Thank you, makes sense.