For the purpose of these discussions, you really want to distinguish between the ability to wait on multiple I/O operations at the same time (asynchronism) and the ability to execute multiple streams of CPU-bound code at the same time (parallelism).
Rayon is a library designed first and foremost for easy and efficient parallel code execution, whereas async-await is a language feature that is designed first and foremost for easy and efficient asynchronous I/O.
I'm less familiar with Actix, but as far as I know the idea is rather to enforce a certain application structure based on communicating actors, kind of like Go does, as this design is apparently very effective when building things like web servers. IIRC, you can use actix with both async-await and regular blocking I/O.
So, why do we need to distinguish between concurrent I/O and CPU work? After all, OS threads can be used for both, because the OS fans them out across CPU cores and automatically replaces them with other threads when they block on I/O. However, OS threads exhibit some unintuitive behavior and are somewhat memory- and CPU-inefficient when used in large numbers, which is why performance-conscious people sometimes look at more specialized alternatives that can only do a subset of what OS threads do.
Thread pools allow you to have parallelism without the overhead of one OS thread per concurrent task, by having one thread per CPU core and feeding each of them with a queue of jobs. They should only be used for pure "batch" CPU-bound work which does not block on I/O or otherwise synchronize with other threads, as otherwise the corresponding CPU core will go idle, and in the worst case you can get a full thread pool deadlock. As far as I know, they work similarly in all languages.
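To make this concrete, here is a minimal std-only sketch of such a pool (the function name `pool_sum_of_squares` and the hard-coded job type are just for illustration; a real pool like Rayon's uses work stealing and avoids boxing every job):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A minimal fixed-size thread pool: N workers pull boxed jobs off a
// shared queue and send their results back on a channel.
fn pool_sum_of_squares(n_workers: usize, n_jobs: u64) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<Box<dyn FnOnce() -> u64 + Send>>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel();

    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // Take the lock only long enough to grab the next job.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // queue closed: shut the worker down
                };
                res_tx.send(job()).unwrap();
            })
        })
        .collect();

    // Submit pure CPU-bound jobs; nothing here ever blocks on I/O.
    for i in 0..n_jobs {
        job_tx.send(Box::new(move || i * i)).unwrap();
    }
    drop(job_tx); // closing the queue lets the workers exit
    drop(res_tx);

    let total: u64 = res_rx.iter().sum();
    for w in workers {
        w.join().unwrap();
    }
    total
}

fn main() {
    println!("{}", pool_sum_of_squares(4, 8)); // prints 140
}
```

Note how the deadlock hazard mentioned above falls out of this design: if a job blocked waiting on another job that is still sitting in the queue behind it, and all workers were busy with such jobs, nothing would ever make progress.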
Event loops and reactors allow you to have asynchronism without multi-threading by having a single thread block waiting for many different I/O events to happen. They are rather hard to write by hand, so people tend to combine them with a higher-level abstraction, namely...
...coroutines, which are basically tasks that can suspend and resume. There are two variants of them:
Stackful coroutines, like goroutines, are essentially like OS threads but managed by the application. They each get their own stack, and can be easily swapped in and out of the reactor thread when blocking on I/O. However, it's essentially impossible to make them more efficient than OS threads without a heavy language runtime that hampers some low-level operations (embedded work, FFI...), which is why people more recently came up with...
Stackless coroutines, like Rust's and C#'s async/await, are a more clever design based on compiler-generated state machines. They can be efficient without a heavy runtime, but they are harder to implement, and they mean that coroutines live in a little language world of their own, where interactions between coroutines and "normal" code are harder. Depending on the language, they may require various code annotations (async, await, etc.).
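To give a rough idea of what "compiler-generated state machine" means, here is a hand-rolled, heavily simplified sketch (the names `AddTwice` and `drive` and the cut-down `Poll` enum are made up for the example; Rust's real `Future` trait also threads a `Context`/`Waker` through `poll` so the runtime knows when a suspended task is worth retrying):

```rust
// Conceptually what the compiler generates for something like:
//
//     async fn add_twice(x: u64) -> u64 {
//         let a = fake_io(x).await;  // suspension point 1
//         let b = fake_io(a).await;  // suspension point 2
//         b
//     }
//
// Each `.await` becomes an enum state, and locals that live across a
// suspension point are stored inside the enum variant.

enum AddTwice {
    Start { x: u64 },
    WaitingFirst { pending: u64 },
    WaitingSecond { pending: u64 },
    Done,
}

enum Poll<T> {
    Ready(T),
    Pending,
}

impl AddTwice {
    // Advances the machine by one suspension point; this stands in for
    // `Future::poll`. Here the fake "I/O" just adds 1 and completes on
    // the next poll.
    fn step(&mut self) -> Poll<u64> {
        match *self {
            AddTwice::Start { x } => {
                *self = AddTwice::WaitingFirst { pending: x + 1 };
                Poll::Pending
            }
            AddTwice::WaitingFirst { pending } => {
                *self = AddTwice::WaitingSecond { pending: pending + 1 };
                Poll::Pending
            }
            AddTwice::WaitingSecond { pending } => {
                *self = AddTwice::Done;
                Poll::Ready(pending)
            }
            AddTwice::Done => panic!("polled after completion"),
        }
    }
}

// A trivial "executor": poll until the task is ready.
fn drive(mut task: AddTwice) -> u64 {
    loop {
        if let Poll::Ready(v) = task.step() {
            return v;
        }
    }
}

fn main() {
    println!("{}", drive(AddTwice::Start { x: 40 })); // prints 42
}
```

This also shows why stackless coroutines need no per-task stack: everything that survives a suspension lives inside the enum, whose maximum size is known at compile time.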
Depending on the event loop and reactor design, you may either be executing one coroutine at a time or multiple coroutines at the same time. There are pros and cons for each approach, and Rust's async/await doesn't really enforce a specific choice, but most available executors go for the "one thread waiting for events, fanning out CPU work to a thread pool" design.
All kinds of coroutines require you to use specialized I/O primitives that ping back to the underlying runtime that the coroutine is ready to suspend, instead of immediately blocking the OS thread. That is a general ergonomic weak point of this design.
The reason why we have largely different ecosystems for asynchronism and parallelism is that they have somewhat different requirements.
Asynchronous I/O is mostly used in areas like web servers where you're processing millions of concurrent requests and your figures of merit look like average request latency or number of requests processed per second. In this sense, it's most important to have small tasks (otherwise you'll blow your RAM), super fast task setup and teardown, and efficient primitives for awaiting many I/O events at the same time.
After all, if you only have a small number of large I/O tasks, you could just create an OS thread dedicated to processing them and synchronize with it when you need the I/O results. And that's actually a common way to emulate asynchronous I/O on operating systems without good built-in support, e.g. for file I/O on Linux.
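A minimal sketch of that emulation, with the blocking read simulated by a sleep (the function name `emulated_async_read` is made up for the example):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Emulating asynchronous I/O with a dedicated OS thread: the thread
// performs the blocking operation and hands the result back over a
// channel when the caller asks for it.
fn emulated_async_read() -> String {
    let (tx, rx) = mpsc::channel();

    let io_thread = thread::spawn(move || {
        // Stand-in for a blocking read, e.g. file I/O on Linux.
        thread::sleep(Duration::from_millis(50));
        tx.send(String::from("file contents")).unwrap();
    });

    // The calling thread is free to do other work here...

    // ...and only synchronizes with the I/O thread when it actually
    // needs the result.
    let contents = rx.recv().unwrap();
    io_thread.join().unwrap();
    contents
}

fn main() {
    println!("{}", emulated_async_read()); // prints "file contents"
}
```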
While parallelism is sometimes used in this context, it is also used in other contexts where there is only a small number of very large tasks to be processed as quickly as possible, like supercomputing and CLI utilities. In this case, what you care most about is the ergonomics of splitting a complex task into smaller chunks that can keep all of your CPU cores busy, and this is what Rayon shines at.
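As a sketch of the divide-and-conquer style this enables, here is a recursive parallel sum using only std scoped threads; it is conceptually what `rayon::join` does, except that Rayon reuses a fixed pool of workers and steals work between them, instead of spawning a thread per split as this naive version does:

```rust
use std::thread;

// Recursively split a slice in half and sum the two halves in parallel.
// Below a cutoff, fall back to a plain sequential sum so we don't pay
// thread-spawn overhead on tiny chunks.
fn parallel_sum(data: &[u64]) -> u64 {
    const SEQUENTIAL_CUTOFF: usize = 1024;
    if data.len() <= SEQUENTIAL_CUTOFF {
        return data.iter().sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    // Scoped threads (std since Rust 1.63) may borrow from the stack,
    // because the scope guarantees they finish before it returns.
    thread::scope(|s| {
        let handle = s.spawn(|| parallel_sum(left));
        let right_sum = parallel_sum(right);
        handle.join().unwrap() + right_sum
    })
}

fn main() {
    let data: Vec<u64> = (1..=10_000).collect();
    println!("{}", parallel_sum(&data)); // prints 50005000
}
```

The cutoff constant is exactly the kind of tuning knob Rayon hides from you: its `par_iter` adapters pick split points dynamically based on how busy the worker threads are.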
Due to latency concerns, the server world always wants to keep the event loop running as fast as possible so that small tasks wait as little as possible. This is what led thread pool-based parallelism to be coupled with event loops, as mentioned above. The event loop waits for I/O events, creates CPU tasks associated with processing those events, offloads them to the thread pool, and goes back to processing input events as quickly as possible. The thread pools used in this context operate under stronger latency constraints than something throughput-oriented like Rayon, so they tend to have slightly different designs that put a higher priority on things like fairness or prioritization over raw task processing speed.
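Here is a std-only sketch of that division of labor (the function name `process_events` is made up, and a real reactor would block on epoll/kqueue/io_uring rather than on an in-process channel):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// One thread drains an event queue as fast as it can and merely forwards
// the CPU-heavy processing to a pool of workers, so no single expensive
// event stalls the loop.
fn process_events(events: Vec<u64>, n_workers: usize) -> u64 {
    let (work_tx, work_rx) = mpsc::channel::<u64>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (done_tx, done_rx) = mpsc::channel::<u64>();

    // CPU worker pool: does the actual processing.
    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                let next = work_rx.lock().unwrap().recv();
                match next {
                    Ok(n) => done_tx.send(n * n).unwrap(), // stand-in for CPU work
                    Err(_) => break, // queue closed: shut down
                }
            })
        })
        .collect();

    // "Event loop" thread: only dispatches, never computes.
    let event_loop = thread::spawn(move || {
        for event in events {
            work_tx.send(event).unwrap();
        }
        // work_tx is dropped here, telling the workers to shut down.
    });

    event_loop.join().unwrap();
    drop(done_tx);
    let total: u64 = done_rx.iter().sum();
    for w in workers {
        w.join().unwrap();
    }
    total
}

fn main() {
    println!("{}", process_events(vec![1, 2, 3, 4], 2)); // prints 30
}
```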
Hope this helps make sense of this big convoluted concurrency mess.