Are there any examples of high-performance Rust multi-threaded apps that do NOT use async?
I'm still on the fence about async and am wondering if there are examples of non-async, high-performance, multi-threaded code.
That depends on how high. I use Rouille, which is multi-threaded without async. Its README compares its performance with Tokio, Hyper, Go, Nginx, and Node.js. It is fast enough for me.
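For reference, a minimal Rouille server is just a blocking handler running on a pool of OS threads. A sketch (the address and response body are placeholders):

```rust
use rouille::Response;

fn main() {
    // start_server blocks and dispatches each request to a pool of OS threads.
    rouille::start_server("0.0.0.0:8000", move |request| {
        // Plain synchronous code: no async, no .await.
        Response::text(format!("you asked for {}", request.url()))
    });
}
```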
Good to know. If it's only a 2-3x performance hit, I'm really starting to question whether the complexity and compile-time hit of async is worth it.
I agree. In my local user group, I recommend that everyone who asks avoid async and use Rouille instead, unless their performance needs are extreme. So far nobody has had extreme performance needs, except for a single case (a company whose main product is a MITM proxy).
This is safer than you may think, because those who need async tend to know it themselves and don't ask the "should I use async?" question. In other words, asking is itself a signal that the answer is no. The MITM proxy case was a rare exception.
I have been getting worried recently because some libraries are async-only, so people are forced into async just to use them and get frustrated by the high complexity of Rust's async. Things are manageable so far, but I think maintaining a healthy non-async Rust ecosystem is essential.
This is a surprisingly convincing argument. Those who need async performance likely know beyond a shadow of a doubt that they need async.
The challenge with high-performance network IO is that one needs to handle a lot of connections per core in order to achieve that performance. That also means a lot of state switching, which needs to be really fast.
The traditional approach is epoll/kqueue (see the mio crate). The challenge here is that you have to do the state switching manually, and the logic that handles a single connection ends up spread all over the place, because functions must return in order to send or receive data and can never wait for something to happen. It works, but it is not pleasant to write.
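To make that concrete, here is roughly what the manual event loop looks like, as a sketch assuming mio 0.8's API. Note how the per-connection logic is scattered across match arms and the state tracking (left as comments) is entirely on you:

```rust
use mio::net::TcpListener;
use mio::{Events, Interest, Poll, Token};

const SERVER: Token = Token(0);

fn main() -> std::io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    let mut listener = TcpListener::bind("127.0.0.1:9000".parse().unwrap())?;
    poll.registry().register(&mut listener, SERVER, Interest::READABLE)?;

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            match event.token() {
                SERVER => {
                    // Accept the new connection and register it under a fresh
                    // Token; its state must live in some map keyed by that Token.
                    let (_conn, _addr) = listener.accept()?;
                }
                _token => {
                    // Look up this connection's state machine, read/write
                    // whatever is ready, update the state, and return to the
                    // loop -- you can never block waiting for the peer.
                }
            }
        }
    }
}
```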
async allows you to write handlers so that the logic and the code correspond to each other. Each time the handler needs to wait for some IO, await is used. The compiler does the work of splitting it up into many parts for you. And the way it is done in Rust is quite efficient, so there isn't even much of a performance reason not to do it.
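For comparison, the same per-connection logic written as an async handler reads top to bottom. A tokio-flavoured sketch:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

// Each .await marks a point where the compiler splits the function into a
// resumable state machine; the source still reads linearly.
async fn handle(mut socket: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    loop {
        let n = socket.read(&mut buf).await?;
        if n == 0 {
            return Ok(()); // peer closed the connection
        }
        socket.write_all(&buf[..n]).await?;
    }
}
```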
To give at least one example: Low-level TCP server in Rust with MIO
So my conclusion would be that those who really needed high-performance IO before async was ready wrote it in C or C++. Those who are willing to use Rust would be almost foolish not to use async for a large project where any possible reduction in complexity counts.
Thank you for your insightful post. Would you say the "breaking point" for going from sync Rust to async Rust is the point where we can no longer "spawn a thread per connection"? Until then, why not spawn a thread per connection?
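For context, the thread-per-connection baseline being discussed is just this (a minimal sketch using only the standard library):

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        // One blocking OS thread per connection: simple, linear code.
        thread::spawn(move || {
            let mut buf = [0u8; 1024];
            while let Ok(n) = stream.read(&mut buf) {
                if n == 0 {
                    break;
                }
                if stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
        });
    }
    Ok(())
}
```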
This actually makes me question: for many internal services (not directly facing the internet via HTTP), do I really need async? I wonder if it is possible to have something like:
Internet -> many connections -> cluster 1 -> few connections -> cluster 2
where cluster 2 = sync Rust, and
cluster 1 = golang goroutines, which handle the many connections and merge them into a small number of connections for cluster 2 (say, 2 connections per core).
OS threads are expensive, and switching between them has a higher cost than using async and running many "logical threads" on the same OS thread. So if one really wants high performance (and one connection can't keep an OS thread busy), OS threads are not going to be the solution.
async translates your functions into stackless generators (that is, it only stores the internal state and not a call stack). There are also generators that have their own stack, but those come with other costs (more memory needed).
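To illustrate "internal state, not a call stack": the compiler turns an async fn into an enum-like state machine that holds only the variables live across an .await. A hand-drawn sketch (the Socket type and the async fn it mimics are hypothetical):

```rust
struct Socket; // hypothetical placeholder type

// Roughly what the compiler generates for something like:
//     async fn two_reads(sock: Socket) -> (u8, u8) {
//         let a = read_byte(&sock).await;
//         let b = read_byte(&sock).await;
//         (a, b)
//     }
enum TwoReads {
    // Before the first .await: only the socket is live.
    Start { sock: Socket },
    // Between the two .awaits: the socket plus the first byte.
    AfterFirstRead { sock: Socket, a: u8 },
    // Finished: nothing needs to be stored.
    Done,
}
```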
One way to avoid async is to use a message-driven approach where there is no state to track; everything is encoded in the messages. (GitHub - lemunozm/message-io: Fast and easy-to-use event-driven network library.)
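For example, an echo server in message-io is a single event callback with no per-connection state machine to write. A sketch adapted from the crate's README; exact signatures may differ between versions:

```rust
use message_io::network::{NetEvent, Transport};
use message_io::node;

fn main() {
    let (handler, listener) = node::split::<()>();
    handler.network().listen(Transport::Tcp, "0.0.0.0:3042").unwrap();

    // Everything is encoded in the events; there is no state to track.
    listener.for_each(move |event| match event.network() {
        NetEvent::Connected(_endpoint, _ok) => (),
        NetEvent::Accepted(_endpoint, _listener_id) => println!("client connected"),
        NetEvent::Message(endpoint, data) => {
            handler.network().send(endpoint, data); // echo it back
        }
        NetEvent::Disconnected(_endpoint) => println!("client disconnected"),
    });
}
```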
I agree with this. I'd also add that one reason you'd turn to non-blocking APIs like epoll/kqueue instead of threads & blocking sockets is that writing network applications with blocking sockets only works well for the simplest of cases, e.g. a request-response protocol like HTTP. For example, it's impossible to read & write in parallel with blocking sockets with OpenSSL, due to the fact that the TLS peer can require renegotiation at any time, requiring socket writes even during a call like SSL_read, and thus OpenSSL has to logically own both the reading & writing halves of the underlying socket. Blocking sockets can also make it hard, if not impossible, to deal with cancellation and timeouts.
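With futures, by contrast, a timeout can be wrapped around any operation after the fact. A tokio-based sketch:

```rust
use std::time::Duration;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn read_with_deadline(sock: &mut TcpStream) -> std::io::Result<usize> {
    let mut buf = [0u8; 1024];
    // Wrapping the read future in a timeout cancels it cleanly when the
    // deadline passes -- something a blocking read() can't easily do.
    match tokio::time::timeout(Duration::from_secs(5), sock.read(&mut buf)).await {
        Ok(result) => result,
        Err(_elapsed) => Err(std::io::Error::new(
            std::io::ErrorKind::TimedOut,
            "read timed out",
        )),
    }
}
```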
I've also started several projects with the intent of sticking with "simple" threading & blocking sockets, only to find that synchronization is hard, that Futures actually make synchronization easier to reason about, and that I end up rewriting with async. I don't have a great explanation or example of this, but it's something I've run into a few times.
One option a step below a full-blown tokio runtime, which I agree can be pretty heavyweight, is spawning threads and using async_io::block_on with async_net and async_channel, like I've done in this project. This gives you the benefits of nonblocking sockets without the heavy weight of tokio, and you can pick & choose where exactly you want to use async.
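The shape of that approach, roughly (a sketch; I'm assuming the smol-family APIs of async_io and async_net):

```rust
use async_net::TcpListener;
use futures_lite::{AsyncReadExt, AsyncWriteExt};
use std::thread;

fn main() -> std::io::Result<()> {
    // A plain OS thread drives its own futures with block_on:
    // nonblocking sockets without a full tokio runtime.
    let server = thread::spawn(|| {
        async_io::block_on(async {
            let listener = TcpListener::bind("127.0.0.1:9000").await?;
            loop {
                let (mut stream, _addr) = listener.accept().await?;
                // Handle one request inline; a real server would hand the
                // stream off to another thread or task.
                let mut buf = [0u8; 1024];
                let n = stream.read(&mut buf).await?;
                stream.write_all(&buf[..n]).await?;
            }
        })
    });
    server.join().unwrap()
}
```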
Would you say the "breaking point" for going from sync Rust to async Rust is the point where we can no longer "spawn a thread per connection"?
In my opinion this is a good summary.
Anything that's multi-threaded for compute reasons.
If you have 10,000 requests all waiting for the DB to respond, async/await is a godsend. But if you're trying to run image processing to stitch together a gigapixel image, async/await is a waste of time.
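For the compute-bound side, plain data parallelism across OS threads is the better fit, e.g. with rayon. A sketch where a trivial brighten step stands in for real image processing:

```rust
use rayon::prelude::*;

// CPU-bound work: rayon spreads it across a fixed pool of OS threads.
// There is nothing to await, so async would add nothing here.
fn brighten(pixels: &mut [u8]) {
    pixels.par_iter_mut().for_each(|p| {
        *p = p.saturating_add(16);
    });
}

fn main() {
    let mut pixels = vec![100u8; 1_000_000];
    brighten(&mut pixels);
    assert!(pixels.iter().all(|&p| p == 116));
}
```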
I'm currently working on node_crunch, where a lot of computation may happen, so async is not an option at the moment. Maybe it will be in the future. I can imagine making it configurable so that the user can decide whether network or CPU is the limiting factor.
The cost of creating an OS thread can be mitigated with a threadpool (as sketched below), and based on the benchmark quoted here, the context-switching overhead seems similar:
A context switch takes around 0.2µs between async tasks, versus 1.7µs between kernel threads. But this advantage goes away if the context switch is due to I/O readiness: both converge to 1.7µs.
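A sketch of the threadpool variant, using the threadpool crate (names and buffer sizes are illustrative):

```rust
use std::io::{Read, Write};
use std::net::TcpListener;
use threadpool::ThreadPool;

fn main() -> std::io::Result<()> {
    // A fixed pool of worker threads amortizes thread-creation cost:
    // each connection borrows a worker instead of spawning a new thread.
    let pool = ThreadPool::new(8);
    let listener = TcpListener::bind("127.0.0.1:9000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        pool.execute(move || {
            let mut buf = [0u8; 1024];
            while let Ok(n) = stream.read(&mut buf) {
                if n == 0 {
                    break;
                }
                if stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
        });
    }
    Ok(())
}
```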
I haven't looked at how that benchmark works, but performance on modern CPUs can be more nuanced than "how fast is a context switch" because of SMP and caching. The cost of a thread-based context switch can also go up these days due to Spectre mitigations.
Theoretically, an async runtime has a lot more context it can use to leverage caching better and keep related tasks on the same CPU (I'm not sure tokio does this currently, though it seems it would happen naturally in a work-stealing scheduler). This video has a great explanation of this:
This talk is old, but I'm not sure the issue was ever solved. The idea is that modern networked services often have dependencies between tasks. Let's say task A depends on task B. In a threaded environment, this means A is blocked on futex or equivalent until task B eventually wakes it up. With futex, however, the kernel has no way of knowing about the dependency relationship between A and B, so it may choose to schedule task B on a different processor than task A, especially when the processor is under load. Now, when task B wakes up task A, we incur the cost of an IPI (inter-processor interrupt). Multiply this by millions of tasks, and we have a performance issue.
Furthermore, an async runtime can schedule task A on the same processor immediately following the wakeup from task B, preserving the CPU cache between the tasks and potentially greatly speeding up access to whatever data is passed between them. tokio does perform this optimization.