"Avoid Async Rust at all costs" - comments from experts?

from my Rust async 70%-novice perspective:
I would like more native, ergonomic support in Rust to ease the async programming burden. We now have nice async syntax for fn, for blocks, and, as of 1.75, for traits.
But most async code still has to be designed and written around a specific executor crate, so it ends up "littered" with tokio-specific, async-std-specific, or other executor-specific APIs, plus that executor's macros, rules, feature flags, and so on.
This makes every async program built on a different executor feel like an all-new Rust async endeavour, despite the many similarities.
You must not only learn the intricacies of async Rust in general, but also really know the internals and features of the specific executor crate.

Compare this with Rust's thread support in 1.75, i.e. parallel programming with std threads: it is vastly easier to learn, reason about, and code than async, from an ergonomics standpoint and others.
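To illustrate the ergonomics gap, here is a complete thread-based example using only the standard library: no runtime crate, no attribute macros, no feature flags. (A minimal sketch with a made-up workload; scoped threads need Rust 1.63+.)

```rust
use std::thread;

// Sum a slice in parallel on up to 4 scoped threads.
// Everything here is std; no executor crate required.
fn parallel_sum(data: &[u64]) -> u64 {
    // Chunk size = ceil(len / 4), but at least 1 so chunks() never panics.
    let chunk = ((data.len() + 3) / 4).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect();
        // Joining blocks, but the scope guarantees all threads finish here.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum(&data)); // prints 5050
}
```

No Pin, no Send + 'static bounds on futures, no executor-specific spawn function: the borrow checker and std::thread::scope do all the bookkeeping.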

thank you for reading thus far, please correct me if I am wrong.

3 Likes

If "large" is in the tens of thousands range (concurrently), then yes. If less than this unscientific threshold, what's wrong with blocking I/O in a thread? This is the point I have been attempting to make. Just because you have a use case for I/O doesn't automatically make it a good candidate for async.
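To make "blocking I/O in a thread" concrete: a thread-per-connection echo server in plain std, no async anywhere. A toy sketch; serve and handle are made-up names, and main just connects to itself once as a demo.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Plain blocking handler: read, echo back, repeat until the peer hangs up.
fn handle(mut stream: TcpStream) {
    let mut buf = [0u8; 512];
    while let Ok(n) = stream.read(&mut buf) {
        if n == 0 { break; } // peer closed the connection
        if stream.write_all(&buf[..n]).is_err() { break; }
    }
}

// Accept loop: one OS thread per connection.
fn serve(listener: TcpListener) {
    for stream in listener.incoming() {
        if let Ok(stream) = stream {
            thread::spawn(move || handle(stream));
        }
    }
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port (handy for local testing).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || serve(listener));

    // Demo client: connect to ourselves and echo one message.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"ping")?;
    let mut buf = [0u8; 4];
    client.read_exact(&mut buf)?;
    println!("echoed: {}", String::from_utf8_lossy(&buf)); // prints "echoed: ping"
    Ok(())
}
```

For dozens or hundreds of connections this is boring, obvious code, which is rather the point.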

2 Likes

People often talk about the ability of async to "scale up" to e.g. thousands of concurrent connections, but I also think it's valuable when you want to "scale down" e.g. when I have a much less powerful computer and I want a process to run in the background without a performance impact on the rest of the system. Or when I only need to service a dozen of my friends but I want to run the server on a Raspberry Pi Zero because that's the computer I have on-hand that can be consistently online.

7 Likes

Yes! Being able to do more with less is always a win. Sometimes I think the internet is full of super-rich developers who go around telling you to "just upgrade your AWS instance" as if that doesn't cost money.

5 Likes

A Raspberry Pi Zero can serve "a dozen" connections using threads just fine. Should I remind you that it has a 1 GHz CPU and 512 MiB of RAM?

Async is not a magic silver bullet; it lets you save a bit on context switches and on RAM for thread stacks. A well-written sync program can easily run on 8-16 KiB of stack space per thread. Multiply that by the number of connections to get a grossly overestimated upper bound on the RAM overhead of threads compared to async. As for context switches, if most of your threads simply wait for I/O, their overhead will be minimal, since the OS usually does not wake a thread until its blocking I/O is ready. Oh, and I hope your epoll-powered async program (tokio's default) does not do any disk I/O, because you will be unpleasantly surprised.
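For a feel of those numbers, std lets you set the stack size explicitly when spawning. A minimal sketch, using the 16 KiB upper end of the estimate above and a made-up workload:

```rust
use std::thread;

// Run a small job on a thread with an explicit 16 KiB stack, instead of
// the multi-MiB default. A shallow, allocation-light handler fits fine.
fn run_on_small_stack() -> u64 {
    thread::Builder::new()
        .stack_size(16 * 1024) // 16 KiB requested (OS may round up slightly)
        .spawn(|| (0u64..1000).sum::<u64>())
        .expect("spawn failed")
        .join()
        .expect("thread panicked")
}

fn main() {
    println!("{}", run_on_small_stack()); // prints 499500
    // Napkin math: 1,000 connections * 16 KiB = roughly 16 MiB of stacks.
}
```

Note also that default thread stacks are mostly reserved virtual memory, not committed RAM, which is another reason the naive "threads times default stack size" estimate overshoots.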

So unless you want to get the absolute last drop of throughput from your board, I don't think async is worth the trouble in this case.

6 Likes

A few things caught my eye:

  1. He claims Go and Erlang implement preemptive multitasking. Not sure about Erlang, but it is definitely false for Go. The latter uses stackful cooperative multitasking, while Rust uses stackless. This means that Go manages a virtual stack for each goroutine, while Rust makes each async function do its own housekeeping, which gives more freedom about where to store its state during suspension. It's the well-known stackful vs. stackless tradeoff.
  2. OS/HW threads may be closer to coroutines today than they used to be, but they will never match them, both in terms of CPU usage (syscalls and other overhead) and memory usage (allocating a full thread stack, all the supporting structures, etc.). The author is too keen on generously spending resources that are not his, without any concrete context.
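To make the "stackless" point concrete, here is a hand-written sketch of roughly what the compiler generates for an async fn: an enum storing only the variables live across the suspension point, with no per-task stack. Names like DoubleLater and resume are made up, the real generated code differs in detail, and the real Future::poll also threads through a Waker, omitted here.

```rust
// Rough hand-rolled equivalent of:
//   async fn double_later(x: u32) -> u32 { yield_once().await; x * 2 }
// Only `x` is live across the await, so only `x` plus a discriminant is stored.
enum DoubleLater {
    Start { x: u32 },
    Suspended { x: u32 }, // parked at the .await point
    Done,
}

enum Step {
    Pending,
    Ready(u32),
}

impl DoubleLater {
    // The analogue of Future::poll, minus the Waker plumbing.
    fn resume(&mut self) -> Step {
        match *self {
            DoubleLater::Start { x } => {
                // First poll: hit the await point and suspend.
                *self = DoubleLater::Suspended { x };
                Step::Pending
            }
            DoubleLater::Suspended { x } => {
                *self = DoubleLater::Done;
                Step::Ready(x * 2)
            }
            DoubleLater::Done => panic!("resumed after completion"),
        }
    }
}

fn main() {
    let mut task = DoubleLater::Start { x: 21 };
    assert!(matches!(task.resume(), Step::Pending)); // suspended at .await
    assert!(matches!(task.resume(), Step::Ready(42))); // resumed: 21 * 2
    // The whole suspended task is just a u32 plus a discriminant:
    println!("suspended task is {} bytes", std::mem::size_of::<DoubleLater>());
}
```

A stackful runtime like Go's instead keeps a growable stack alive for the whole goroutine, which is exactly the tradeoff described above.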
3 Likes

I am trying to use the Rocket crate to make a web application, and Rocket 0.5.0 is going fully async.

I am interested in async because of this, but have little to no knowledge about it.

Right now I am reading the Rust async book suggested by @jumpnbrownweasel.

As much as I understood from it, async and multithreading are two different things solving different problems.

Multithreading is used to do multiple jobs at the same time.

Async tries to use processor resources more efficiently.

If I were to describe things differently and consider a thread to be a shovel, it would be:

  • Single threading is where there is only one shovel and one worker. He uses the shovel to dig a hole. If he needs to go to the toilet before he finishes his job, he takes the shovel with him, work stops, and everything waits for his return.
  • Multithreading is where you have multiple shovels, let us say 5 shovels and 5 workers. They dig 5 holes. If one worker needs to go to the toilet before finishing his job, he takes his shovel with him and only four workers are left working. If all four need to go to the toilet at the same time, the job stops.
  • Async single-threading would be one shovel but multiple workers. One worker digs a hole while the others just watch him and smoke. If the working worker needs to go to the toilet, he gives his shovel to one of the waiting workers and goes. The worker who got the shovel starts to dig his own hole. When the first worker gets back from the toilet, he stands in the queue and smokes, watching, until the current worker needs the toilet and the next worker in the queue gets the shovel to dig his hole.
  • Async multithreading is where we have multiple shovels, for example 5, but more than 5 workers; any number of workers. While 5 work at the same time, the others just watch. If one of the 5 needs to go to the toilet, he gives his shovel to another worker in the queue, and the new worker starts to dig his own hole. It is the same as async single-threading but with multiple threads (shovels).

So, in async, only one piece of code runs at a time on one thread, but if that code needs to wait for something, it hands the thread over to other code and does not waste computer resources. It does not do the same thing as multithreading, so you cannot replace one with the other. And async is mostly useful for code that needs to wait for something: for example, the web, where an application needs to wait for a response from a user or a service. A WebSocket mostly just waits for incoming data to forward to other users. But there is no point in using async where the code does not need to wait.

Or am I wrong?

6 Likes

In your example the operative distinction between multithreading and async multithreading seems to be the number of shovels relative to the number of workers. There is no reason why one couldn't have more than 5 threads on a 5 core system, or exactly 5 tasks on a 5 thread async executor, so that's not really the relevant distinction to make. The main distinction between multithreading and async on a multicore system is who is doing the scheduling (assigning shovels to workers). It's like the difference between having a foreperson on site managing the distribution of shovels to workers vs each worker driving all the way back to the shovel storage warehouse to return and pick up a shovel each time. The scheduling of threads is done by the operating system (the shovel storage warehouse), while async tasks are scheduled by an async executor (the on site supervisor). In some cases having an on site supervisor is more efficient but whether it is worth the cost of hiring such a person really depends on how often your workers are swapping shovels.
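The "on-site supervisor" can even be sketched in a few dozen lines of std-only Rust: a toy single-threaded executor that round-robins over its tasks, re-polling each one until it finishes. Illustrative only: it busy-polls with a no-op waker instead of sleeping until woken, which no real executor would do, and all names here are made up.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that suspends exactly once: the worker handing the shovel back.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 { Poll::Ready(()) } else { self.0 = true; Poll::Pending }
    }
}

// No-op waker: acceptable here only because we re-poll every task each round.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// The supervisor: one thread round-robins over all tasks, dropping finished ones.
fn run_all(mut tasks: Vec<Pin<Box<dyn Future<Output = ()>>>>) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

// Two tasks share one thread; each yields once, so their steps interleave.
fn interleaved_order() -> Vec<u32> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let (a, b) = (log.clone(), log.clone());
    let tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = vec![
        Box::pin(async move {
            a.borrow_mut().push(1);
            YieldOnce(false).await; // hand the thread to the other task
            a.borrow_mut().push(3);
        }),
        Box::pin(async move {
            b.borrow_mut().push(2);
            YieldOnce(false).await;
            b.borrow_mut().push(4);
        }),
    ];
    run_all(tasks);
    Rc::try_unwrap(log).unwrap().into_inner()
}

fn main() {
    println!("{:?}", interleaved_order()); // prints [1, 2, 3, 4]
}
```

The scheduling decision (who digs next) happens entirely inside run_all, in user space; the OS only ever sees one thread.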

5 Likes

So, my example is not wrong, but it is incomplete.
Multithreaded code can benefit from async because swapping threads at the OS level is more expensive than at the application level, but the gain is not as large as it is for code that needs to wait for something, so it may not be big enough to justify using async.
This means the main users of async are programs that need to wait, like web and networking code. For everything else it may still be useful, but it needs more thought to decide whether it is worth the effort.

Erlang is described accurately in those terms ("pre-emptive multitasking"). It is surprisingly hard to find a first-party architectural document with the interesting particulars, though, but here are a few initial resources:

From loose memory, a particular green thread gets descheduled once it has performed a certain number of computations, counted as "reductions", each loosely comparable to a single function call. Then it goes back into the queue, waiting its turn for another go on one of the OS-level scheduler threads. Most I/O and some other kinds of computation, such as NIF (FFI) calls, may happen on "dirty" scheduler threads instead.
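The reduction-counting idea itself can be modeled in a few lines. A toy sketch in Rust, not BEAM's actual algorithm: the budget of 3 is arbitrary (the real BEAM budget is on the order of a couple of thousand reductions), and Proc and schedule are made-up names.

```rust
use std::collections::VecDeque;

// A toy "Erlang process": just an id and how much work remains.
struct Proc {
    id: u32,
    work_left: u32,
}

// Each turn, a process may run at most this many "reductions".
const BUDGET: u32 = 3;

// Round-robin with a per-turn budget; returns ids in finish order.
fn schedule(mut queue: VecDeque<Proc>) -> Vec<u32> {
    let mut finished = Vec::new();
    while let Some(mut p) = queue.pop_front() {
        let step = p.work_left.min(BUDGET);
        p.work_left -= step;
        if p.work_left == 0 {
            finished.push(p.id); // process done
        } else {
            queue.push_back(p); // budget exhausted: back of the run queue
        }
    }
    finished
}

fn main() {
    let procs = VecDeque::from(vec![
        Proc { id: 1, work_left: 7 }, // needs 3 turns
        Proc { id: 2, work_left: 2 }, // finishes in its first turn
        Proc { id: 3, work_left: 5 }, // needs 2 turns
    ]);
    println!("{:?}", schedule(procs)); // prints [2, 3, 1]
}
```

The key property this models is fairness: a long-running process cannot starve the others, because the scheduler forcibly requeues it after its budget, with no cooperation from the process itself.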

The model is why a BEAM application can withstand surprising amounts of traffic and still eventually respond without failing, as long as it doesn't OOM; it just starts degrading in any measurement of single-request throughput.

Go is, to my understanding, using a similar system, where it checks whether it can preempt a goroutine at compiler-inserted safe points, but you can still get it to lock up with pathological code (tight no-op loops), so it's a bit of a definitional question whether it is OK to call it preemptive. I wouldn't be surprised if Erlang is a lot more reliable here, though!

1 Like

While this may be true in general, you are unlikely to feel the OS thread context switching with few concurrent tasks. (Where "few" is somewhere in the thousands and most threads spend most of their time waiting on I/O.) It is not exactly easy to measure the overhead of a context switch, but some recent observations suggest that it is typically 2 microseconds. Let's be conservative and say it's 3 μs.

It might be that this 3 μs overhead is unacceptable for your application, but for a common web server this concern seems silly. Your client requests are going to have much higher network latency than that, and your DB latency is also going to dwarf context-switching overhead. If 1,000 threads all try to hit your DB at the same time, the cost of context switching between them is somewhere around 3 ms in aggregate (divide by the number of CPU cores to estimate the per-core overhead). At 100,000 threads, this cost starts to become a concern: that's 300 ms of aggregate switching, so with each thread switching roughly once per second, about 1/3 of your process time on a single-core CPU is spent on context switching alone.

It would also be nice to look into what the Tokio executor overhead is for napkin-math comparison. This repo claims it is about 500 ns [1], which is nearly an order of magnitude lower than OS threads. Switching between 100,000 tasks would have about 50 ms of overhead, if this figure is accurate.
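Putting the napkin math above in one place (the per-switch costs are the rough figures quoted in this thread, not measurements; total_overhead_us is a made-up helper):

```rust
// Aggregate switching overhead in microseconds for `tasks` switches,
// each costing `per_switch_ns` nanoseconds.
fn total_overhead_us(tasks: u64, per_switch_ns: u64) -> u64 {
    tasks * per_switch_ns / 1_000
}

fn main() {
    // 1,000 OS threads at ~3 us per context switch: ~3 ms in aggregate.
    println!("{} us", total_overhead_us(1_000, 3_000)); // prints 3000 us
    // 100,000 OS threads: ~300 ms in aggregate across all cores...
    println!("{} us", total_overhead_us(100_000, 3_000)); // prints 300000 us
    // ...versus ~50 ms for 100,000 tasks at ~500 ns per executor switch.
    println!("{} us", total_overhead_us(100_000, 500)); // prints 50000 us
}
```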

I'm lucky if my website has 2 concurrent users. At my previous job, they had thousands of customers each with thousands of clients and we had to maintain multitenant HTTP services for all of them. I can't give specific numbers, but client concurrency was 7-10 digits steady, and we had to take context switching seriously. Where your app fits on this scale is a good indicator of whether the complexity cost of async outweighs the overhead cost of OS threads. All of this data is here just to put numbers behind my previous "tens of thousands" claim.


  1. Beware there is a typo that says "TLDR is that it is at most 50ns" when in fact the numbers given further down in the README have a lower bound of 401ns. ↩︎

3 Likes

OS schedulers do not work like that. They usually work with so-called "time slices"; on Linux a slice is usually between 10 ms and 100 ms. So let's take the worst-case scenario of 10 ms and assume that a thread does not do anything like blocking I/O or synchronization (which can cause a thread to release its time slice back to the OS earlier). The overhead of a context switch, using your value of 3 μs, is then a measly 0.03%! And when most threads belong to the same process, you do not need to flush the TLB, so switching between them is relatively cheap.

Of course, in practice it's a bit more complex. You have OS scheduler overhead (it's more complex than the "dumb" queue-based schedulers used in executors), you get more cache misses (threads often get stopped mid-computation), threads can migrate unnecessarily between cores, with io_uring you can significantly reduce the number of syscalls, etc. But even with those factors, people often grossly overestimate the overhead of thread-based code. And, as you correctly note, most systems stay significantly underutilized. That's the fact of life on which VPS providers build their business!

2 Likes

Don't worry, I'm aware. The litmus test was "If 1,000 threads all try to hit your DB at the same time..." which presupposes that waiting on I/O will perform a context switch. A worst-case scenario where N threads context switch roughly simultaneously.

IIRC:

  1. Erlang has no looping construct (for, while)

  2. Erlang can only loop via recursive calls.

  3. At every function call, the VM checks: should we pre-empt this Erlang process?

1 Like

While it may behave as almost preemptive, it's technically not, as it contains, as you said yourself, compiler-inserted suspension points. This is totally fine; it just implements a different multitasking model. Rust can use stackful coroutines too; I think I stumbled upon one or two crates, although that was a very long time ago.
AFAIU Rust wanted async to be usable in no-std and no-alloc situations too, which makes stackful a no-go.

1 Like

I really do not think tokio and async is that difficult, assuming you have a decent grasp of Rust's core concepts (borrowing, ownership, etc.). You do have to learn a handful of things, but not much! If I can manage to use it, I reckon almost anyone can!

5 Likes

Yeah, what I meant is that it depends on what you mean by "preemptive", as that term originally arose as the opposite of the then-standard "cooperative" threading model, where you had to insert explicit calls to a yield API (in the 16-bit Windows 3.1 days, for example).

If your runtime environment knows where the safe points are and can stop you without you having to do anything, that's clearly not a cooperative model! But if you can still manage to lock it up when you know what's going on, it's not quite preemptive in the sense that makes the distinction actually useful...

1 Like

Everyone seems to focus on the performance aspect. I've written an async library (me being able to do that counts as a success story for Rust, imho :)), and did so because async felt like the right solution to the problem. I could certainly have done it with threads directly, but that felt pretty clunky to me, while getting the needed behavior with async felt straightforward.

I'm not an expert for sure, but for me, Rust async feels pretty good (modulo existing warts, for sure :)).

11 Likes

If I remember correctly, you can get it to lock up with NIFs, but not with pure Erlang code.

NIFs are loadable extensions written in C (or Rust these days!), and I believe that by default they run directly on a scheduler thread and are expected to be fast; you can flag a NIF as "dirty" to run it on a separate thread pool, but if an unflagged NIF isn't actually fast, you are on your own.