For example: C10k problem - Wikipedia
That explains my personal bugbear: why so many async libraries effectively tie me to tokio as an executor. It's so bad that my first criterion when looking for a dependency is "is tokio in the deps list?" If I end up with no options then I seriously weigh "write it myself" against tying myself to tokio.
It's a shame: when creating a binary tokio has a wonderful executor for more serious tasks. But if I'm creating a lib (even one for my own binary) I DO NOT want to colour my functions by executor. I'd consider this colouring much more problematic than the async/sync function colouring people usually complain about.
To any lib developers out there: please use futures-rs (& related crates) for your fundamentals, they work on any executor without hassle. Let your upstream decide which executor to use.
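To make that concrete, here is a minimal sketch of the split, using only the standard library rather than futures-rs. The `checksum` function is a hypothetical library API, and `block_on` is a toy stand-in for whichever executor the binary picks; neither comes from a real crate. The library side only returns a future and never spawns, sleeps, or opens sockets through a particular runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical library function: it returns a future and knows nothing
/// about which executor will eventually poll it.
pub async fn checksum(data: Vec<u8>) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

/// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor chosen by the *binary* (assumption: stands in for tokio,
/// smol, embassy, or anything else that can poll a `Future`).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until some waker calls `unpark` on this thread.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let sum = block_on(checksum(vec![1, 2, 3]));
    println!("checksum = {sum}");
}
```

The same `checksum` future could be handed to any other executor unchanged, which is the whole point.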
Most recently, an app to monitor network traffic. "Essential" - well, I'm sure I could manage the whole thing with separate threads but async actually makes this case easier.
The default answer to that is C10K. In fact, if you search around you can find quite a few posts from proponents of async who argue "If you're not running into C10K, then don't use async.".
While there's some truth to that, I think it's more nuanced -- I think it depends on how one likes to structure one's code. I think there are arguments for async for non-C10K applications as well.
I want to preface the rest of this post by saying that I'm a huge proponent of not forcing async on everyone -- I think one ought to strive to support both async and non-async versions, if it's feasible. I definitely understand the frustration of those who do not want to use async who keep running into crates that do something they need, but only in async. I personally put quite a bit of effort into supporting both non-async and async in my own crates. And while I'm going to make the argument for async in this post, in my daily life as a developer I'm far less tied to async than this post will make it seem. But I'm going to do something wild and crazy: I'm going to make an argument for async without C10K.
To me, the greatest hero of async is select!{} (which, ironically, isn't an inherent async property -- this is something that, afaict, third party runtimes came up with). Unfortunately, select!{} is an easy source of footguns. But hey, heroes need flaws to be interesting, right?
I used to use libevent a lot, and I loved it, except that it only supported the event types it was hard-wired to support. I often wished that I could add arbitrary event types to its event loop. And that's basically what async (+ runtime) does -- it gives you an event loop with arbitrary (read: application/library definable) event types.
This is the sort of thing where I think async excels:
loop {
    select! {
        cmd = rx_svc.recv() => {
            // Received a command from the service subsystem
            match cmd {
                Cmd::Terminate => break Quit,
                _ => { }
            }
        }
        frm = frmio.next() => {
            // Received a frame from the client
            process_client_frame(frm);
        }
        _ = timeout.wait() => {
            // Timed out
            break Idle;
        }
    }
}
Arbitrary event types and select!{} allow you to time out on "blocking" operations that do not support timeouts. In fact, you can support a single timeout on multiple non-timeout-supporting "blocking" operations. If you're in the habit of wanting to do these things a lot, async can be very helpful, in particular if you want to insert a bunch of other event types.
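For anyone curious what a select boils down to, here is a rough two-branch sketch using only the standard library. `Race`, `Either` and `poll_once` are names I made up for illustration; this is not how tokio's or futures-rs' select! is implemented, just the core idea: poll every branch with the same waker and resolve with whichever is ready first.

```rust
use std::future::{self, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Which branch finished first.
#[derive(Debug, PartialEq)]
pub enum Either<A, B> {
    Left(A),
    Right(B),
}

type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

/// Two-branch "select": resolves with the first branch that is ready.
pub struct Race<A, B> {
    pub left: BoxFut<A>,
    pub right: BoxFut<B>,
}

impl<A, B> Future for Race<A, B> {
    type Output = Either<A, B>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Both fields are boxed, so `Race` is `Unpin` and `get_mut` is allowed.
        let this = self.get_mut();
        // Both branches see the same waker, so either one can wake the task.
        if let Poll::Ready(v) = this.left.as_mut().poll(cx) {
            return Poll::Ready(Either::Left(v));
        }
        if let Poll::Ready(v) = this.right.as_mut().poll(cx) {
            return Poll::Ready(Either::Right(v));
        }
        Poll::Pending
    }
}

/// Waker that does nothing; enough for polling by hand in a demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}

fn main() {
    // `pending()` stands in for an operation with no timeout support;
    // `ready(..)` stands in for a timer that has already fired.
    let mut race = Race {
        left: Box::pin(future::pending::<&str>()),
        right: Box::pin(future::ready("timed out")),
    };
    println!("{:?}", poll_once(&mut race));
}
```

The "blocking" branch never needs to know about timeouts; the timer is just another future racing against it.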
The point isn't that this can't be done without async -- it most certainly can, the point is that either you end up with something more limited (like with libevent, it doesn't support arbitrary event types), or you end up basically reinventing Futures and runtimes -- and at some point you might as well just use async.
One of the common criticisms of async is that it is "viral". I used to dislike async for this exact reason, but then I discovered channels that support mixing async and non-async end-points and I ended up just using message passing, which led to async no longer encroaching on my non-async code. (Message passing has other, to me, benefits, so I got less async virality and much more robust code in one fell swoop). But as I wrote near the top: Whether you can benefit from this depends on how you structure your code. I happen to like message passing a lot, which just happens to align very well with mixing non-async and async.
Most of my networked applications use async primarily for the networking bits, and the majority of the code is non-async with just a few channels to bridge between them -- getting the best of both worlds.
So to answer your specific question: async isn't essential, in the same way high-level languages aren't essential. You can do everything you need to do with low-level languages -- but sometimes it's nice with some pre-baked luxuries.
(Not saying I think you should give async a try, just explaining why I use it, and why I think it has uses beyond C10K).
This post on Structured Concurrency (part 2) (and the contained link to part 1) helped me "see the light" so to speak as far as not just spawning tasks left and right. It still matters what the lower layers do (sleeping via tokio, network via tokio) but if your library can be abstracted to not need those then you can let the consumer decide how to orchestrate your returned futures.
Thank you, yes, it's solved using the Selector or virtual threads in Java. I didn't face the problem directly in Rust yet, although I have implemented websockets.
Generally, one thread with something like let shared_data = Arc::new(Mutex::new(VecDeque::from(["first".to_string()]))); should also address that. So, as I understand it, async is just a matter of convenience.
Thank you everyone for helping to understand the use cases.
That is not the crux of the C10k problem. It was about performance under certain circumstances.
It's a long time since the original C10k problem was published but from what I recall it arose from the rapidly growing number of web users at the time.
The original web servers would spawn a whole new process for every new client connection. When you got to hundreds of users that was taking a lot of memory (much more limited at the time) for thread stacks and wasting time scheduling processes.
Later, web servers (think Apache) started using one thread per client connection inside a single process. This was much lighter weight in memory and CPU usage and could support thousands of simultaneous connections.
So the question was: What happens when we have 10,000 simultaneous connections or more?
Now it happens that in this scenario each process, or later thread, spent most of its time doing nothing. Typically it accepted a request, tweaked it, made requests to databases, waited for the results, tweaked those and returned them to the client. Despite the little work they did, one was still limited by memory and perhaps context switching overheads. Every thread needs stack space.
The suggestion was then to go async and handle as many connections in a single thread as possible. Get rid of all those stacks and context switches.
Bottom line is that async was put up as the solution for situations where most of your jobs spend most of their time waiting for IO. Then you can efficiently handle a lot more jobs simultaneously.
That was the async use case.
Now, it seems to me that for many other things async is not necessary, like when parallelising a big compute job. If your threads/tasks are doing a lot of work on multiple cores then async is not saving you anything.
My rule of thumb is:
Sync is for when you have a lot of work to do.
Async is for when you have a lot of waiting to do.
Of course things are not so cut and dried:
Nowadays we have loads of cores to play with and things like Tokio will make use of them.
Perhaps there are some conveniences to using async rather than threads.
Ah, this is a topic on which I really have some, well, I would say, pain and craziness.
A few days ago we released a crazy project around this, and I only just saw this topic, so I thought I would leave some comments here.
But, well, some of my thoughts are really crazy, and if you do not like them, you can just forget them...
First and foremost, for most of the history of computer science, the async "viral" problem has mainly consisted of 4 parts:
- language binding (we do have libco in C, but it is far from high-performance or easy to use from other languages)
- the color problem between sync code and async code
- the executor binding problem (at root, this comes down to the fact that all of our operating system infrastructure was designed as sync first, and we have to gradually adopt async APIs or manually implement async APIs in our executor)
- the gap between the stackful and stackless async models
It seemed really crazy when we first thought about solving these 4 problems at the same time, but eventually we figured out that we could design a "traditional" C-style async API that fits all of them. Providing a C FFI solves problem 1; the stackful approach solves problem 2; we accept running a standard future from Rust, which solves problem 3; and we provide a pre-allocated stack and an extensible heap for running futures, which solves problem 4.
And it is really crazy. But we also face many more problems, which I will describe in the later part of this post.
Yes, that is surely right. But the TLS problem is annoying here. If you just pull up a tokio executor or just use a listener, it might be OK in most cases. But if you need to pull up a heavy listener/executor each time, it damages performance; and if you don't, the ecosystem is a problem.
Well, in our project we, crazily enough, use a coordinative-style scheduler, which is based on a big P2P network of space networks between workers and a big warehouse for emergency needs. After testing, we found that, just like in statistical physics, letting the tasks decide where they deflect to is much more efficient and scales about linearly on high-core-count CPUs. Work deflection also proved to be somewhat more efficient at reaching relevant "fairness" than the work-stealing method.
But the problem: we cannot easily trace or cancel it.
Just like, again, in statistical physics, we cannot trace each "particle". That is to say, we do not really know where the tasks are, and cannot trace or cancel them easily. This is another issue.
Yes that is my advice to common libs devs, too.
That is true...
Message passing is really great, but sometimes, well, it is more distressing to write things that way. And more importantly, if the signal system becomes too complicated, it is much more likely to have race condition bugs. We have loom, but it is not perfect anyway. Message passing also sometimes causes CAS storms between cores. Our crazy work also uses a signal system that turned out more complicated than we originally thought.
But anyway design is always the first thing to consider.
And finally, a few more notes on the other problems with our crazy move:
- It uses too much asm and unsafe, so Miri does not work there;
- Even ASan stops working (we could manually mark the areas where there shall be no runtime marks, but it is complicated anyway). So we have to rely much more on e2e tests and fuzzing.
- It is high performance: spawning and hot-core work deflection are several times faster than tokio, even on a GitHub CI machine, and it achieves almost-linear performance scaling per core on high-core-count chips. But traceability is an issue anyway. So only people who really need that level of performance, or our C programmer friends who need a better concurrent runtime, should try our project or similar approaches.
So in summary, async without tokio is possible, because async is just a signalling system. But the result will be considered less ideal in pure safe Rust, or it will be much more constraining.
Updates:
I do not want to make my reply like a marketing post, but anyway if you are interested in our approach for conversations on it, you could refer to GitHub - Apich-Organization/dtact: Dtact: The Universal Topology-Affinity Async Runtime · GitHub or Releasing dtact v0.2.2 and rssn-advanced v0.1.0
And to avoid giving that impression, I would recommend using this project only in performance-sensitive projects. Normal web projects and so on should just use sync or tokio.
Coincidentally, when this thread started I was putting together a little async runtime of my own, just trying to understand how such things work. OK, I will come clean: my AI friend put it together for me, as it looked like I would never figure out how to do it on my own. Don't panic, I won't post the code here.
Anyway, what I ended up with is a minimal async runtime in only 200 lines of code that:
Runs tasks.
Tasks can sleep() for a given number of milliseconds.
Tasks can communicate via channels (Their only I/O)
Sleeps the whole program when there is nothing to do (as opposed to busy waiting)
Is usable without the standard library (no-std)
That means it's a few steps away from being usable on single core micro-controllers, bare metal with no OS, only using interrupts to keep things moving.
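For readers wondering what the core of such a runtime looks like, here is a rough sketch. It is not the 200-line runtime described above: it uses std, has no sleep() or channels, and stops when the run queue is empty instead of sleeping. `Executor`, `Task`, `YieldNow` and `demo` are names invented for this example. It only shows the central trick: waking a task means putting it back on a run queue.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type RunQueue = Arc<Mutex<VecDeque<Arc<Task>>>>;

/// A spawned task: its future plus a handle to the queue it re-joins when woken.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: RunQueue,
}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        // Waking a task means "put me back on the run queue".
        let queue = self.queue.clone();
        queue.lock().unwrap().push_back(self);
    }
}

pub struct Executor {
    queue: RunQueue,
}

impl Executor {
    pub fn new() -> Self {
        Executor { queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    pub fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static) {
        let fut: BoxFuture = Box::pin(fut);
        let task = Arc::new(Task { future: Mutex::new(Some(fut)), queue: self.queue.clone() });
        self.queue.lock().unwrap().push_back(task);
    }

    /// Polls queued tasks until the run queue is empty.
    pub fn run(&self) {
        loop {
            // Pop in its own statement so the queue lock is released
            // before polling (polling may call `wake`, which locks it again).
            let next = self.queue.lock().unwrap().pop_front();
            let Some(task) = next else { break };
            let waker = Waker::from(task.clone());
            let mut cx = Context::from_waker(&waker);
            let mut slot = task.future.lock().unwrap();
            if let Some(mut fut) = slot.take() {
                if fut.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(fut); // not finished: keep it for the next wake
                }
            }
        }
    }
}

/// Future that hands control back to the executor once before completing.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

/// Runs two tasks that take turns at each yield; returns the order of steps.
pub fn demo() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let exec = Executor::new();
    for name in ["a", "b"] {
        let log = log.clone();
        exec.spawn(async move {
            for step in 0..2 {
                log.lock().unwrap().push(format!("{name}{step}"));
                YieldNow(false).await;
            }
        });
    }
    exec.run();
    let out = log.lock().unwrap().clone();
    out
}

fn main() {
    println!("{:?}", demo());
}
```

A no-std version would presumably swap the Mutex and VecDeque for interrupt-safe equivalents and sleep the core when the queue is empty, but that is beyond this sketch.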
All in all then, I'm glad Rust does not have an async run-time welded in.
Embassy
It gives you async with static execution.
It is the "async" you mean if you want async without tokio.
It's really the answer, isn't it?
Generally it's common practice - one thread for all 10K connections monitoring, and when something arrives, spawn a processing thread.
I'd even say that using async without either one of select!/join! is totally pointless. (Either can hide inside a framework, though.)
Spawning threads is relatively fast, but it doesn't scale. (Imagine spawning 10K threads to handle an avalanche of incoming HTTP requests -- the memory for the 10K stacks alone, not to mention the amount of shuffling the scheduler would need to do).
kqueue and epoll were invented to handle the C10K problem (well there were other reasons, but let's just say handing many concurrent connections was a motivating factor) -- and this is why things like libevent, libev, async/await, et al tend to use event loops rather than using per-connection threads.
It's worth looking into the architecture of thread pool based task schedulers; even without async syntax, being able to dispatch fine-grained tasks to a thread pool can be a really efficient approach, able to dispatch tens or hundreds of millions of tasks a second per thread with very low overhead in theory (tokio can't reach that for spawned tasks because it needs to allocate to type erase tasks, unfortunately, but awaiting nested futures is much cheaper. I couldn't find a good benchmark for that already, though...)
Async syntax is really "just" the sugar to make it reasonable to have all the work broken up that fine grained without completely destroying the readability. There's ancillary benefits, and lots of sharp edges since OS APIs haven't quite caught up yet, but it really is just a way to pass "the rest of the function after here" to a well proven, performant and powerful design that really wants "the code to run after this finishes" as an argument.
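As an illustration of the "sugar" point, here is a small sketch that puts an async fn next to a hand-written state machine doing roughly the same job. The compiler's real output differs in detail; `sugared`, `Manual`, `YieldOnce` and `drive` are names invented for this example, and `drive` is a deliberately naive busy-polling loop rather than a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Leaf future: pending on the first poll, ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// What one writes with async syntax.
async fn sugared(a: u32, b: u32) -> u32 {
    let doubled = a * 2;
    YieldOnce(false).await; // suspension point
    doubled + b // "the rest of the function after here"
}

/// Roughly what that means: each variant holds the locals that are
/// still needed after a suspension point.
enum Manual {
    Start { a: u32, b: u32 },
    Waiting { doubled: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for Manual {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine: every field is `Unpin`
        loop {
            match &mut *this {
                Manual::Start { a, b } => {
                    let next = Manual::Waiting { doubled: *a * 2, b: *b, inner: YieldOnce(false) };
                    *this = next;
                }
                Manual::Waiting { doubled, b, inner } => {
                    if Pin::new(inner).poll(cx).is_pending() {
                        return Poll::Pending;
                    }
                    let out = *doubled + *b;
                    *this = Manual::Done;
                    return Poll::Ready(out);
                }
                Manual::Done => panic!("polled after completion"),
            }
        }
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls a future to completion; fine for a demo, wasteful in real code.
fn drive<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    println!("{} {}", drive(sugared(3, 4)), drive(Manual::Start { a: 3, b: 4 }));
}
```

Writing the `Manual` version by hand for anything bigger than this is exactly the readability cost the syntax saves you from.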
Unfortunately, no single system can service a load above its own capacity. That is why DoS attacks are possible and really work; we can even see the same effect during Black Friday and similar events.
It's true. My latest web server has this problem when servicing websocket requests, so I hope you will teach me how to get over it. Here is the URL, just in case.
If you don't use these stacks then it's just an address space. If you use them then you would use them in async, too.
That was solved decades ago. Sadly it requires changes to the OS kernel, and people would rather build Goldberg machines like async than go and change their OS kernel…
It's the same reason QUIC is layered on top of UDP: in theory TCP/IP should support something like QUIC natively, but in practice you have to deal with the idiocy of network equipment developers, so Goldberg machines are the way to go.
Same with async: in a sane world it would have been used exclusively where it makes sense — as a replacement for threads, not together with threads… but we don't live in a sane world.
I built some protection against DoS attacks in my web server, it monitors the number of requests from an IP (or logged on user), and also the socket read/write bytes, and has a budget for each. I don't know how effective it would be against a real attack as I never saw one. It is written with Tokio.
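A per-client budget like that can be sketched in a few lines. This is not the code from the server described above; it assumes a simple fixed-window counter, and `Limiter`, `Usage` and the limits used in `main` are made up for illustration.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// What one client has used in the current time window.
struct Usage {
    window_start: Instant,
    requests: u32,
    bytes: u64,
}

/// Fixed-window budget per IP (assumption: a real server might prefer
/// a sliding window or token bucket).
pub struct Limiter {
    window: Duration,
    max_requests: u32,
    max_bytes: u64,
    clients: HashMap<IpAddr, Usage>,
}

impl Limiter {
    pub fn new(window: Duration, max_requests: u32, max_bytes: u64) -> Self {
        Limiter { window, max_requests, max_bytes, clients: HashMap::new() }
    }

    /// Records one request of `bytes` bytes at time `now`.
    /// Returns `false` once the client is over either budget.
    pub fn allow(&mut self, ip: IpAddr, bytes: u64, now: Instant) -> bool {
        let usage = self
            .clients
            .entry(ip)
            .or_insert(Usage { window_start: now, requests: 0, bytes: 0 });
        if now.duration_since(usage.window_start) >= self.window {
            // A new window has started: reset the counters.
            *usage = Usage { window_start: now, requests: 0, bytes: 0 };
        }
        usage.requests = usage.requests.saturating_add(1);
        usage.bytes = usage.bytes.saturating_add(bytes);
        usage.requests <= self.max_requests && usage.bytes <= self.max_bytes
    }
}

fn main() {
    // Hypothetical limits: 2 requests and 1000 bytes per minute.
    let mut limiter = Limiter::new(Duration::from_secs(60), 2, 1_000);
    let ip: IpAddr = "192.0.2.1".parse().unwrap();
    let t0 = Instant::now();
    for i in 1..=3 {
        println!("request {i}: allowed = {}", limiter.allow(ip, 100, t0));
    }
}
```

Passing `now` in as a parameter keeps the logic testable without sleeping; a real server would also need to evict idle entries so the map itself cannot be used to exhaust memory.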
True. People migrated from cooperative multitasking (Win3.11, classic Mac OS) to the preemptive (timer-based) kind, and then... migrated back, as every single language has async/await nowadays. And they also introduced function colouring. The only exception here is Go, as they replace blocking functions with non-blocking ones in goroutines, afair.
..but we have NAT and NAT uses TCP or UDP ports so we have to use UDP or patch every single router on the Internet.
This world is full of strange solutions: people send Base64-encoded binary data via JSON via REST via HTTP instead of bare TCP. Or they invent a scripting language without static types and then create linters to check types. They implement GC and then explicitly close resources, as GC doesn't have "drop"/destructors. And so on...
No need to imagine. The following code works fine on my computer without even needing to adjust ulimits:
use std::{
    thread::{sleep, spawn},
    time::Duration,
};

fn main() {
    let mut threads = Vec::new();
    for i in 0..10000 {
        threads.push(spawn(move || {
            sleep(Duration::from_secs(1));
            i
        }));
    }
    let mut res = Vec::new();
    for t in threads {
        res.push(t.join().unwrap());
    }
    println!("{:?}", res);
}
The C10k problem is over 20 years old now; the state of the art has changed by a few orders of magnitude since then.
The real reason nobody uses that solution is that Google never released it. I suppose somebody could try to reproduce it, but most OSS people lack something like Google's infrastructure to test on (and "we're still working on this" tends to have a dampening effect on that kind of effort springing up).