rayon is pretty much a one-stop shop for parallelising CPU-bound work, and it's a mature offering. I think you'll get the most out of it if you can structure your tasks as a stream of independent events and use parallel iterators. But it also exposes a thread pool you can send individual units of work to.
If your event loops track progress through futures, it might also make sense to look at futures-cpupool; futures support in rayon is, I think, still experimental. Futures should scale nicely to a large number of tasks.
Are the tasks independent of each other? Or do you have tasks that themselves spawn more tasks? Rayon is more of a fork-join parallelism library, and it has some parallel adapters (e.g. a parallel iterator).
A more conventional thread pool option is the futures_cpupool crate. This is a classic thread pool: you submit a task (a closure) and get back a future representing the asynchronous execution of that task.
Millions of (small) tasks per second will stress-test the mechanics of a thread pool, namely the overhead of the concurrency primitives used internally. Rayon has the advantage of work stealing off thread-local work queues, but that works better for fork-join than for many independent tasks submitted from the outside. I think I would try futures-cpupool first, but be prepared to tinker with the design at that rate of task arrival.
If you end up benchmarking rayon, futures cpupool, and whatever else, please report back your findings - I think folks will be interested in seeing that.
I ended up using futures-cpupool, and the trick is to .forget() the future; otherwise it stays lazy and doesn't execute in parallel (well, if you join them at the end they will, but I had tasks I wanted to fire off immediately and didn't need the result). futures-cpupool is also cool because it lets you pass closures into it, which effectively parametrizes the 'static Fn, something I found very hard to program in Rust myself.
Vitaly, I'm not using it at a millions-per-second rate or anything like that. For the moment it's basically backend processing for an external API, which won't generate high rates. Otherwise, for parallelisation I use scoped threads, and for fine-grained work I like rayon's iterators. I understand people like futures, but with them I lose the "locality of parallelisation" and my maintenance becomes harder over time (if that makes sense).
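For reference, the scoped-threads pattern looks like this with std::thread::scope (stabilized in Rust 1.63; on older compilers, crossbeam's scope offers a similar API; the split-and-sum workload is just an illustration):

```rust
use std::thread;

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    let (left, right) = data.split_at(data.len() / 2);

    // The spawned threads may borrow `data` because the scope
    // guarantees they finish before `data` goes out of scope.
    let total = thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    });
    println!("{}", total); // 5050
}
```

The "locality" point above is visible here: the parallelism is confined to one lexical scope instead of being spread across future combinators.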
Ah, ok - your original post mentioned millions of events/sec, which I (perhaps mistakenly) assumed you were going to submit to a pool. Thanks for clarifying.
Looking at the current implementation of futures-cpupool, I would have been highly surprised to see it withstand that much load. It has clearly not been tuned for task scheduling performance yet (most likely because no one has needed that to be fast so far).