How to optimize the spawning of too many tasks in Tokio?

I have an RPC service that spawns a lot of tasks (about 100) for each incoming request, at roughly 2000 QPS. I profiled it with perf and found that a lot of CPU time goes to tokio::runtime::task::list::OwnedTasks<S>::bind_inner and tokio::runtime::scheduler::multi_thread::worker::<impl tokio::runtime::task::Schedule for alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle>>::release.

Any ideas for optimizing this? The only thing I can think of is a task pool that reuses tasks.

Clarification: You are spawning 100 tasks per request?

It sounds like you have too many tasks, so that the overhead of task management is greater than the work each task is doing. You should try to make fewer tasks which each do more of the work. If it's not clear how to do that, show us your code.
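
As a rough illustration of "fewer tasks which each do more of the work", here is a minimal sketch of chunking. It uses std::thread::scope rather than Tokio only so that it runs without dependencies; in the service, each chunk would presumably be handed to a single tokio::spawn or tokio::task::spawn_blocking call instead. The names do_job and process_in_chunks are hypothetical and do not come from the original code.

```rust
use std::thread;

// Hypothetical stand-in for one small, mostly synchronous job.
fn do_job(input: u64) -> u64 {
    input.wrapping_mul(2).wrapping_add(1)
}

// Run every job using at most `workers` threads. Each thread handles one
// contiguous chunk of the inputs instead of a single item.
fn process_in_chunks(inputs: &[u64], workers: usize) -> Vec<u64> {
    if inputs.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division, so that at most `workers` chunks are produced.
    let chunk_size = (inputs.len() + workers - 1) / workers;

    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&x| do_job(x)).collect::<Vec<u64>>())
            })
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

With 100 jobs and workers set to 8, this creates 8 units of work of 12 or 13 jobs each (the last one gets the remainder) instead of 100 units of one job each, so the per-task bookkeeping is paid 8 times per request rather than 100.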

Yes, 100 tasks per request.

I think you are right. Each spawned task does very little work, and it is mostly synchronous code. But I want maximum parallelism, so should I use a rayon thread pool for this? If I do, I will need a way to communicate between the sync and async code.

Is this program intended to run on a CPU with ~100 cores? Also, is it important to keep all the cores busy even for a single request?

Keep in mind that synchronization itself can be quite costly, often more costly than the actual logic. Sometimes making things parallel slows the job down in wall-clock time, because the added synchronization work outweighs the gain from parallelization.
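
To make the synchronization point concrete, here is a small sketch that computes the same sum three ways: sequentially, in parallel with every item taking a shared lock, and in parallel with each thread keeping a local total that is merged once at the end. All function names are hypothetical, and the sketch makes no claim about which variant is fastest on a given machine; that has to be measured (for example with std::time::Instant or perf).

```rust
use std::sync::Mutex;
use std::thread;

// Chunk size that yields at most `workers` chunks and is never zero.
fn chunk_size_for(len: usize, workers: usize) -> usize {
    let workers = workers.max(1);
    ((len + workers - 1) / workers).max(1)
}

// Baseline: no threads, no synchronization.
fn sum_sequential(inputs: &[u64]) -> u64 {
    inputs.iter().sum()
}

// Every item takes the shared lock, so the threads synchronize once per item.
fn sum_with_shared_lock(inputs: &[u64], workers: usize) -> u64 {
    let total = Mutex::new(0u64);
    thread::scope(|scope| {
        for chunk in inputs.chunks(chunk_size_for(inputs.len(), workers)) {
            let total = &total;
            scope.spawn(move || {
                for &x in chunk {
                    *total.lock().unwrap() += x;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
    total.into_inner().unwrap()
}

// Each thread sums its own chunk; results are combined once per thread.
fn sum_with_local_totals(inputs: &[u64], workers: usize) -> u64 {
    thread::scope(|scope| {
        let handles: Vec<_> = inputs
            .chunks(chunk_size_for(inputs.len(), workers))
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

All three functions return the same value; they differ only in how often threads have to coordinate. Timing them against each other on the target hardware shows whether the parallel versions actually beat the sequential baseline for jobs this small.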

No, probably only 8 cores. I'm starting to think it is pointless to try to keep all the cores busy. I will make it sequential and profile again.
