Need help understanding my mistakes

I wrote a framework just to learn the language.

It runs on the tokio runtime and tokio TCP.

wrk benchmark results on my system:
Bare Tokio: 6.1k Req/Sec
My Framework: 5.2k Req/Sec
So there is a drop of around 15%.

So I ran bare hyper and bare warp benchmarks on the same system, because warp uses hyper and tokio.

Hyper: 153k Req/Sec
Warp: 148.6k Req/Sec
So there is only a drop of around 5%.

So my question is: how can I make my framework faster, so that the drop is only around 5%?
How can I write much better applications in Rust?
Can anyone please point out the mistakes and wrong things I am doing?

Thank you.

I'm slightly confused by your benchmark results, because hyper uses tokio, and yet the "Hyper" benchmark has roughly 25x higher throughput than "Bare Tokio"? The linked repo does not appear to have any of the benchmark code, so I can't tell what you are even comparing.

Going over the code, the only thing that really stands out as maybe interesting from a performance perspective is that you seem to do a lot of potentially unnecessary allocations, especially wrt the Context, Request, and Response types. The middleware callbacks are probably not ideal for performance. If low overhead is critical, these might be the first areas to investigate. But ultimately, code review is no substitute for profiling.
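To make the allocation point concrete, here is a minimal sketch of the general idea. It is not taken from the framework in question: the `Response` struct and both render functions are hypothetical names used only for illustration, and the reason phrase is hard-coded to keep it short. The first function allocates a fresh `String` per response; the second refills a caller-owned buffer, so the heap capacity is reused across requests.

```rust
use std::fmt::Write as _;

/// Hypothetical response type, invented for this sketch.
struct Response {
    status: u16,
    body: String,
}

/// Allocating version: builds a brand-new String for every response.
fn render_allocating(res: &Response) -> String {
    // The reason phrase is hard-coded to "OK" to keep the sketch small.
    format!(
        "HTTP/1.1 {} OK\r\nContent-Length: {}\r\n\r\n{}",
        res.status,
        res.body.len(),
        res.body
    )
}

/// Reusing version: clears and refills a caller-owned buffer.
/// `clear` keeps the existing capacity, so repeated calls with
/// similarly sized responses do not need to reallocate.
fn render_into(res: &Response, buf: &mut String) {
    buf.clear();
    // Writing into a String cannot fail, so unwrap is fine here.
    write!(
        buf,
        "HTTP/1.1 {} OK\r\nContent-Length: {}\r\n\r\n{}",
        res.status,
        res.body.len(),
        res.body
    )
    .unwrap();
}

fn main() {
    let res = Response { status: 200, body: "hello".to_string() };
    // One buffer per connection, reused for every response on it.
    let mut buf = String::with_capacity(256);
    for _ in 0..3 {
        render_into(&res, &mut buf);
    }
    assert_eq!(buf, render_allocating(&res));
    println!("{} bytes rendered", buf.len());
}
```

Whether this kind of change matters in practice depends on where the time actually goes, which is why profiling comes first.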


@parasyte Hi
Thanks for the response.
Let me clarify.
Maybe I am confusing people. Sorry about that.

I created a bare tokio TCP listener, which outputs 6.1k req/sec.
My framework runs on top of a tokio TCP listener and outputs 5.2k req/sec.

On the other hand:

bare hyper output is 153k req/sec
bare warp output is 148.6k req/sec

I am not comparing my framework with hyper or warp. I am comparing with bare tokio only.

My point is that my framework drops tokio TCP performance by 15%.
On the other hand, warp drops hyper performance by only 5%.

I am additionally making an observation that hyper uses tokio under the hood, so we should expect hyper and tokio to have comparable performance. This observation leads me to believe that whatever you are doing in the "Bare Tokio" benchmark is suboptimal, at best. If you fix that, you should see similar performance on that benchmark to what hyper gives you. Therefore, making the same fix in your framework benchmark will give you a solid 25x performance gain with minimal effort.

All that aside...

This is why I pointed out unnecessary allocations as a likely suspect for the 5% -> 15% delta. But, who knows! You're better off profiling it to identify the actual hotspots.

Also, it is kind of difficult to make this kind of comparison and infer any meaningful information. The 5% delta amounts to about 4.4K rps (153k - 148.6k, so really closer to 3%), which is close to what either of the other two benchmarks does IN TOTAL. The 15% delta is a mere 0.9K rps! Immediately, the sample rate gets called into question.
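As a sanity check, the deltas can be recomputed from the four figures posted in this thread rather than eyeballed. The helper name `drop_between` is made up for this sketch. With the posted numbers it gives about 900 rps (roughly 14.8%) for tokio to framework, and about 4400 rps (roughly 2.9%) for hyper to warp.

```rust
/// Returns (absolute drop in req/sec, relative drop in percent)
/// when going from `baseline` to `measured`.
fn drop_between(baseline: f64, measured: f64) -> (f64, f64) {
    let abs = baseline - measured;
    (abs, abs / baseline * 100.0)
}

fn main() {
    // Figures as posted in the thread, in requests per second.
    let (tokio_abs, tokio_pct) = drop_between(6_100.0, 5_200.0);
    let (hyper_abs, hyper_pct) = drop_between(153_000.0, 148_600.0);

    println!("tokio -> framework: {:.0} rps ({:.1}%)", tokio_abs, tokio_pct);
    println!("hyper -> warp:      {:.0} rps ({:.1}%)", hyper_abs, hyper_pct);
}
```

The two absolute gaps differ by almost 5x, which is another reason the percentages alone are hard to compare.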