I'm playing around with the Rocket web framework. I need a really simple yet fast HTTP application that puts some data from HTTP requests into Redis.
To achieve this I took Rocket (as it looks easy to start with), deadpool for handling connections to Redis, and serde_json for serializing the data from HTTP so it can be stored as a message in a Redis Stream.
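For context, the whole thing is roughly the following (a simplified sketch assuming Rocket 0.5 and deadpool-redis, not my exact code; the route, stream name and payload fields are made up for this post):

```rust
use deadpool_redis::{redis, Config, Pool, Runtime};
use rocket::serde::json::Json;
use rocket::{launch, post, routes, State};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Event {
    kind: String,
    value: u64,
}

#[post("/events", data = "<event>")]
async fn ingest(pool: &State<Pool>, event: Json<Event>) -> &'static str {
    // Re-serialize the payload and append it to a Redis Stream via XADD.
    // Error handling is elided here; the real handler returns proper statuses.
    let body = serde_json::to_string(&event.0).unwrap();
    let mut conn = pool.get().await.unwrap();
    let _: () = redis::cmd("XADD")
        .arg("events")
        .arg("*")
        .arg("payload")
        .arg(body)
        .query_async(&mut conn)
        .await
        .unwrap();
    "ok"
}

#[launch]
fn rocket() -> _ {
    // Pool is created once at startup and shared via Rocket's managed state.
    let pool = Config::from_url("redis://127.0.0.1/")
        .create_pool(Some(Runtime::Tokio1))
        .expect("failed to create Redis pool");
    rocket::build().manage(pool).mount("/", routes![ingest])
}
```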
Once I started to benchmark my few-lines application with wrk I noticed that it does not run fast enough. On my Ubuntu laptop the release build serves only 60K requests/sec. When I threw away all my code and kept a basic "hello world" GET responder, the app gave around 80K requests/sec.
What I want to achieve is roughly 120-150K requests/sec served, so I need a nearly 2x increase in throughput. I tried to tune here and there, the number of workers for instance, and I quickly realized that I was not evaluating the bottlenecks of my app properly. Digging further, I discovered that the number of heap allocations looks abnormal. Valgrind shows 1.9M allocations after a wrk run of 23K requests, which is 82 allocations per request. Does that seem reasonable for an app that handles TCP connections, deserializes HTTP headers, then serializes them and sends them over the next wire?
Thank you for reading this far. I am asking for suggestions on what to focus on when profiling and improving the application's performance. Should I switch the web framework to warp or axum? How do I properly search for bottlenecks in such a simple app sitting in a complicated environment? Could you suggest an improvement from your past experience with Rust and web frameworks?
Define "reasonable". I remember a time when my friend had trouble with the performance of his Android app (in the early days, when it was Harmony/Dalvik based) and found out that one, single, printf in his code created more than 200 temporary objects.
I would say that 82 allocations per request is more than adequate for a convenience-oriented framework, but if you are planning to set speed records then you, of course, need something else.
That's most definitely achievable (Cloudflare replaced NGINX with Rust-based Pingora for a reason), but I wouldn't expect to see numbers like that with a general-purpose web framework.
These tend to be oriented more toward "good performance, great ergonomics" rather than toward the "smallest overhead possible".
The general expectation is that your pages take some "reasonable time" to be created, not that you just serve mostly-static content.
From my experience most web frameworks are not geared toward these extreme numbers. I suspect Actix Web may achieve 120-150K; I'm not sure about Axum.
But web servers geared toward static content serving, like NGINX, may do 10x of that… they are just not very usable as general-purpose frameworks.
Thank you, @khimru, for pointing me to the post about Pingora.
I decided to play with lower-level solutions. Hyper looks good enough. I will post an update on my challenge if anybody is interested.
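For anyone curious, the lower-level baseline I plan to start from is roughly this (a minimal hello-world sketch assuming hyper 0.14 with the "full" feature plus tokio; not my actual handler):

```rust
use std::convert::Infallible;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

// Bare-bones responder: no routing, no extractors, just bytes out.
async fn handle(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    Ok(Response::new(Body::from("hello world")))
}

#[tokio::main]
async fn main() {
    let addr = ([127, 0, 0, 1], 8000).into();
    // One service instance per connection, all sharing the same handler.
    let make_svc = make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(handle)) });
    Server::bind(&addr).serve(make_svc).await.unwrap();
}
```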
I quickly realized that I was not evaluating the bottlenecks of my app properly. Digging further, I discovered that the number of heap allocations looks abnormal. Valgrind shows 1.9M allocations after a wrk run of 23K requests.
But did a profiler show a lot of time spent in the allocator? Allocation counts only matter if the allocations are your bottleneck. cargo flamegraph or samply should do the job for starters; there are more advanced options if necessary.
If it is indeed allocations, the next thing you can try is replacing the default (system) allocator with another one, e.g. jemalloc.
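Swapping the allocator is a one-liner via `#[global_allocator]`. A minimal sketch, assuming the tikv-jemallocator crate is added as a dependency (mimalloc works the same way):

```rust
// Assumes Cargo.toml contains something like: tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// All heap allocations in this binary now go through jemalloc instead of the
// system allocator; no other code changes are required.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // ...the rest of the application stays exactly the same.
    let v: Vec<u8> = Vec::with_capacity(1024); // served by jemalloc
    println!("{}", v.capacity());
}
```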
serde_json for serializing the data from HTTP so it can be stored as a message in a Redis Stream.
JSON is not very efficient. For example, there's an impedance mismatch between Rust's String and what JSON considers a string, so one can't simply borrow the bytes from the underlying buffer when deserializing.
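To illustrate how far borrowing can get you with serde/serde_json: a `Cow<str>` field borrows from the input when the JSON string has no escape sequences and only allocates when it must. A small sketch (the `Event` type and its fields are made up for the example):

```rust
use std::borrow::Cow;

use serde::Deserialize;

// Hypothetical payload type, just to show the effect: string fields can
// borrow directly from the input buffer instead of allocating a new String.
#[derive(Deserialize)]
struct Event<'a> {
    // Cow borrows when the JSON string contains no escape sequences and
    // falls back to an owned String (one allocation) when it does.
    #[serde(borrow)]
    kind: Cow<'a, str>,
    value: u64,
}

fn main() {
    let raw = r#"{"kind":"click","value":42}"#;
    let ev: Event<'_> = serde_json::from_str(raw).unwrap();

    // No escapes in "click", so nothing was copied out of `raw` here.
    assert!(matches!(&ev.kind, Cow::Borrowed(_)));
    println!("{} {}", ev.kind, ev.value);
}
```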