Minimal Rocket app generates a probably enormous number of heap allocations

Hey Rust lang community,

I'm playing around with the Rocket web framework. I need a really simple yet fast HTTP application that puts some data from HTTP requests into Redis.

To achieve this I took Rocket (as it looks easy to start with), deadpool for handling connections to Redis and serde_json for serializing data from HTTP to store it further as a message in the Redis Stream.

Once I started to benchmark my few-line application using wrk, I noticed that the application does not run fast enough. On my Ubuntu laptop the release build serves only 60K requests/sec. When I threw away all my code and kept a basic "hello world" GET responder, the app gave roughly 80K requests/sec.

What I want to achieve is roughly 120-150K requests/sec served, so I need a nearly 2x increase in throughput. I tried to tune here and there, the number of workers for instance, and I quickly realized that I do not evaluate bottlenecks of my app properly. Digging even further I have discovered that the count of heap allocations looks abnormal. Valgrind shows 1.9M allocations after a wrk run on 23K requests. So, that is about 82 allocations per request. Does that seem reasonable for an app that handles TCP connections, deserializes HTTP headers, then serializes them and sends them over the next wire?

Thank you for reading this far. I am asking for suggestions on what to focus on when profiling and improving the application's performance. Should I switch the web framework to warp or axum? How do I properly search for a bottleneck in this simple app running in a complicated environment? Could you suggest an improvement from your past experience with Rust and web frameworks?

Define “reasonable”. I remember a time when a friend of mine had trouble with the performance of his Android app (in the early days, when it was Harmony/Dalvik based) and found out that a single printf in his code created more than 200 temporary objects.

I would say that 82 allocations per request is more than adequate for a convenience-based framework, but if you are planning to set speed records then you, of course, need something else.

That's most definitely achievable (Cloudflare replaced NGINX with the Rust-based Pingora for a reason), but I wouldn't expect to see numbers like that with a general-purpose web framework.

These tend to be oriented more toward “good performance, great ergonomics” than toward the “smallest overhead possible”.

The general expectation is that your pages take some “reasonable time” to be created, not that you just serve mostly-static content.

From my experience most web frameworks are not geared toward these extreme numbers. I suspect Actix-Web may achieve 120-150K, not sure about Axum.

But web servers geared toward static content serving, like NGINX, may do 10x of that… they are just not very usable as general-purpose frameworks.

Thank you, @khimru, for pointing me to the post about Pingora.
I decided to play with lower-level solutions. Hyper looks good enough. I will post an update on my challenge if anybody is interested.


To keep the discussion more concrete, I uploaded my code and the instructions required to reproduce my results to GitHub.

Should I switch the web framework to warp or axum?

You might want to take a look at the TechEmpower benchmarks to get an idea of how minimal web server benchmarks perform in comparison. The sources are available here.

I quickly realized that I do not evaluate bottlenecks of my app properly. Digging even further I have discovered that the count of heap allocations looks abnormal. Valgrind shows 1.9M allocations after a wrk run on 23K requests.

But did a profiler show a lot of time spent in the allocator? Allocation counts only matter if the allocations are your bottleneck. cargo flamegraph or samply should do the job for starters; there are more advanced options if necessary.

If it is indeed allocations the next thing you can try is replacing the default (system) allocator with another one, e.g. jemalloc.
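As a rough sketch of what swapping the allocator looks like: Rust selects the process-wide allocator through a static marked `#[global_allocator]`. The example below uses only the standard library and wraps the system allocator in a hypothetical `CountingAllocator`, which also gives a cheap way to count allocations for a piece of code without Valgrind. The names `CountingAllocator`, `ALLOCATIONS` and `count_allocations` are made up for illustration. To use jemalloc instead, the same attribute would be put on a static of the allocator type exported by a jemalloc crate (for example `Jemalloc` from `tikv-jemallocator`, assuming that crate is the one chosen).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

thread_local! {
    // Per-thread counter. It is const-initialized and has no destructor,
    // so touching it from inside the allocator does not itself allocate.
    static ALLOCATIONS: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper: forwards every request to the system allocator
/// and counts how many times `alloc` is called on the current thread.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCATIONS.try_with(|count| count.set(count.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` and `alloc_zeroed` keep their default implementations,
    // which go through `alloc` above and are therefore counted as well.
}

// This attribute is what replaces the default allocator for the whole
// program. A jemalloc static would be declared in exactly this position.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Returns how many allocations `work` performed on the current thread.
fn count_allocations<R>(work: impl FnOnce() -> R) -> usize {
    let before = ALLOCATIONS.with(|count| count.get());
    // `black_box` keeps the optimizer from removing unused allocations.
    black_box(work());
    ALLOCATIONS.with(|count| count.get()) - before
}
```

Calling `count_allocations(|| handle_one_request())` from `main` or from a test (with `handle_one_request` standing in for whatever code you want to measure) gives a per-call figure comparable to the per-request number derived from Valgrind, but only for work done on the calling thread.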

serde_json for serializing data from HTTP to store it further as a message in the Redis Stream.

JSON is not very efficient. For example, there's an impedance mismatch between String and what JSON considers a string, so one can't just borrow the bytes from the underlying buffer when deserializing.
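To illustrate the mismatch, here is a simplified, standard-library-only sketch of decoding the contents of a JSON string literal. It is not how serde_json is implemented; the function name `decode_json_string` is hypothetical, and `\uXXXX` escapes are deliberately left out. The point is that the input can be borrowed as-is only when it contains no escape sequences; as soon as one appears, the decoded text differs from the raw bytes and a new buffer has to be allocated.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes the text between the quotes of a JSON
/// string literal. Only the simple escapes are handled; `\uXXXX` and
/// malformed input yield `None`.
fn decode_json_string(raw: &str) -> Option<Cow<'_, str>> {
    // Fast path: without a backslash the bytes already are the final
    // string, so the input buffer can be borrowed without allocating.
    let Some(first_escape) = raw.find('\\') else {
        return Some(Cow::Borrowed(raw));
    };

    // Slow path: escapes must be rewritten, which needs an owned buffer.
    let mut decoded = String::with_capacity(raw.len());
    decoded.push_str(&raw[..first_escape]);
    let mut chars = raw[first_escape..].chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            decoded.push(c);
            continue;
        }
        // A trailing backslash makes `next()` return `None`, which `?`
        // propagates as a decoding failure.
        decoded.push(match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'n' => '\n',
            't' => '\t',
            'r' => '\r',
            'b' => '\u{0008}',
            'f' => '\u{000C}',
            _ => return None, // e.g. \uXXXX, left out of this sketch
        });
    }
    Some(Cow::Owned(decoded))
}
```

With this shape, `decode_json_string("hello")` hands back a borrowed slice, while an input such as `line\nbreak` (with a literal backslash) forces an owned `String`. A deserializer that targets `String` pays for the allocation in both cases.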

