What I Learnt from Benchmarking Http4k, Ktor (Kotlin) and Actix v2, v3 (Rust) Microservices

strohel · September 14, 2020, 11:09am

Posting here in case somebody can explain the fine nuances of Actix CPU efficiency and per-request memory usage as the number of concurrent requests rises (or has any other thoughts; I hope this is fine on URLO).

uberjay · September 14, 2020, 7:16pm

For the CPU efficiency question, is it possible CPU frequency scaling is at play? No clever ideas for the per-request memory usage behavior, but it's pretty interesting.

ZiCog · September 14, 2020, 8:00pm

There is so much detail there that I don't know what to make of it.

I notice, though, that your commentary and graphs only go up to 1,024 simultaneous connections.

In my naivety, this seems like small fry. Tackling 10,000 connections was already a known problem in 1999 (see "C10k problem" on Wikipedia), when we had much smaller, slower machines.

That C10k problem is the reason why Rust has invested so much effort into async programming.

So, in my naivety, it looks like you are two or three orders of magnitude short of pushing modern machines to their limits.

Or what am I missing here?

strohel · September 14, 2020, 10:51pm

Good point. CPU frequency scaling (or boosting) is indeed employed even on GCP:

N2D machine types run on AMD EPYC Rome processors with a base frequency of 2.25 GHz, an effective frequency of 2.7 GHz, and a max boost frequency of 3.3 GHz.

If I understand it correctly, frequency boosting should have the following effect: when the load is low (1 to 4 connections, where Actix does not yet saturate a single CPU core), CPU time per request (in milliseconds) should be lower, as the CPU can boost its frequency and execute more cycles per millisecond.

However, the graph shows the opposite effect: CPU time per request is highest with a single connection. A mystery to me.
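The expected direction of the boost effect can be made concrete with a little arithmetic. A minimal sketch, assuming each request costs a fixed number of CPU cycles (an assumption: memory stalls and cache misses do not shrink with clock speed), so CPU time is simply cycles divided by frequency. The cycle count used below is hypothetical; the frequencies are the GCP N2D figures quoted above.

```rust
/// CPU time in milliseconds for a request that costs `cycles` CPU cycles
/// at a clock of `freq_ghz`. Assumes cycles per request are constant,
/// which ignores memory stalls that do not scale with frequency.
fn cpu_ms_per_request(cycles: f64, freq_ghz: f64) -> f64 {
    // 1 GHz = 1e9 cycles per second = 1e6 cycles per millisecond.
    cycles / (freq_ghz * 1e6)
}
```

With a hypothetical 1,000,000 cycles per request, this gives about 0.44 ms at the 2.25 GHz base clock and about 0.30 ms at the 3.3 GHz max boost, i.e. the lightly loaded case should look cheaper per request, which is why the measured curve is surprising.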

strohel · September 14, 2020, 11:11pm

The "C10k problem" is solved at a different level in a microservice architecture: the actual microservice instances are shielded by a load balancer (potentially on dedicated hardware), which distributes incoming requests to an ever-changing number of microservice instances. Load balancers also usually aggregate many client connections into a single or a few connections to the microservice. When the load rises, the number of instances can be increased (this is called horizontal scaling). The metric to optimise in microservices is thus requests per second, or rather requests per second per unit of available resources.
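Why requests per second per unit of resources is the metric that matters can be shown with a capacity-planning sketch. The helper below is hypothetical: it assumes instances scale linearly and ignores headroom and load-balancer overhead. It reuses the ~7,300 req/s per core and 1.5 cores per instance from this thread; the target load is made up.

```rust
/// Hypothetical capacity-planning helper: how many instances a load balancer
/// must spread `target_rps` across, given measured throughput per core and
/// the cores allotted to each instance. Assumes linear horizontal scaling
/// and no headroom.
fn instances_needed(target_rps: f64, rps_per_core: f64, cores_per_instance: f64) -> u64 {
    let rps_per_instance = rps_per_core * cores_per_instance;
    // Round up: a fractional instance still needs a whole instance.
    (target_rps / rps_per_instance).ceil() as u64
}
```

For a hypothetical 100,000 req/s target, `instances_needed(100_000.0, 7_300.0, 1.5)` gives 10 instances. Doubling the per-core throughput roughly halves that count, which is what makes per-resource efficiency the figure worth optimising.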

Also note that the tested service deliberately has access to only 1.5 CPU cores, much less than a typical modern machine. Actix's TechEmpower benchmark results show e.g. 650,000 requests/s (at 512 connections) on 28 hyperthreads, i.e. ~23,000 req/s per hardware thread, within the same order of magnitude as my benchmark's ~7,300 req/s per core.
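The normalisation above is a simple division, sketched here so the comparison can be rerun with other figures. The function names are hypothetical. The ~10,950 req/s total for this benchmark is derived, not measured: it assumes the ~7,300 figure is per core of the 1.5-core allotment. Note that TechEmpower's divisor counts hardware threads, not physical cores.

```rust
/// Throughput normalised per unit of CPU. Whether `cpus` counts physical
/// cores or hardware threads is up to the caller and changes the result.
fn rps_per_cpu(total_rps: f64, cpus: f64) -> f64 {
    total_rps / cpus
}

/// How many times higher the per-CPU throughput of setup `a` is than setup `b`.
fn per_cpu_ratio(a_rps: f64, a_cpus: f64, b_rps: f64, b_cpus: f64) -> f64 {
    rps_per_cpu(a_rps, a_cpus) / rps_per_cpu(b_rps, b_cpus)
}
```

`rps_per_cpu(650_000.0, 28.0)` is about 23,214, and `per_cpu_ratio(650_000.0, 28.0, 10_950.0, 1.5)` is about 3.2, so the two results differ by roughly a factor of three.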

Hyeonu · September 15, 2020, 1:33am

The Pentium Dual-Core brand appeared in 2007, so in 1999 I'm pretty sure most of us lived with single-core machines, and even such a machine could serve about 10K concurrent connections pretty well without a load balancer. Concurrent connections means the number of connections being processed at a single instant. Most HTTP connections don't last more than a second, so that number would be far lower than the req/s number.
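The relationship between concurrent connections and req/s described here is Little's law (not named in the thread): the average number of requests in flight equals the arrival rate times the average time each request spends in the system. A minimal sketch; the rates and latencies used below are illustrative, not measured.

```rust
/// Little's law: average number of requests in flight
/// = arrival rate (req/s) * average time in the system (s).
/// Holds for a system in steady state, averaged over time.
fn concurrent_requests(rps: f64, avg_latency_s: f64) -> f64 {
    rps * avg_latency_s
}
```

At 10,000 req/s with a 50 ms average latency, only about 500 requests are in flight at once; concurrency reaches 10,000 only if each request lasts a full second.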

ZiCog · September 15, 2020, 7:56am

Sorry yes, I have started to talk about something somewhat different but related to your theme.

To a first approximation, I interpret the C10k problem as asking how we can even maintain thousands or millions of connections on a single machine and wait on input from all of them, never mind doing any actual work to fulfill the requests. That's before we start measuring performance in requests per second or whatever.

Previously it was common to spawn a thread, or even a process, per connection and have it wait on blocking I/O. This does not scale to thousands or millions of connections, given the memory consumed just by starting a thread and all the context switching involved.

As I understand it, that is the whole motivation for Node.js and the event-driven model of JavaScript, and now for async/await in Rust.
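For contrast with the async approach, here is a minimal std-only sketch of the thread-per-connection, blocking-I/O model described above. The echo behaviour and the `handle`/`serve` names are hypothetical and chosen only for illustration; this is not how Actix, Ktor or http4k handle connections.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread::{self, JoinHandle};

// Per-connection handler (an echo service, for illustration only).
// `read_to_end` blocks this connection's dedicated OS thread until the
// client closes its write half; the thread can do nothing else meanwhile.
fn handle(mut stream: TcpStream) -> io::Result<()> {
    let mut buf = Vec::new();
    stream.read_to_end(&mut buf)?;
    stream.write_all(&buf)?;
    Ok(())
}

// Thread-per-connection accept loop: every accepted connection gets its own
// OS thread with its own stack, the per-connection cost that C10k is about.
// `max_conns` exists only so this sketch terminates.
fn serve(listener: TcpListener, max_conns: usize) -> Vec<JoinHandle<io::Result<()>>> {
    listener
        .incoming()
        .take(max_conns)
        .map(|conn| {
            let stream = conn.expect("accept failed");
            thread::spawn(move || handle(stream))
        })
        .collect()
}
```

Each idle client here pins a whole OS thread. Event-driven runtimes such as Tokio (used by Actix) instead multiplex many sockets over a small pool of threads, so waiting costs little more than a small task state per connection.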

system · December 14, 2020, 7:56am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
