Super super fast API

Hi there,

One of my friends started an online challenge, where participants submit their answers as a public Git repository.
The challenge is to create two simple endpoints:
A. curl -X POST http://localhost -d 'some number'
B. curl http://localhost/count
In A, we receive a number and add it to the previous numbers; in B, we return that sum.
For example, if we call the API in the following order:
curl -X POST http://localhost -d '5'
curl -X POST http://localhost -d '6'
then the http://localhost/count endpoint should return 11.
I don't know Rust very well, but I used Rust and Tokio to create an API:
https://github.com/MahdiZareie/soallpeach/blob/master/countme/src/main.rs
What would you suggest to improve the response time?
You might like to see the other competitors' code; these are the top 3:
https://github.com/Rmaan/soallpeach/blob/master/countme/countme.go
https://github.com/omid/SoallPeach/blob/master/countme/src/main.rs
https://github.com/thebrodmann/soallpeach/tree/master/countme <-- Haskell, I guess

Don't use regex; try u32::from_str instead, and avoid allocating memory (prefer slices to String/to_vec).
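As a rough sketch of that advice (the function name `parse_number` is illustrative, not from the original code): `str::from_utf8` borrows the incoming bytes instead of copying them, and `parse::<u32>()` goes through `u32::from_str` with no regex involved.

```rust
use std::str;

// Parse a request body (a borrowed byte slice) into a u32
// without allocating a String or a Vec.
fn parse_number(body: &[u8]) -> Option<u32> {
    // from_utf8 only validates; it borrows the bytes, no copy is made.
    let text = str::from_utf8(body).ok()?;
    // trim() handles the trailing newline that `curl -d` may send.
    text.trim().parse::<u32>().ok()
}

fn main() {
    println!("{:?}", parse_number(b"42\n")); // prints Some(42)
    println!("{:?}", parse_number(b"abc")); // prints None
}
```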

2 Likes

Before we do anything else, are you compiling this in release mode (i.e. cargo build --release)? The docs say to run the server with cargo run --example echo, which produces a debug build by default, and the code will be awfully slow because it won't have any optimisations (optimisations make debugging harder). I wouldn't be surprised if you get a 10-50x speed increase just by compiling in release mode.
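If you want to push a bit further than a plain `--release` build, Cargo also lets you tune the release profile in Cargo.toml; a sketch (these settings trade longer compile times for faster code, and the gains are workload-dependent):

```toml
# Cargo.toml
[profile.release]
lto = true          # link-time optimisation across crates
codegen-units = 1   # fewer codegen units: slower build, better optimisation
```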


This isn't necessarily related to performance, but your code is unsound and has a massive race condition. By using a static mutable variable for your counter, every read and write to the variable is done without any synchronisation, meaning multiple handlers running at the same time will "step on each other's toes", so to speak (see this StackOverflow answer for a more precise explanation). You'd only start to notice this when you do serious benchmarking and the tokio runtime starts handling requests on multiple threads.

You should be using an integer from the std::sync::atomic module because these manage thread safety correctly. That way you can create an AtomicUsize (or AtomicI32, or whatever) on the stack and pass references to it into your handler function.
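A minimal sketch of that approach (the function name `concurrent_sum` is illustrative; `std::thread::scope` requires Rust 1.63+): the atomic lives on the stack and each thread only gets a shared reference to it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Sum values from several threads through a stack-allocated atomic.
fn concurrent_sum(values: &[u64]) -> u64 {
    let counter = AtomicU64::new(0);
    let counter_ref = &counter; // a shared reference suffices: no Mutex, no unsafe
    thread::scope(|s| {
        for &n in values {
            s.spawn(move || {
                // One atomic read-modify-write per request; no data race,
                // unlike incrementing a `static mut`.
                counter_ref.fetch_add(n, Ordering::Relaxed);
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

fn main() {
    // Mirrors the challenge example: POST 5, POST 6, then GET /count.
    println!("{}", concurrent_sum(&[5, 6])); // prints 11
}
```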

Especially if you're new to the language, always try to write safe code and avoid the unsafe keyword. Usually when unsafe is involved there will be tricky invariants that need to be upheld (e.g. no data races, or that you are using pointers correctly)... To be fair, if you wrote the same thing in C or Go it'd be equally broken; you just wouldn't know it.

12 Likes

Very useful tips, thank you very much, both of you.
I thought Tokio was a single-threaded event loop; that's why I didn't use any synchronization mechanism. I'll change my implementation as you suggested and try again.

Tokio uses a work-stealing, multi-threaded scheduler by default.

2 Likes

Thanks Alice, I'll read about it

Thanks to your help my code got a lot better. :man_dancing:
I'd like to know where my bottleneck is.
I've been working on this code for a few weeks, read a lot of the HTTP RFC, and did a lot of experiments, but it's still not good enough.
Do you know any good tool that would help me profile my code at runtime?

At first glance I see

let incoming = String::from(str::from_utf8(&data[0..size]).unwrap());
let is_get = incoming.starts_with("G");

which unnecessarily does a Unicode check and makes a copy, when you could just check

let is_get = data.len() > 0 && data[0] == b'G';

I should point out that this is an example of the earlier advice of @naim.

This next one won't speed up your code, but it will make it easier to read. You can replace

while match stream.peer_addr() {
    Ok(_) => true,
    _ => false
} {

with

while stream.peer_addr().is_ok() {

The standard library has a bunch of these convenient accessors, many of which you can guess once you see the naming pattern.

2 Likes

I'm puzzled as to what the counter and loop are there for. It looks like you're reading the first body, and then counting how many other reads succeed and multiplying by that. Surely that isn't what you want to do? The code would be way simpler without the extra looping and breaking that shouldn't ever happen if I'm understanding it right.

1 Like

Thank you very much David @droundy, very good points, I'll apply them :+1:

Well, this is a micro-optimization I did based on how they run the tests: the test runner calls the API many times concurrently, but all the requests are the same, so I thought I could keep the first one and multiply it by the number of requests, and it worked.
Since the test runner uses the same TCP connection to make all the HTTP requests, I added the loop; that was a major improvement over my previous version, where I closed the connection after every HTTP response.

I am fairly new to Rust, so take this with a grain of salt, but looking at the code it feels like there are two things that could be happening (or I might be completely wrong):

  • You starve the CPU and hence there are just too many threads; though if those were scheduled asynchronously on Tokio, this should not be happening. (But I see no async in your functions, so why do you think you are using Tokio?)

  • No new incoming requests can be served while the spawned threads execute

Basically you would normally schedule a number of workers asynchronously, but that would also mean making the listener async; since there is no await, your listener is synchronous.

Basically I think it might be worthwhile to see if async can improve the speed.
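For a feel of what that would look like, here is a hedged sketch of an async accept loop (assuming the tokio crate with its net and io features; the HTTP parsing and endpoint logic are elided, and the buffer size and address are arbitrary):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let counter = Arc::new(AtomicU64::new(0));
    let listener = TcpListener::bind("127.0.0.1:8080").await?;

    loop {
        let (mut socket, _) = listener.accept().await?;
        let counter = Arc::clone(&counter);
        // Each connection becomes a lightweight task on the work-stealing
        // runtime, so the accept loop is never blocked by a slow handler.
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            while let Ok(n) = socket.read(&mut buf).await {
                if n == 0 {
                    break; // peer closed the connection
                }
                // Request parsing elided; just bump the shared counter
                // and answer so the client can reuse the connection.
                counter.fetch_add(1, Ordering::Relaxed);
                let response = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
                if socket.write_all(response).await.is_err() {
                    break;
                }
            }
        });
    }
}
```

The key point is that `.await` yields the task back to the runtime while I/O is pending, so a handful of OS threads can serve many concurrent connections.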

1 Like

Good points. Actually, I removed Tokio in my latest version and I'm not using the async I/O API either.
But you're right, it would probably perform much better if I used async I/O with a work-stealing runtime like Tokio.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.