My benchmark shows Elixir is faster than Rust+Tokio. How do I improve the Rust code?

Scenario 1:
In this scenario I used message passing plus a calculation.
First I spawn some workers (as many as I have CPU cores),
then I send a request to each worker and await each worker's result. Each worker does a simple sum up to a counter N and then responds with Result::Ok in Rust.

Rust+Tokio: 61,942 ~ 79,300 microseconds
Elixir/BEAM: 23,393 microseconds

After setting the Rust compiler's opt-level = 3:

Rust + Tokio improved
from: 144,792 microseconds
to: 60,000 ~ 70,000 microseconds
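For reference, the opt-level setting goes in Cargo.toml; a minimal release-profile sketch:

```toml
[profile.release]
opt-level = 3
```

Note that `opt-level = 3` is already the default for the release profile, so a jump like the one above usually comes from switching from a debug build to `cargo build --release` rather than from the setting itself.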

Rust Server

pub mod server {

    use tokio::sync::mpsc;
    use tokio::sync::oneshot;

    pub struct Request<MReq, MResp> {
        pub msg: MReq,
        pub resp: oneshot::Sender<MResp>,
    }

    pub enum MReq {
        Event(i32),
    }

    pub enum MResp {
        Event(Result<(), ()>),
    }

    pub async fn start() -> mpsc::Sender<Request<MReq, MResp>> {
        let (client, mut server) =
            mpsc::channel::<Request<MReq, MResp>>(16);

        tokio::spawn(async move {
            while let Some(req) = server.recv().await {
                let MReq::Event(n) = req.msg;
                // simple sum over N; note the result is discarded,
                // so the optimizer is free to remove this loop entirely
                let mut temp = 1;
                for num in 1..n {
                    temp += num;
                }
                let _ = temp;
                let _ = req.resp.send(MResp::Event(Ok(())));
            }
        });

        client
    }
}
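One caveat with the worker loop above: the sum is computed and then thrown away, so at opt-level = 3 the compiler may delete it entirely, and the benchmark could end up measuring only channel overhead. A minimal sketch using `std::hint::black_box` (stable since Rust 1.66) to keep the work alive:

```rust
use std::hint::black_box;

// the same "simple sum over N" the worker performs
fn sum_to(n: i64) -> i64 {
    let mut temp = 1;
    for num in 1..n {
        temp += num;
    }
    temp
}

fn main() {
    // black_box hides the input and output from the optimizer,
    // so the loop cannot be folded away or deleted
    let result = black_box(sum_to(black_box(1000)));
    println!("{result}");
}
```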

Rust Client


use std::time::Instant;

mod server;
use server::server::{start, MReq, MResp, Request};
use tokio::sync::oneshot::{self, Receiver};

#[tokio::main]
async fn main() {
    let server1 = start().await;
    let server2 = start().await;
    let server3 = start().await;

    // chrono::PreciseTime no longer exists; std::time::Instant does the same job
    let start = Instant::now();
    // ===================================================

    for n in 0..1000 {
        let (recv_resp1, req1) = request_factory(n);
        let (recv_resp2, req2) = request_factory(n);
        let (recv_resp3, req3) = request_factory(n);

        // each request blocks on its response before the next request is sent
        let _ = server1.send(req1).await;
        let _ = recv_resp1.await;

        let _ = server2.send(req2).await;
        let _ = recv_resp2.await;

        let _ = server3.send(req3).await;
        let _ = recv_resp3.await;
    }

    // ===================================================
    println!("==> {} microseconds", start.elapsed().as_micros());
}

fn request_factory(n: i32) -> (Receiver<MResp>, Request<MReq, MResp>) {
    let (resp, recv) = oneshot::channel::<MResp>();
    let req = Request::<MReq, MResp> {
        msg: MReq::Event(n),
        resp,
    };

    (recv, req)
}

My Elixir code is plain code; I don't use any tricks to increase performance, just GenServer.

Your Rust code is incomplete.

Sorry, it's complete now.

Actually, I think it's due to an optimization in Elixir (BEAM):

BEAM uses a mailbox per process,
and it doesn't use anything like a oneshot channel to wait for the response;
it just sends to and receives from the mailbox, and receiving from the mailbox has recently been optimized further.

What if you do this?

        let _ = server1.send(req1).await;
        let _ = server2.send(req2).await;
        let _ = server3.send(req3).await;
        let _ = recv_resp1.await;
        let _ = recv_resp2.await;
        let _ = recv_resp3.await;

It seems like you're effectively benchmarking the memory allocator. By default Rust uses system libc's malloc, which may not be the fastest allocator implementation. Try with jemalloc or mimalloc instead.
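Swapping the global allocator takes only a couple of lines; a sketch assuming the `mimalloc` crate has been added to Cargo.toml as a dependency:

```rust
// assumes `mimalloc` in [dependencies] of Cargo.toml
use mimalloc::MiMalloc;

// every heap allocation in the program (Box, Vec, String, channel
// internals, ...) now goes through mimalloc instead of the libc malloc
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

The `jemallocator` crate works the same way with `Jemalloc` in place of `MiMalloc`.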


Because that's how it works in Elixir, and I just wanted everything to work the same way so I could compare performance.

In Elixir, after sending a request, GenServer.call blocks the process and waits for the response.

What happens if you use #[tokio::main(flavor = "current_thread")] here?

I tested it just now,
and it's great:

from: 60,000 microseconds
to: 2,000 microseconds

But actually,
I chose Rust for concurrency because I'm working on a
distributed real-time database,

and when I heard about Rust+Tokio's concurrency
I migrated from Erlang/BEAM to Rust.

Threads are fast at executing independent tasks, but communication between threads is slow. The more messages you need to send back and forth between different threads, the less benefit you get from threads. Your test case where all that happens is messages between threads is pretty much the worst possible case for threading.
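The round-trip pattern under discussion can be sketched with plain std threads and channels; the point is that each iteration blocks on a cross-thread reply before the next request is sent:

```rust
use std::sync::mpsc;
use std::thread;

// sends `n` requests to a worker thread and waits for each reply in turn;
// every iteration pays two cross-thread synchronizations
fn ping_pong(n: i64) -> i64 {
    let (req_tx, req_rx) = mpsc::channel::<i64>();
    let (resp_tx, resp_rx) = mpsc::channel::<i64>();

    let worker = thread::spawn(move || {
        while let Ok(x) = req_rx.recv() {
            resp_tx.send(x * 2).unwrap(); // the reply crosses threads again
        }
    });

    let mut total = 0;
    for i in 0..n {
        req_tx.send(i).unwrap();
        total += resp_rx.recv().unwrap(); // block until the worker answers
    }

    drop(req_tx); // close the channel so the worker loop ends
    worker.join().unwrap();
    total
}

fn main() {
    println!("{}", ping_pong(1000)); // 2 * (0 + 1 + ... + 999) = 999000
}
```

Almost all of the wall-clock time here goes into the send/recv handoffs, not into the doubling; that is the situation the benchmark in this thread creates.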

I see you opened an issue on our issue tracker with the same question. It's going to be the same people who answer there and here.


Sorry, I didn't know this discussion would continue here, since it wasn't very related to the topic.

Even when I used Tonic, a single-threaded runtime was recommended there, and there is a benchmark showing poor results with the multi-threaded scheduler.

this

From the above link:

  • Rust implementation provides best latency and memory consumption for a 1 CPU constrained service. It makes it a great candidate for services that are supposed to horizontally scale. On the other hand, scaled vertically it does not perform well

Benchmark Result :

  1. Single Thread ==> ~ 50,000
  2. Multi Thread ==> ~63,000

It is certainly possible to get good multi-threaded performance with Tokio. If the default setup doesn't perform as well as you hope, there are some tricks you can try. One of the simpler tricks is to move the accept loop into a tokio::spawn rather than doing it directly in main, which sometimes results in better performance. Another is to spawn multiple single-threaded runtimes.
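The second trick, spawning multiple single-threaded runtimes, might look roughly like this (a sketch assuming tokio as a dependency; the per-runtime work is a placeholder):

```rust
use std::thread;
use tokio::runtime::Builder;

fn main() {
    // one OS thread per runtime; each runtime schedules its own tasks,
    // so tasks never migrate between threads
    let handles: Vec<_> = (0..4)
        .map(|id| {
            thread::spawn(move || {
                let rt = Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                rt.block_on(async move {
                    // placeholder: drive this runtime's accept loop / workers here
                    println!("runtime {id} running");
                });
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```

This gives you multi-core parallelism between runtimes while keeping each runtime's task communication on a single thread.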

You were right.

I wrote a benchmark with 1,000 client connections,
each calling one RPC in Tonic.

With the multi-threaded scheduler,
latency was better than with the single-threaded one:

single-thread Tonic latency: under ~130 ms
multi-thread Tonic latency: under ~100 ms

again, thanks :pray: :pray: :pray:


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.