My benchmark shows Elixir is faster than Rust+Tokio. How do I improve the Rust code?

Scenario 1:
In this scenario I used message passing plus a calculation.
First I spawn some workers (as many as I have CPU cores),
then I send a request to each worker and await each worker's result. Each worker does a simple sum up to a counter N and then responds with Result::Ok in Rust.

Rust+Tokio: 61,942 ~ 79,300 microseconds
Elixir/BEAM: 23,393 microseconds

After setting the Rust compiler's opt-level = 3:

Rust + Tokio improved
from: 144,792 microseconds
to: 60,000 ~ 70,000 microseconds
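For reference, the opt-level setting goes in Cargo.toml; a minimal release-profile sketch:

```toml
[profile.release]
opt-level = 3
```

Note that `opt-level = 3` is already the default for the release profile, so a jump like the one above usually comes from switching from a debug build to `cargo build --release` rather than from the setting itself.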

Rust Server

pub mod server {

    use tokio::sync::mpsc;
    use tokio::sync::oneshot;

    pub struct Request<MReq, MResp> {
        pub msg: MReq,
        pub resp: oneshot::Sender<MResp>,
    }

    pub enum MReq {
        Event(i32),
    }

    pub enum MResp {
        Event(Result<(), ()>),
    }

    pub async fn start() -> mpsc::Sender<Request<MReq, MResp>> {
        let (client, mut server) =
            mpsc::channel::<Request<MReq, MResp>>(16);

        tokio::spawn(async move {
            while let Some(req) = server.recv().await {
                let MReq::Event(n) = req.msg;
                // simple sum over N; note the result is discarded,
                // so the optimizer is free to remove this loop entirely
                let mut temp = 1;
                for num in 1..n {
                    temp += num;
                }
                let _ = temp;
                let _ = req.resp.send(MResp::Event(Ok(())));
            }
        });

        client
    }
}
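One caveat with the worker loop above: the sum is computed and then thrown away, so at opt-level = 3 the compiler may delete it entirely, and the benchmark could end up measuring only channel overhead. A minimal sketch using `std::hint::black_box` (stable since Rust 1.66) to keep the work alive:

```rust
use std::hint::black_box;

// the same "simple sum over N" the worker performs
fn sum_to(n: i64) -> i64 {
    let mut temp = 1;
    for num in 1..n {
        temp += num;
    }
    temp
}

fn main() {
    // black_box hides the input and output from the optimizer,
    // so the loop cannot be folded away or deleted
    let result = black_box(sum_to(black_box(1000)));
    println!("{result}");
}
```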

Rust Client


use std::time::Instant;

mod server;
use server::server::{start, MReq, MResp, Request};
use tokio::sync::oneshot::{self, Receiver};

#[tokio::main]
async fn main() {
    let server1 = start().await;
    let server2 = start().await;
    let server3 = start().await;

    // chrono::PreciseTime no longer exists; std::time::Instant does the same job
    let start = Instant::now();
    // ===================================================

    for n in 0..1000 {
        let (recv_resp1, req1) = request_factory(n);
        let (recv_resp2, req2) = request_factory(n);
        let (recv_resp3, req3) = request_factory(n);

        // each request blocks on its response before the next request is sent
        let _ = server1.send(req1).await;
        let _ = recv_resp1.await;

        let _ = server2.send(req2).await;
        let _ = recv_resp2.await;

        let _ = server3.send(req3).await;
        let _ = recv_resp3.await;
    }

    // ===================================================
    println!("==> {} microseconds", start.elapsed().as_micros());
}

fn request_factory(n: i32) -> (Receiver<MResp>, Request<MReq, MResp>) {
    let (resp, recv) = oneshot::channel::<MResp>();
    let req = Request::<MReq, MResp> {
        msg: MReq::Event(n),
        resp,
    };

    (recv, req)
}

My Elixir code is plain code; I don't use any tricks to increase performance, just GenServer.

Your Rust code is incomplete.

Sorry, it's complete now.

Actually, I think it's due to an optimization in Elixir (BEAM):

BEAM uses a mailbox per process,
and it doesn't use anything like a oneshot channel to wait for the response;
it just sends to and receives from the mailbox, and receiving from the mailbox has recently been optimized further.

What if you do this?

        let _ = server1.send(req1).await;
        let _ = server2.send(req2).await;
        let _ = server3.send(req3).await;
        let _ = recv_resp1.await;
        let _ = recv_resp2.await;
        let _ = recv_resp3.await;

It seems like you're effectively benchmarking the memory allocator. By default Rust uses system libc's malloc, which may not be the fastest allocator implementation. Try with jemalloc or mimalloc instead.
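Swapping the global allocator takes only a couple of lines; a sketch assuming the `mimalloc` crate has been added to Cargo.toml as a dependency:

```rust
// assumes `mimalloc` in [dependencies] of Cargo.toml
use mimalloc::MiMalloc;

// every heap allocation in the program (Box, Vec, String, channel
// internals, ...) now goes through mimalloc instead of the libc malloc
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

The `jemallocator` crate works the same way with `Jemalloc` in place of `MiMalloc`.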


Because that's how it works in Elixir, and I just wanted everything to work the same way so I could compare performance.

In Elixir, after sending a request, GenServer.call blocks the process and waits for the response.

What happens if you use #[tokio::main(flavor = "current_thread")] here?

I tested it just now,
and it's great:

from: 60,000 microseconds
to: 2,000 microseconds

But actually,
I chose Rust for concurrency because I'm working on a
distributed real-time database,

and when I heard about Rust+Tokio's concurrency
I migrated from Erlang/BEAM to Rust.

Threads are fast at executing independent tasks, but communication between threads is slow. The more messages you need to send back and forth between different threads, the less benefit you get from threads. Your test case where all that happens is messages between threads is pretty much the worst possible case for threading.
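The round-trip pattern under discussion can be sketched with plain std threads and channels; the point is that each iteration blocks on a cross-thread reply before the next request is sent:

```rust
use std::sync::mpsc;
use std::thread;

// sends `n` requests to a worker thread and waits for each reply in turn;
// every iteration pays two cross-thread synchronizations
fn ping_pong(n: i64) -> i64 {
    let (req_tx, req_rx) = mpsc::channel::<i64>();
    let (resp_tx, resp_rx) = mpsc::channel::<i64>();

    let worker = thread::spawn(move || {
        while let Ok(x) = req_rx.recv() {
            resp_tx.send(x * 2).unwrap(); // the reply crosses threads again
        }
    });

    let mut total = 0;
    for i in 0..n {
        req_tx.send(i).unwrap();
        total += resp_rx.recv().unwrap(); // block until the worker answers
    }

    drop(req_tx); // close the channel so the worker loop ends
    worker.join().unwrap();
    total
}

fn main() {
    println!("{}", ping_pong(1000)); // 2 * (0 + 1 + ... + 999) = 999000
}
```

Almost all of the wall-clock time here goes into the send/recv handoffs, not into the doubling; that is the situation the benchmark in this thread creates.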

I see you opened an issue on our issue tracker with the same question. It's going to be the same people who answer there and here.


Sorry, I didn't know this discussion would continue here, since it wasn't very related to the topic.

Even when I used Tonic, a single-threaded runtime was recommended there, and there is a benchmark showing poor results with the multi-threaded scheduler.

this

From the above link:

  • Rust implementation provides best latency and memory consumption for a 1 CPU constrained service. It makes it a great candidate for services that are supposed to horizontally scale. On the other hand, scaled vertically it does not perform well

Benchmark Result :

  1. Single Thread ==> ~ 50,000
  2. Multi Thread ==> ~63,000

It is certainly possible to get good multi-threaded performance with Tokio. If the default setup doesn't perform as well as you hope, there are some tricks you can try. One of the simpler tricks is to move the accept loop into a tokio::spawn rather than doing it directly in main, which sometimes results in better performance. Another is to spawn multiple single-threaded runtimes.
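The second trick, spawning multiple single-threaded runtimes, might look roughly like this (a sketch assuming tokio as a dependency; the per-runtime work is a placeholder):

```rust
use std::thread;
use tokio::runtime::Builder;

fn main() {
    // one OS thread per runtime; each runtime schedules its own tasks,
    // so tasks never migrate between threads
    let handles: Vec<_> = (0..4)
        .map(|id| {
            thread::spawn(move || {
                let rt = Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                rt.block_on(async move {
                    // placeholder: drive this runtime's accept loop / workers here
                    println!("runtime {id} running");
                });
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```

This gives you multi-core parallelism between runtimes while keeping each runtime's task communication on a single thread.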

You were right.

I wrote a benchmark with 1,000 client connections,
each calling one RPC in Tonic.

With the multi-threaded scheduler,
latency was better than with the single-threaded one:

single-thread Tonic latency: under ~130 ms
multi-thread Tonic latency: under ~100 ms

again, thanks :pray: :pray: :pray:


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.