Help figuring out the performance of using a Tokio oneshot channel

I wanted to run a benchmark using the criterion crate against the following code:

use tokio::sync::{mpsc, oneshot};

pub struct Handler {
    sender: mpsc::Sender<oneshot::Sender<String>>,
}

impl Handler {
    pub fn new() -> Self {
        let (sender, receiver) = mpsc::channel(8);
        tokio::spawn(listener(receiver));
        Self { sender }
    }

    pub async fn process(&self) {
        let (tx, rx) = oneshot::channel::<String>();
        let _ = self.sender.send(tx).await;
        rx.await.expect("something wrong happened");
    }
}

pub async fn listener(mut receiver: mpsc::Receiver<oneshot::Sender<String>>) {
    while let Some(sender) = receiver.recv().await {
        let _ = sender.send("howdy!".to_string());
    }
}

Benchmark code:

use std::sync::Arc;
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

fn bench_oneshot(c: &mut Criterion) {
    let mut group = c.benchmark_group("bench_oneshot");
    group.sample_size(100);
    group.measurement_time(Duration::from_secs(90));

    group.bench_function(
        "one_shot",
        |b| {
            b.to_async(Runtime::new().unwrap()).iter(|| async {
                let handler = Arc::new(Handler::new());
                for _ in 1..100_000 {
                    let handler = handler.clone();
                    handler.process().await;
                }
            });
        }
    );
}

criterion_group!(benches, bench_oneshot);
criterion_main!(benches);

The benchmark comes back with a measurement of ~660 ms.

I now modify the code to not use the oneshot channel:

use tokio::sync::mpsc;

pub struct Handler {
    sender: mpsc::Sender<String>,
}

impl Handler {
    pub fn new() -> Self {
        let (sender, receiver) = mpsc::channel(8);
        tokio::spawn(listener(receiver));
        Self { sender }
    }

    pub async fn process(&self, msg: String) {
        let _ = self.sender.send(msg).await;
    }
}

pub async fn listener(mut receiver: mpsc::Receiver<String>) {
    while let Some(msg) = receiver.recv().await {
        println!("{:?}", msg);
    }
}

The benchmark code is the same as before, except the call in the loop is now:

handler.process("hi!".to_string()).await;

The benchmark then comes back with a measurement of ~90 ms.

What I found is that using the Tokio oneshot channel to send back a response degrades performance roughly sevenfold. I'm not sure whether I'm benchmarking incorrectly or setting up the handler and listener wrong. Could someone tell me whether this is expected performance, whether my code needs to be revised, or whether I should be benchmarking differently?


This is entirely unsurprising. The version with a oneshot channel performs a lot more waiting than the other version. Every time it sends a message, the listener must run and reply before the next message can be sent. Without the oneshot channel, you can send a thousand messages before the listener even sees the first one.

Also, you're running process in block_on but listener in tokio::spawn. This means you're forcing them to run on different OS threads. That's not a normal workload for a Tokio application. If you put both tasks inside a tokio::spawn, they will be able to run on the same OS threads, hopefully reducing the cost of the oneshot channel.


Sure. So if I understood right, the oneshot channel's receiver in the process method waits for the reply from the sender in the listener, which blocks the process method from starting the next iteration of the benchmark loop. Is my understanding correct?
If so, could you show me how I can rewrite this? My goal is something like incoming requests arriving through a Unix socket that get read and sent to the handler, but the listener should not be a bottleneck when multiple requests come in: it should process the requests in parallel and send each response back to the respective request invoker over the socket.

Also, you're running process in block_on but listener in tokio::spawn .

Could you elaborate on how I should modify my benchmarking code?
My understanding from the criterion docs was that if I used Tokio's Runtime in the benchmark, it would run on the Tokio executor.

Criterion is fundamentally the wrong tool for measuring things that serve many incoming connections such as web servers. With web servers, one connection can hurt the performance of other connections when they exist at the same time. So it matters when the connections are created during the benchmark in relation to each other. It also matters whether the benchmark is open or closed. This is fundamentally not the kind of benchmark that criterion makes. Criterion is for micro-benchmarking individual functions or small code blocks within your Rust application.

I would look for tools similar to wrk instead of criterion.

It is on the Tokio runtime, but it is using Tokio's block_on entrypoint instead of Tokio's tokio::spawn entrypoint. To fix this, wrap the process loop in tokio::spawn.

.iter(|| tokio::spawn(async {
    // ... the benchmarked loop; the JoinHandle returned by spawn
    // is itself a future that criterion's async iter can await
}));

Thanks. I will look at the paper and the wrk tool for benchmarking.

Regarding the handler/listener setup for the use case of a socket with multiple requests processed in parallel: is it correct?

I should probably clarify that I wanted to understand whether the handler/listener setup is fine for the use case of processing requests through a socket, with respect to the use case alone and not benchmarking. An example would be a database server listening on a Unix socket, where incoming requests are processed as lookups or writes. Would this handler/listener pattern be fine in that case, or are there more performant workflows that should be used instead?

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.