Tonic gRPC server handles only one request at a time; cannot handle multiple requests concurrently

Hello,
I have the following protobuf definition and server-side code. When SayHello is working on a long-running task, every other call to the server (any SayHello or SayHello2) is put on a queue. The other calls are only resolved once the first call is complete.

I am probably missing a setting or config option that lets tonic handle multiple simultaneous calls. I searched a lot but could not find any resource on handling (resolving) multiple calls at the same time.

I know Flask (a Python framework) runs on a single core and needs Gunicorn or some other server to spin up worker instances to handle multiple calls.

My question is: how can I handle (resolve) multiple simultaneous calls made to a tonic server?

Thank you in advance for your help.

message HelloRequest {
  string name = 1;
}

message HelloResponse {
  string message = 1;
}

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse);
  rpc SayHello2 (HelloRequest) returns (HelloResponse); 
}

Here is the server-side code:

use tonic::{transport::Server, Request, Response, Status};

use greeter::greeter_server::{Greeter, GreeterServer};
use greeter::{HelloResponse, HelloRequest};
use std::{thread, time};

// Import the generated proto-rust file into a module
pub mod greeter {
    tonic::include_proto!("greeter");
}

// Implement the service skeleton for the "Greeter" service
// defined in the proto
#[derive(Debug, Default)]
pub struct MyGreeter {}

// Implement the service function(s) defined in the proto
// for the Greeter service (SayHello...)
#[tonic::async_trait]
impl Greeter for MyGreeter {
    async fn say_hello(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloResponse>, Status> {
        println!("Received request from: {:?}", request);

        let response = greeter::HelloResponse {
            message: format!("Hello {}!", request.into_inner().name).into(),
        };
        println!("Sleeping ");
        thread::sleep(time::Duration::from_millis(100000));
        Ok(Response::new(response))
    }

    async fn say_hello2(
        &self,
        request: Request<HelloRequest>,
    ) -> Result<Response<HelloResponse>, Status> {
        println!("Received request from: {:?}", request);

        let response = greeter::HelloResponse {
            message: format!("Hello {}!", request.into_inner().name).into(),
        };

        Ok(Response::new(response))
    }
}

// Runtime to run our server
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "0.0.0.0:50051".parse()?;
    let greeter = MyGreeter::default();

    println!("Starting gRPC Server...");
    Server::builder()
        .concurrency_limit_per_connection(500)
        .add_service(GreeterServer::new(greeter))
        .serve(addr)
        .await?;

    Ok(())
}

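thread::sleep blocks the entire OS thread, so the Tokio worker running that handler cannot do any work on other requests until the sleep finishes. Use tokio::time::sleep(...).await instead: it suspends only the current task and hands the thread back to the runtime while it waits.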
