How can I improve this very slow performance after adding database?

I’m building a simple full-stack website. The performance was excellent before I added the database: it was hitting 180,000 requests per second when returning an HTML page to the browser (I tested the "/" endpoint, which returns an HTML page).

Here is the minimal reproducible code:

main.rs

use bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use ntex::web::{self, types::State, HttpResponse, Responder, HttpServer};
use rust_embed::RustEmbed;
use serde_json::{Map, Value};
use tokio_postgres::{types::Type, NoTls, Row};

#[derive(RustEmbed)]
#[folder = "ui/"]
pub struct Frontend;

async fn frontend_handler() -> HttpResponse {
    let html = Frontend::get("index.html").unwrap();
    HttpResponse::Ok()
        .content_type("text/html")
        .body(html.data.into_owned())
}

async fn get_api(
    pool: State<Pool<PostgresConnectionManager<NoTls>>>
) -> impl Responder {
    let conn = pool.get().await.unwrap();
    let data = conn.query("SELECT * FROM produk", &[]).await.unwrap();
    let json = {
        let result: Vec<serde_json::Value> = data
            .iter()
            .map(crate::row_to_json_object)
            .collect();
        serde_json::Value::Array(result)
    };

    web::types::Json(json)
}

fn row_to_json_object(row: &Row) -> Value {
    let mut obj = Map::new();

    for col in row.columns() {
        let name = col.name();
        let typ = col.type_();

        let value = match *typ {
            Type::INT2 => row.get::<_, i16>(name).into(),
            Type::INT4 => row.get::<_, i32>(name).into(),
            Type::INT8 => row.get::<_, i64>(name).into(),
            Type::FLOAT4 => row.get::<_, f32>(name).into(),
            Type::FLOAT8 => row.get::<_, f64>(name).into(),
            Type::TEXT | Type::VARCHAR => row.get::<_, String>(name).into(),
            Type::BOOL => row.get::<_, bool>(name).into(),
            Type::TIMESTAMP => {
                let dt: chrono::NaiveDateTime = row.get(name);
                dt.to_string().into()
            }
            Type::TIMESTAMPTZ => {
                // TIMESTAMPTZ decodes as DateTime<Utc>, not NaiveDateTime
                let dt: chrono::DateTime<chrono::Utc> = row.get(name);
                dt.to_string().into()
            }
            Type::DATE => {
                let d: chrono::NaiveDate = row.get(name);
                d.to_string().into()
            }
            _ => Value::Null,
        };

        obj.insert(name.to_string(), value);
    }

    Value::Object(obj)
}

#[ntex::main]
async fn main() -> std::io::Result<()> {
    let conn_str = "host=localhost port=5432 user=postgres password=fuji dbname=tes";
    let manager =
        PostgresConnectionManager::new_from_stringlike(conn_str, NoTls).unwrap();
    let pool = Pool::builder()
        .max_size(200) // Increase pool size
        .min_idle(Some(50))
        // .max_lifetime(Some(std::time::Duration::from_secs(3600)))
        // .idle_timeout(Some(std::time::Duration::from_secs(600)))
        .build(manager)
        .await
        .unwrap();

    HttpServer::new(move || {
        web::App::new()
            .state(pool.clone())
            .route("/", web::get().to(frontend_handler))
            .route("/api", web::get().to(get_api))
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}

Cargo.toml


[package]
name = "my_app"
version = "0.1.0"
edition = "2024"

[dependencies]
ntex = { version = "*", features = ["tokio"] }
serde = { version = "*", features = ["derive"] }
serde_json = "*"
bb8 = "0.9.0"
bb8-postgres = "0.9.0"
tokio-postgres = { version = "0.7", features = ["with-chrono-0_4"] }
chrono = { version = "0.4", features = ["serde"] }
rust-embed = "8.6.0"

[profile.release]
opt-level = 3
codegen-units = 1
panic = "abort"
lto = "thin"
debug = false
incremental = false
overflow-checks = false

After adding the "/api" endpoint with the database functionality above, and then load-testing that endpoint, performance dropped to only 15,000 requests per second, down from the previous 180,000.

So far I’ve observed:

  • The JSON encoder function isn’t the main culprit, because when I commented out that code and returned a hard-coded JSON string like "hello", performance stayed at around 15,000 req/s.
  • The database code is what’s causing the drop: when I comment out the query itself, performance jumps back up to about 50,000 req/s.
  • It turns out that even just acquiring a connection from the pool has a very big impact: if I comment out the pool.get() call, performance immediately spikes to 230,000 req/s, even exceeding the 180,000 req/s I saw when serving the HTML page.

So in summary:

  • Returning HTML alone = 180,000 req/s
  • Returning a plain hard-coded JSON string = 230,000 req/s
  • Adding code to acquire a connection from the pool (but not running any database query) = drops to 50,000 req/s
  • Then adding the database query = drops further to 15,000 req/s
  • Encoding the JSON string = still 15,000 req/s

I’ve already tried tuning the pool settings (raising the max size from 50 to 200 and setting min_idle to 50), but it had no effect.

My questions:

  1. Why does simply acquiring a connection from the pool in the code above (not doing any database query) reduce performance from 230,000 req/s to 50,000 req/s?
  2. Why does querying a table that contains only one very simple row, [{"nama":"produk 1"}], further reduce performance from 50,000 req/s to 15,000 req/s?

I would greatly appreciate any insight.

If you're measuring req/s serially (one client making one request after another), you're measuring latency, rather than throughput.

When you need to make a connection to another process or server, that adds latency. When you need to send a query and wait for a response, that adds latency.

Postgres is big, and designed for much bigger and more complex tasks. This means that even a simple query goes through a lot of complex processing: query optimization, transactional storage, multiple layers of caches, and so on. Those layers add some cost, and a task this minimal is too simple to benefit from them.

Doing things inside the server, without calling out to an external database, will have better latency. If you need some endpoints to have super low latency, add caching or preloading of data from the db (but be careful with lock contention).
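
For the preloading option, here is a minimal sketch of what I mean. It reuses the pool and the row_to_json_object helper from your main.rs; preload and get_api_preloaded are names I made up for illustration:

use std::sync::Arc;

use bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use ntex::web::{types::State, HttpResponse};
use tokio_postgres::NoTls;

// Runs once at startup: one query, one JSON encoding for the whole table.
async fn preload(pool: &Pool<PostgresConnectionManager<NoTls>>) -> Arc<String> {
    let conn = pool.get().await.unwrap();
    let rows = conn.query("SELECT * FROM produk", &[]).await.unwrap();
    let values: Vec<serde_json::Value> = rows.iter().map(row_to_json_object).collect();
    Arc::new(serde_json::Value::Array(values).to_string())
}

// Per request: no pool.get(), no query, no JSON encoding, just a clone.
async fn get_api_preloaded(data: State<Arc<String>>) -> HttpResponse {
    HttpResponse::Ok()
        .content_type("application/json")
        .body(data.as_str().to_owned())
}

// In main, after building the pool:
//     let cached = preload(&pool).await;
//     HttpServer::new(move || web::App::new().state(cached.clone()) /* routes... */)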

Make sure you also measure throughput by sending multiple requests to the server in parallel. Tools for this usually have some concurrency settings. Latency is important, but seeing the latency vs throughput trade-off will show you a better picture of the costs and overheads.


What does the caching or preloading look like, and how does it work? Do you have crates in mind that show the idea? I know caching is a big topic. Do you mean using Redis?

Caching can be as simple as storing values in memory, in a hash map for example. Redis would be another option, but it adds latency again due to the overhead of IPC.
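
For example, an in-memory hash-map cache can be as small as this sketch (the key and value types are made up; it maps a category id to its pre-encoded JSON product list):

use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

// Process-wide cache: category id -> pre-encoded JSON product list.
static PRODUCT_LISTS: LazyLock<RwLock<HashMap<i64, String>>> =
    LazyLock::new(|| RwLock::new(HashMap::new()));

// Fast path: any number of readers can hold the read lock at once.
fn cached_list(category: i64) -> Option<String> {
    PRODUCT_LISTS.read().unwrap().get(&category).cloned()
}

// Call after loading from the database, or to overwrite on change.
fn store_list(category: i64, json: String) {
    PRODUCT_LISTS.write().unwrap().insert(category, json);
}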

Imagine there is a category that contains a list of products and the list changes from time to time.

Arguably the list can be cached and then invalidated whenever it changes, through:

  1. Assigning a product to the category
  2. Removing a product from the category
  3. Bulk removing/assigning
  4. Scheduled removing/assigning

But this is really error-prone as it is easy to miss one or two actions.

The biggest problem will be the products themselves. After a list of product IDs is fetched, it is necessary to fetch all the important data: name, price, images, and promotional information (with scheduled start and end dates).

I am not sure how that dynamic nature (things change according to schedules or ad hoc) can be cached.
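
One common way around enumerating every mutation path (just an illustration, with made-up names and a made-up 5-second budget): give the cached list a short time-to-live and rebuild it when it expires. Then the worst case is a few seconds of staleness, no matter which action or schedule changed the data:

use std::sync::Mutex;
use std::time::{Duration, Instant};

// How stale the list may get; pick what the business can tolerate.
const TTL: Duration = Duration::from_secs(5);

struct Entry {
    loaded_at: Instant,
    json: String,
}

static LIST_CACHE: Mutex<Option<Entry>> = Mutex::new(None);

// Hypothetical stand-in for the real query + JSON encoding.
async fn load_list_from_db() -> String {
    String::from("[]")
}

async fn product_list() -> String {
    // Fast path: serve from memory while the entry is fresh.
    if let Some(e) = LIST_CACHE.lock().unwrap().as_ref() {
        if e.loaded_at.elapsed() < TTL {
            return e.json.clone();
        }
    }
    // Stale or missing: reload and publish. Concurrent requests may
    // reload redundantly during that window; here that is harmless.
    let json = load_list_from_db().await;
    *LIST_CACHE.lock().unwrap() = Some(Entry {
        loaded_at: Instant::now(),
        json: json.clone(),
    });
    json
}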

I used wrk when testing it; is my wrk usage correct?
I tested it again by comparing it with what I built using May MiniHTTP and May Postgres. Both retrieve data from the same database, which only contains this:

tes=# select * from produk;
 id |  nama   | harga
----+---------+-------
 21 | produk2 |  1000
(1 row)

The results are still significantly different. The top one is from the code above (ntex on the tokio runtime + tokio-postgres), and the bottom one is from may-minihttp + may-postgres:

[root@LAPTOP-7QHQMBKS sandal]# wrk -c 250 -d 15s -t 8 http://127.0.0.1:8080/api
Running 15s test @ http://127.0.0.1:8080/api
  8 threads and 250 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    27.68ms   23.23ms 254.85ms   63.26%
    Req/Sec     1.24k   106.86     1.60k    68.00%
  148454 requests in 15.03s, 21.09MB read
Requests/sec:   9875.22
Transfer/sec:      1.40MB

[root@LAPTOP-7QHQMBKS sandal]# wrk -c 250 -d 15s -t 8 http://127.0.0.1:8080/api
Running 15s test @ http://127.0.0.1:8080/api
  8 threads and 250 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.53ms    1.62ms  48.69ms   85.49%
    Req/Sec     5.64k   518.62     7.83k    71.42%
  674291 requests in 15.05s, 134.40MB read
Requests/sec:  44812.01
Transfer/sec:      8.93MB

I mean something in the memory of the Rust server. You'd use it like Redis, but without the overhead of calling out to Redis.

For preloading or caching individual values, the most basic option is:

static CACHED_VALUE: RwLock<Option<SomeValue>> = RwLock::new(None);

You may need Arc<SomeValue> to share it. If you do, also check arc-swap.
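
A sketch of the arc-swap variant (Catalog is a made-up type; readers do a cheap atomic load instead of taking a lock):

use std::sync::Arc;

use arc_swap::ArcSwapOption;

struct Catalog {
    json: String,
}

// Lock-free reads: writers never block readers and vice versa.
static CATALOG: ArcSwapOption<Catalog> = ArcSwapOption::const_empty();

fn read_catalog() -> Option<Arc<Catalog>> {
    // Clones the Arc out; the value stays alive even if a writer
    // swaps in a replacement a moment later.
    CATALOG.load_full()
}

fn publish_catalog(json: String) {
    CATALOG.store(Some(Arc::new(Catalog { json })));
}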

For more general caching of many things, see moka.
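
And a moka sketch (the capacity and TTL are made up; if I remember right, moka 0.12 puts the synchronous cache behind the "sync" feature flag):

use std::time::Duration;

use moka::sync::Cache; // moka = { version = "0.12", features = ["sync"] }

fn build_cache() -> Cache<i64, String> {
    Cache::builder()
        .max_capacity(10_000)                  // evicts entries past this size
        .time_to_live(Duration::from_secs(30)) // entries expire on their own
        .build()
}

fn demo() {
    let cache = build_cache();
    cache.insert(21, String::from(r#"{"nama":"produk2","harga":1000}"#));
    // get clones the stored value out of the cache
    assert_eq!(
        cache.get(&21).as_deref(),
        Some(r#"{"nama":"produk2","harga":1000}"#)
    );
}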

Before you dive too deep — 15K req/s is still pretty good in absolute terms. You may not need better performance. If you just need to get stuff done, look into caching later when it really becomes a problem.