Help with Database Pooling Architecture (tokio_postgres & deadpool_postgres)

Advisory

I'm new to Rust (though not to lower-level languages), so I may mess up some terminology, and this will be a fairly long post, so be prepared for that. :smile:
Hopefully this will help other people who hit similar problems. I have searched a lot and read plenty of documentation on the various crates, Rust examples, and The Rust Programming Language book, but haven't found what I'm looking for yet.

Problem Statement

For fun, I am porting/rewriting to Rust a Python framework I wrote that interacts with database servers and database accounts.
I want to connect to hundreds of database servers that can each contain anywhere from 1 to 50 distinct databases. These are all postgres databases.
I have implemented most pieces of the code, but I'm a bit stuck on what the best practices are around pooling and parallelization in Rust, and on which community crates are preferred for postgres.

Crates Explored

Currently I have looked at these:

  • tokio
  • tokio_postgres
  • r2d2
  • deadpool_postgres
  • postgres

I may have missed a crate better suited to what I am trying to do, so if you know of one, please let me know. For pooling and connecting to a postgres database, tokio_postgres and deadpool_postgres seemed to be actively developed and well documented, so I looked into them the most.

Architecture I'd Like

Using tokio_postgres and deadpool_postgres it looks something like this:

     ____________________________________________________
    |                                                    |
    |  deadpool_postgres                                 |
    |  ________________________     ___________________  | 
    | |                        |   |                   | | 
    | | Manager                |   | Pool              | |
    | |------------------------|   |-------------------| |
    | | tokio_postgres::Config |-->| max_size, Manager | |
    | |------------------------|   |-------------------| |
    | | ManagerConfig          |   |                   | |
    | |________________________|   |___________________| |
    |_______________________________/_____/__|__\_____\__|
                                   /     /   |   \     \
                                  /     /    |    \     \
                                _/_   _/_   _|_   _\_   _\_
                               | d | | d | | d | | d | | d |
                                ---   ---   ---   ---   ---

In words:
You create a tokio_postgres::Config and a deadpool_postgres::ManagerConfig and pass both to Manager::from_config(). From that Manager you build a deadpool_postgres::Pool via Pool::builder(), and you can then get connections from the pool as they become available. From this point, you could iterate through databases/servers and execute queries.

Adapted Example Code

Adapting the tokio_postgres::Config example from the deadpool_postgres documentation, I wrote this:

use deadpool_postgres::{Manager, ManagerConfig, Pool, RecyclingMethod};
use tokio_postgres::{Error, NoTls};

#[tokio::main] // By default, tokio_postgres uses the tokio crate as its runtime.
async fn connect_to_database() -> Result<String, Error> {
    // Set config  values for tokio_postgres
    let mut pg_config = tokio_postgres::Config::new();
    pg_config.application_name("rust_tester");
    pg_config.password("a_password");
    pg_config.user("a_user");
    pg_config.dbname("a_dbname");
    pg_config.host("host1.subdomain.domain.org");
    
    // Would be nice to use a postgres connection string, but doesn't work
    // pg_config.host("host1.subdomain.domain.org");
    // pg_config.?????("postgresql://a_user:a_password@10.1.1.10:5432/a_dbname");

    // Create manager config for deadpool_postgres
    let mgr_config = ManagerConfig {
        recycling_method: RecyclingMethod::Fast,
    };

    // Instantiate manager and pool for deadpool_postgres
    let mgr = Manager::from_config(pg_config, NoTls, mgr_config);
    let pool = Pool::builder(mgr).max_size(16).build().unwrap();
    
    // Iterate some connections
    for _ in 1..10 {
        let client = pool.get().await.unwrap();
        let stmt = client.prepare_cached("SELECT a_row FROM a_table").await.unwrap();

        for row in client.query(&stmt, &[]).await.unwrap() {
            let value: String = row.get("a_row");
            println!("{value}");
        }
    }

    Ok(String::from("finished connecting to database"))
}

fn main() {
    let reply = connect_to_database();
    println!("{reply:?}");
}

This works (with the real values filled in), but obviously everything is hardcoded and it just runs the same query 9 times. This is great for connecting to a database server and then running a bunch of queries against a single database.

My Current Misunderstandings

Manager::from_config() seems to be limited by the tokio_postgres::Config in the sense that you can only pass it a single database. You can technically pass in many hosts by calling the .host() function repeatedly, but if each host has different databases that's fairly irrelevant.
Pool::builder() takes the Manager as a parameter (I didn't find any docs that show otherwise), so this pooler seems to solve the problem:

one database server, one database, many connections.

Is that right or did I miss something?

My thinking could be wrong, but what I think I should be able to do is open some number of connections to a database server, then grab those connections, connect to the specific databases in parallel, and run a query against each. Is that doable with deadpool_postgres or some other crate?

I have also read a little bit of Rust Async Book so maybe I am just not far enough along yet to understand my full problem set and how to solve it. I am happy to change course and use a different crate if there is a better way I should be approaching this problem.

If You Read This Far

Thank you for your time. This was a lot, and I tried to condense it and be concise. All help is greatly appreciated.

Is there no one on these forums who knows about handling connection pools programmatically in Rust, or is this topic too boring, or some other thing? :smile:

That is correct. Database connection pools (in many languages, including Rust) conventionally assume that all connections in the pool are interchangeable with each other; if each connection may be connected to a different database, then they no longer are. You'll also find that frameworks militate against doing things like durably changing the schema search path or any session parameters, though they usually can't prevent it outright; these are also things that make connections non-interchangeable.

This isn't really supported by postgres, right down to the wire level. The way to switch databases is to restart the connection, incurring most of the cost of a new connection in the process; this is what psql does with its \c <DATABASE> command, for example. It is supported by DBMSes that treat databases more like namespaces, including MySQL (where a USE statement changes the default database for a connection, and any database the user has access to can be referenced in a query at any time), but postgres, by design, treats databases as administratively separate.

The architecture I'd use here is probably not to pool connections in the first place, or to keep one pool per database (with a small limit on open connections).


Yes, you're right about postgres. I was envisioning a generic connection to the postgres server on the instance, rather than what you actually have to do, which is make a connection to the postgres server and a specific database on that instance. :thinking:

Therefore, I get nothing from using a connection pooler, since in the use case written up for this post I only make a single connection to each database.

If I want to parallelize the work, I need to write an asynchronous function that connects to the databases in parallel and does the work. Is that how you see it too?


Yes, I would treat connecting to the database as part of each task, rather than as a separate pre-task setup step.

