Best ways to interface with databases for Rust web backend (actix)?

I'm currently looking into Actix for my next project, mainly because of its performance. To build a serious web backend, one must be able to interface with a database. By far the best database interface system I've used is C#'s EFCore.

I like EFCore for these reasons:

  • Full async/await support, so all IO with the database is non-blocking — this is very important when running any web backend at scale
  • ALMOST write-once, run-anywhere across different DB engines (I mainly care about Postgres and MySQL/MariaDB)
  • It's possible to write the code for it very quickly — the OOP model for this tool is wonderful

Is there anything similar for Rust that I can use with Actix? I have found some crates on crates.io that have some of this, but nothing that has all of it and seems to be a serious project that won't stop being maintained tomorrow.

Here's a simple example of Actix being used with an SQLite library and r2d2 for managing the connections.
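Something along these lines (a minimal sketch, assuming actix-web 4 with the rusqlite and r2d2_sqlite crates; the `users` table and route are invented for illustration):

```rust
// Cargo.toml (assumed): actix-web = "4", r2d2 = "0.8",
// plus r2d2_sqlite and rusqlite at compatible versions.
use actix_web::{get, web, App, HttpResponse, HttpServer, Responder};
use r2d2_sqlite::SqliteConnectionManager;

type DbPool = r2d2::Pool<SqliteConnectionManager>;

#[get("/users/{id}")]
async fn get_user(pool: web::Data<DbPool>, id: web::Path<i64>) -> impl Responder {
    let id = id.into_inner();
    let pool = pool.clone();
    // r2d2/rusqlite are blocking, so run the query on Actix's dedicated
    // blocking thread pool instead of stalling the async executor.
    let result = web::block(move || -> Result<String, rusqlite::Error> {
        let conn = pool.get().expect("couldn't get a connection from the pool");
        conn.query_row("SELECT name FROM users WHERE id = ?1", [id], |row| row.get(0))
    })
    .await;

    match result {
        Ok(Ok(name)) => HttpResponse::Ok().body(name),
        Ok(Err(rusqlite::Error::QueryReturnedNoRows)) => HttpResponse::NotFound().finish(),
        _ => HttpResponse::InternalServerError().finish(),
    }
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let pool = r2d2::Pool::new(SqliteConnectionManager::file("app.db"))
        .expect("failed to create pool");

    HttpServer::new(move || {
        App::new()
            .app_data(web::Data::new(pool.clone()))
            .service(get_user)
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
```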

Personally, I'm waiting for sqlx to mature a bit more, since it is much more ergonomic with async Rust.
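To give an idea of what that ergonomics looks like, here is a rough sketch (assuming sqlx with the "postgres" and "runtime-tokio" features plus tokio; the `users` table and connection string are invented):

```rust
use sqlx::postgres::PgPoolOptions;

#[derive(sqlx::FromRow)]
struct User {
    id: i64,
    name: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let pool = PgPoolOptions::new()
        .connect("postgres://user:pass@localhost/app")
        .await?;

    // Plain async/await all the way down; no blocking thread pool needed.
    let user: User = sqlx::query_as("SELECT id, name FROM users WHERE id = $1")
        .bind(1_i64)
        .fetch_one(&pool)
        .await?;

    println!("user {}: {}", user.id, user.name);
    Ok(())
}
```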


Note that r2d2 is a blocking library.


Diesel is used a lot and does have some similarities with EFCore, IMHO. If you like EFCore, I guess you will like Diesel as well.
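For a feel of the comparison, a rough sketch of Diesel's query builder (Diesel 2.x with the "postgres" feature assumed; the `users` table is invented):

```rust
use diesel::prelude::*;

diesel::table! {
    users (id) {
        id -> BigInt,
        name -> Text,
    }
}

#[derive(Queryable, Selectable)]
#[diesel(table_name = users)]
struct User {
    id: i64,
    name: String,
}

// Strongly typed, composable queries, checked at compile time
// against the schema declared above.
fn find_user(conn: &mut PgConnection, user_id: i64) -> QueryResult<User> {
    users::table
        .filter(users::id.eq(user_id))
        .select(User::as_select())
        .first(conn)
}
```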

When looking into Diesel, it seems very nice, except that it's synchronous, which is a fatal flaw.

Since quite a few people complain that Diesel does not provide an async interface yet, I should probably write a few words on why such an interface is not provided. For me personally, there are two reasons:

  • First of all, due to language issues it is not possible to write a usable abstract transaction interface. (For anyone interested, here is the corresponding playground; we are basically running into this issue. See the sketch after this list.)
  • At least my benchmarks show a performance penalty for existing async database driver implementations (see here for the results). Therefore I'm not even sure it is meaningful to have an "async" database connection in Rust at all.
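For illustration, here is my own reconstruction of the shape of that interface (not the actual playground code): the callback has to borrow the connection inside the future it returns, and expressing that generically without boxing is where the language trouble starts. The boxed workaround looks roughly like this:

```rust
use futures::future::BoxFuture;

trait AsyncTransaction {
    type Error;

    // Desired API: run `f` inside BEGIN/COMMIT, rolling back on error.
    // The unboxed version would need "a future generic over the lifetime
    // of `&mut Self`", which stable Rust cannot express without boxing.
    fn transaction<'a, R, F>(&'a mut self, f: F) -> BoxFuture<'a, Result<R, Self::Error>>
    where
        R: Send + 'a,
        F: for<'b> FnOnce(&'b mut Self) -> BoxFuture<'b, Result<R, Self::Error>> + Send + 'a;
}
```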

Interesting. I'm using postgres with Rocket. Works well, all sync of course.

I noticed the other day that the postgres crate docs state:

This crate is a lightweight wrapper over tokio-postgres.

So now I'm puzzled. If the sync postgres is a wrapper over the async tokio-postgres, then how can you be finding a performance penalty for postgres?
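If I understand the docs right, the relationship looks roughly like this in practice (a sketch assuming the postgres 0.19-style API; the connection string and table are invented):

```rust
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    // Synchronous API on the surface; internally each call drives the
    // corresponding tokio-postgres future to completion on a runtime.
    let mut client = Client::connect("host=localhost user=postgres", NoTls)?;

    for row in client.query("SELECT id, name FROM users", &[])? {
        let id: i64 = row.get(0);
        let name: &str = row.get(1);
        println!("{id}: {name}");
    }
    Ok(())
}
```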

As a neophyte, my impression is that the arrival of async has created a total mess. It seems to infect and greatly complicate pretty much everything in Rust.

Mind you, my Python-using friends have been fighting with this sync/async thing in their language as well.

Seems to be difficult to avoid. As described nicely here:

I should probably have worded that a bit differently. postgres is doing fine in the benchmarks, even though it is async internally. My main point is that none of the purely async implementations seems to have a significant performance win over the non-async variants. Quite a lot of people claim that they need async at all costs because it is some magic fairy dust that makes your application faster, which is obviously not the case. Those benchmarks show exactly that. If you are using a database, the network IO is only a minor fraction of the work done there. Other factors will quite likely dominate, so optimizing only the IO part does not really make a difference.
That said, I should probably add that I know there are valid use cases for async IO, even for async database IO, but my point is: most people likely don't need it.

I mean, async usually doesn't give any performance wins until you have a lot of connections, and typically you want a rather low database connection limit.
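A toy illustration of that point (hypothetical numbers; tokio's Semaphore standing in for the connection pool, tokio with the "full" feature assumed): no matter how many async tasks you spawn, only pool-size "queries" ever run at once, so the pool cap, not blocking IO, is usually the limiting factor.

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // A "pool" of 5 connections shared by 100 concurrent requests.
    let pool = Arc::new(Semaphore::new(5));

    let handles: Vec<_> = (0..100)
        .map(|i| {
            let pool = Arc::clone(&pool);
            tokio::spawn(async move {
                // Tasks beyond the pool size wait here, exactly as they
                // would wait on a saturated database connection pool.
                let _permit = pool.acquire().await.unwrap();
                tokio::time::sleep(Duration::from_millis(10)).await; // fake query
                i
            })
        })
        .collect();

    for h in handles {
        h.await.unwrap();
    }
}
```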


I haven't (yet) looked at your linked benchmarks in detail, but I do want to explain why async is so important to me; "magic fairy dust" has nothing to do with it. In a web backend, there is some finite number of hardware threads that process requests. Ideally, when the server is at capacity, each of those threads is always doing work. Any blocking library prevents this, since time is wasted waiting on the DB server to respond (which is exacerbated when the DB server is on a separate physical machine, as is common in larger-scale deployments), when that time could be spent processing other requests.

At this point, I would like to submit some other benchmarks that show that async truly matters at scale. Looking only at Actix (to control for variables), the Actix implementations that use async DB connectivity (via tokio_postgres, judging from the source of these benchmarks) do significantly better (scores around 100%) than Actix with Diesel (a score of just 23.9%), suggesting that async truly does matter (assuming Diesel doesn't have something else that would cause such a drastic slowdown).

I'm not really sure that's actually the case. Sure, if you have a web backend that only responds with a static web page, this will likely matter, but as soon as you use other resources it becomes less important. If your application is CPU-bound, going async will not change anything (in the best case), as a CPU can only do a certain amount of work in a given time span. For IO-bound applications, things are more complex. In that case async can help, but it really depends. As @alice mentioned above: database connections are a scarce resource. You normally have far fewer database connections than threads in your web backend. So if your service receives some load and all database connections are already in use, a fully async implementation would in theory allow you to do something else while your database connections are waiting for a response. In practice, there are no database connections left that could do any potential work.

Practically speaking, I don't think async matters that much for most web applications. They just don't have the scale at which it would matter. As an example, crates.io uses Diesel internally. They don't even use one of the advanced async web frameworks. The last thing I heard about their hardware requirements was that they could run the whole thing on a free Heroku instance and it handled all the traffic just fine. So if you aren't building something that expects much larger traffic than crates.io, you probably don't need an async database crate.

As both implementations execute, at least in some cases, completely different queries, I find the claim that this difference is mostly due to Diesel not being async quite speculative. Additionally, the whole benchmark mostly tests IO-related scenarios. Each of those queries returns exactly one element as far as I can tell, which means that the deserialization overhead from the database library is likely underrepresented compared to common workloads. At least the benchmarks I've linked above show that some of the async database crates get "slower" and "slower" as soon as you receive a substantial amount of data from the database.
That written: I would be happy if someone submitted a new benchmark case to the Diesel benchmark suite that clearly shows where an async database crate provides substantially better performance.


This is really surprising to me, although it probably shouldn't be, since I don't know how much traffic crates.io handles or what, exactly, comes through from the database (e.g. is source code stored in a relational database?). Do you have information on this? If so, it would heavily inform my choice of tools for my next project.

crates.io is open-source: https://github.com/rust-lang/crates.io

As already mentioned, the source code for crates.io is open source, so just take a look there. I don't personally contribute to the crates.io code base, so the following is just what I believe to be the case.

First of all, they use their own home-grown web framework, because there was no real web framework back when they started developing the service. If you are starting a new application, you probably want to use something different.

As for the database calls: I think the source code for each crate is stored in an S3 bucket; crates.io just redirects download requests there. Even for that, at least one database call is still required, as they need to look up the correct address and check whether the crate even exists. Besides that, I assume most of their endpoints interact with the database, so every time someone visits a subpage on crates.io, a bunch of information is loaded from the database. I think I've heard somewhere that getting the number of downloads for the last x days was one of the expensive queries that took quite a while. Uploading a new crate does, of course, also access the database, to insert the required dataset for the newly uploaded crate.

There is also the ORM rbatis.