I started a Rusty Blog - 1st article "Concurrency in RustDb"

I decided to start a blog on my adventures programming in Rust. This is the first article:

All feedback welcome. I should say, I have had lots of helpful suggestions from people here while working on this project.


It's a big "Coming soon" screen for me?


Ah, thanks! It seems I hadn't completed all the steps required. Is it ok now?

Yupp!


Wow, this is super interesting! I've been watching your work from afar over the last 6-12 months and I think it's really cool that you are sharing your learnings.

Possible solution – use a copy of the database

The long query is performed on a copy of the database. This is essentially the solution adopted by RustDb. It sounds expensive and impractical: if the database holds many gigabytes of data, taking a copy of the entire database could take a long time, and in the meantime no updates can happen. This leads us to the solution adopted by RustDb.

This reminds me of @jonhoo's Noria database, and in particular the evmap data structure he developed.

Jon put out an excellent video that shows how it works in great detail, but the general idea is to have two copies of the database - one for reading and one for writing. You can have as many readers accessing database A as you want and one writer writing to database B. Then, at a particular point in time, the databases are swapped: all modifications from database B are applied to database A so it can start serving as the write database, while new readers start reading from database B.

There is a bunch of unsafe pointer magic involved to make sure you never clone() a value, but that's all nicely tucked away.
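To make the two-copy idea concrete, here is a deliberately simplified, single-threaded sketch (this is NOT evmap's actual API - the real thing uses unsafe pointer tricks and reader tracking to make the swap safe across threads): two maps, an atomic index naming the current read map, and a log of pending writes that is replayed onto the old read map after a swap.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Two maps, an index saying which one readers use, and a log of writes
// that have not yet been applied to both maps.
struct DoubleBuf {
    maps: [HashMap<String, i32>; 2],
    read_idx: AtomicUsize,   // which of the two maps readers use
    log: Vec<(String, i32)>, // writes not yet applied to both maps
}

impl DoubleBuf {
    fn new() -> Self {
        DoubleBuf {
            maps: [HashMap::new(), HashMap::new()],
            read_idx: AtomicUsize::new(0),
            log: Vec::new(),
        }
    }

    fn read(&self, key: &str) -> Option<i32> {
        self.maps[self.read_idx.load(Ordering::Acquire)].get(key).copied()
    }

    fn write(&mut self, key: &str, val: i32) {
        let w = 1 - self.read_idx.load(Ordering::Acquire); // the write map
        self.maps[w].insert(key.to_string(), val);
        self.log.push((key.to_string(), val)); // remember for the other map
    }

    // Swap the read/write roles, then replay the log onto the new write map.
    // In the real scheme this replay must wait for old readers to finish.
    fn publish(&mut self) {
        let old = self.read_idx.load(Ordering::Acquire);
        self.read_idx.store(1 - old, Ordering::Release);
        for (k, v) in self.log.drain(..) {
            self.maps[old].insert(k, v);
        }
    }
}

fn main() {
    let mut db = DoubleBuf::new();
    db.write("x", 1);
    assert_eq!(db.read("x"), None); // not visible until the swap
    db.publish();
    assert_eq!(db.read("x"), Some(1)); // visible after the swap
}
```

The point of the log is that a write has to land in both copies eventually; the writer applies it to the current write map immediately and to the other map at the next swap.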


I remember reading about the two copy idea. Wouldn't it be necessary to wait until all the old readers have finished before the modifications can be applied?

At any rate, the scheme I describe in the blog works very nicely. It did take me a long time to figure out the correct way to free pages when a reader finishes. I had some kind of mental block on that, and at one point it was incorrect (which I only discovered by writing a test!). I kept thinking a history page began rather than ended at the time noted.

One thing that is missing is a retain function on the result of range_mut. It's not a big deal; I use an empty Vec to mark pages for which there is no longer any version history. But in theory a retain function could be more efficient.
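The workaround might look something like the following sketch (the types and the `free_history` function are illustrative, not RustDb's actual code): since `BTreeMap::range_mut` has no `retain`, entries whose version history is exhausted are left in place with an empty Vec, and then swept up with a whole-map `retain` afterwards.

```rust
use std::collections::BTreeMap;

// Hypothetical: map each page number to the times of its saved versions.
// Versions older than `end_time` are no longer needed by any reader.
fn free_history(history: &mut BTreeMap<u64, Vec<u64>>, end_time: u64) {
    for (_page, versions) in history.range_mut(..) {
        // Can't remove the entry from inside range_mut, so just empty it.
        versions.retain(|&t| t >= end_time);
    }
    // The empty Vecs mark pages with no remaining history; a whole-map
    // retain removes them. A retain on the range itself would avoid
    // visiting (and rebalancing around) unaffected entries.
    history.retain(|_, v| !v.is_empty());
}

fn main() {
    let mut h: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    h.insert(1, vec![5, 15]);
    h.insert(2, vec![3]);
    free_history(&mut h, 10);
    assert_eq!(h.get(&1), Some(&vec![15])); // old version 5 dropped
    assert!(!h.contains_key(&2));           // emptied entry swept away
}
```

A `retain` directly on the `range_mut` result would let the map drop dead entries in a single pass, which is the efficiency gain mentioned above.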


I'd need to watch the stream again, but I think the actual requirement is just that you need to wait for all readers of the old read database to finish before you can start applying the latest modifications.

So in theory, you could have something like this:

  • The writer is given write access to database A
  • Reader 1 starts reading database B
  • The writer finishes their writes and signals a switch (i.e. database A is now for reading and database B is for writing)
  • The writer wants to write something to database B, but must wait for reader 1 to finish
  • Reader 2 comes along and immediately starts reading from database A (the one with the recent modifications that is now readonly)
  • Reader 1 finishes reading from database B
  • The writer is now unblocked and applies all modifications from database A (old write database) to database B (new write database, now with no readers)
  • Syncing is finished and the writer can now start writing to database B

That sequence of events would mean that new readers are never blocked (I think switching is just an AtomicPtr compare-and-swap) and it's the writer's responsibility to make sure both databases are synchronised after a switch before they can start writing again.
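The non-blocking switch itself can be sketched with `AtomicPtr` (this shows only the pointer swap; `Db` is a stand-in type, and real left-right code also tracks reader epochs before reusing the old copy):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

struct Db { name: &'static str }

fn main() {
    // Heap-allocate both databases so we have stable raw pointers.
    let a = Box::into_raw(Box::new(Db { name: "A" }));
    let b = Box::into_raw(Box::new(Db { name: "B" }));
    let read_db = AtomicPtr::new(a); // readers always load this pointer

    // A reader loads the current read database:
    assert_eq!(unsafe { &*read_db.load(Ordering::Acquire) }.name, "A");

    // The writer publishes the switch with a compare-and-swap:
    let ok = read_db.compare_exchange(a, b, Ordering::AcqRel, Ordering::Acquire);
    assert!(ok.is_ok());

    // New readers immediately see database B without ever blocking:
    assert_eq!(unsafe { &*read_db.load(Ordering::Acquire) }.name, "B");

    // Reclaim the allocations (in the real scheme, the old read database
    // may only be touched again once all of its readers have finished).
    unsafe {
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
    }
}
```

The waiting lives entirely on the writer's side: readers only ever do the single atomic load, which is why new readers are never blocked by a switch.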

Right. That seems undesirable: what if Reader 1 takes minutes or hours to finish? That's why I stated in my blog "A database can be large, and queries may need to access a substantial portion of the data."

Ah yeah, that's one of the tradeoffs you make. You optimise for read throughput at the risk of starving your writes.

You also have eventual consistency, so writes won't be visible until some time after a switch.

It's not going to be applicable for all use cases, but I still thought the idea was really smart.

Out of curiosity, is this motivated by a real world use case? Or is it more one of those things where you have a particular set of values/constraints and want to see whether it's possible to implement something which upholds them?


I have been making websites for small businesses with Microsoft ASP or ASP.NET and MSSQL for over 20 years. The objective is to make a website/database solution using Rust that "just works" (reliably, fast) and doesn't suffer from the various problems I have encountered along the way. The software I would like to have had all this time. Although, to some extent, maybe I am doing it just because I find it very interesting and fun.
