Help me choose a KV database

I'm looking for a KV-type database crate for use in a web framework/CMS application, mainly to store structs serialized to BSON but some raw strings as well. It needs to be async-compatible.

I have looked at redb and fjall as well as RocksDB but I can't decide which is the best. Is it worth the trouble of trying to support more than one?

You should use PostgreSQL (the `postgres` crate).

2 Likes

Do you want your users to be able to bring their own KV store (including, say, something cloud-hosted)? In that case, supporting multiple protocols would enhance adoptability. If you go down the embedded KV-store route, I don't see how supporting both redb and fjall would benefit your users much: the only visible change would be the format of the file on disk, and both formats are probably (I haven't checked) far too niche to provide any interop with tooling or other services your users might have. With RocksDB it might be different, given how much more popular it is.
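If you did decide to support more than one store, the usual shape is a small trait the rest of the app codes against, with one adapter per backend. A minimal std-only sketch under assumptions (the trait and type names here are made up, the real thing would be async and have proper error types, and redb/fjall/RocksDB adapters would wrap their own transaction APIs):

```rust
use std::collections::HashMap;

// Hypothetical backend trait the application codes against.
// A real version would be async and return a proper error type.
trait KvBackend {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

// In-memory adapter standing in for a redb/fjall/RocksDB adapter.
struct MemBackend {
    map: HashMap<String, Vec<u8>>,
}

impl MemBackend {
    fn new() -> Self {
        MemBackend { map: HashMap::new() }
    }
}

impl KvBackend for MemBackend {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

fn main() {
    // The app only sees `dyn KvBackend`, never the concrete store.
    let mut store: Box<dyn KvBackend> = Box::new(MemBackend::new());
    store.put("session:1", b"data".to_vec());
    assert_eq!(store.get("session:1"), Some(b"data".to_vec()));
    println!("ok");
}
```

The cost is that the trait has to be the intersection of what all backends can do, which is another reason supporting several embedded stores rarely pays off.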

1 Like

I'm not sure what advantage bring-your-own would have, given that a generic hosting environment is already required. I don't have anything in mind that would require interop, but it might be useful for debugging.

I have already considered SQL databases and they do not suit my use case.

I would go:

  1. If read-heavy, or LMDB's write performance is good enough ==> LMDB (e.g. the `heed` crate)
  2. If you need higher write throughput ==> RocksDB (the `rocksdb` crate)

I would try to avoid RocksDB if you don't require the high write throughput. It's more complex to run and tune properly in production.

Also worth mentioning SQLite here. Note that Turso (tursodatabase/turso on GitHub) is building a native Rust in-process SQL database, compatible with SQLite.

2 Likes

Does LMDB/heed not support tables or am I missing something?

Well, BSON is MongoDB's internal storage format, designed to let it query efficiently directly on document data. It's rarely the best choice for plain serialization. If you want a JSON-like format with binary data support, msgpack is a popular choice.

Being able to work directly on the data would be nice but not absolutely necessary. Is msgpack more space-efficient than BSON? I did consider using mongodb, but the license issues and the company's "we're all about AI now :DD" attitude are dealbreakers.

I would say the main question does not contain enough information to answer it properly.

It all depends on how much data you are going to store, how it is going to be used, etc. In general I would say that keeping it simple is the best approach. Don't overthink it too early.

If you're going to store unstructured data, MongoDB is great. In my experience it has worked better than I expected. You won't have to think about BSON serialization, since objects are native to MongoDB; the DB will handle that for you. You just use collections and store objects, strings, etc. there. Depending on your application, a drawback could be that MongoDB must run in a separate process (though that could be an advantage as well, so... it really depends on the use case).

KV storage libraries like LMDB and RocksDB basically give you a persistent-HashMap-like experience. That's it, nothing more: you store keys and values, and you must decide on the key and value serialization formats yourself. If you need table-like structures, ordered data, etc., you'll have to come up with your own solutions, and in the end you'll be reinventing the wheel. Calling them KV databases is an overstatement; it would be more accurate to say you can use them to build databases.
That aside, RocksDB works great, though its compile times are long. The advantage of KV storage libraries is that they are application-local rather than a separate process, which works well for simple data. The biggest drawback is tooling: you won't have any tools to delete, update, or restructure objects with specific values, etc. All of that has to be done manually, which is a lot of work. MongoDB, PostgreSQL, and similar DBs provide tools for schema migration, data updates, analysis, and a lot more out of the box.
SQLite can be used as a KV store as well; it is stable, well tested, and has tooling. Just create a table where one column is the key and another is the serialized JSON value. For a few thousand rows it will work just fine. A nice bonus is that it stores data in (almost) a single file, unlike RocksDB, which uses a directory with multiple files.
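The SQLite-as-KV-store idea really is just one table; a minimal schema sketch (table and parameter names are illustrative, and the upsert syntax requires SQLite 3.24+):

```sql
-- One row per key; value holds the serialized JSON/msgpack blob.
CREATE TABLE IF NOT EXISTS kv (
    key   TEXT PRIMARY KEY,
    value BLOB NOT NULL
);

-- Upsert a value for a key.
INSERT INTO kv (key, value) VALUES (?1, ?2)
    ON CONFLICT(key) DO UPDATE SET value = excluded.value;

-- Look a key up.
SELECT value FROM kv WHERE key = ?1;
```

Everything else (serialization, namespacing keys per "collection") stays in application code, but you keep the SQLite tooling for free.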

If the data really is that simple (a few objects serialized as JSON), the file system is good enough as storage. Just store serialized objects in files: the file name is the key, the contents are the value. The simplest KV store ever. Sometimes it is the best solution; no need to over-engineer simple tasks. For example, git uses the file system as a key-value-like store, and it works great.
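The filesystem-as-KV-store idea is trivial to sketch with nothing but std. This is a toy under assumptions (a real version must sanitize keys before using them as file names, and should write atomically via write-to-temp-then-rename):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Map a key to a file path under the store's root directory.
// NOTE: a real implementation must sanitize keys (no "../", no path
// separators, etc.) before using them as file names.
fn key_path(root: &Path, key: &str) -> PathBuf {
    root.join(key)
}

// "put": the file name is the key, the file contents are the value.
fn fs_put(root: &Path, key: &str, value: &[u8]) -> io::Result<()> {
    fs::create_dir_all(root)?;
    fs::write(key_path(root, key), value)
}

// "get": read the file back; None means the key was never stored.
fn fs_get(root: &Path, key: &str) -> io::Result<Option<Vec<u8>>> {
    match fs::read(key_path(root, key)) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("fs_kv_demo");
    fs_put(&root, "page.json", br#"{"title":"Home"}"#)?;
    assert_eq!(
        fs_get(&root, "page.json")?,
        Some(br#"{"title":"Home"}"#.to_vec())
    );
    assert_eq!(fs_get(&root, "missing")?, None);
    println!("ok");
    Ok(())
}
```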

1 Like

how much data you are going to store and how it is going to be used

Mostly reading of assets (pages, etc), otherwise things like retrieving and updating user sessions

MongoDB

See my above reply for reasons against using it

you must think about key and value serialization format by yourself

This is fine; I plan on writing a couple of traits and using serde to do most of the work of converting to BSON or msgpack, and the same for the migration stuff. I will probably end up creating a small library for this.
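That trait layer can stay quite small. A rough std-only sketch of the shape (all names here are hypothetical; in practice `to_bytes`/`from_bytes` would delegate to serde plus `bson` or `rmp-serde` rather than being hand-rolled as below):

```rust
// Hypothetical record trait: anything storable in the KV layer knows
// how to (de)serialize itself and how to build its own key.
trait Record: Sized {
    // Key prefix acting as a poor man's "table"/namespace.
    const NAMESPACE: &'static str;
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
    fn key(&self) -> String;
    fn full_key(&self) -> String {
        format!("{}:{}", Self::NAMESPACE, self.key())
    }
}

// Toy example type; a real CMS would derive Serialize/Deserialize
// and let serde produce the bytes.
struct Session {
    id: String,
    user: String,
}

impl Record for Session {
    const NAMESPACE: &'static str = "session";
    fn to_bytes(&self) -> Vec<u8> {
        format!("{}\n{}", self.id, self.user).into_bytes()
    }
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let s = std::str::from_utf8(bytes).ok()?;
        let (id, user) = s.split_once('\n')?;
        Some(Session { id: id.to_string(), user: user.to_string() })
    }
    fn key(&self) -> String {
        self.id.clone()
    }
}

fn main() {
    let s = Session { id: "abc".into(), user: "alice".into() };
    assert_eq!(s.full_key(), "session:abc");
    let round = Session::from_bytes(&s.to_bytes()).unwrap();
    assert_eq!(round.user, "alice");
    println!("ok");
}
```

Namespaced keys like this are the usual way to fake "tables" on top of a flat KV store.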

Yes, I saw your reasons against MongoDB, but to me they seemed a bit exaggerated. It is an open source project; if it goes too far against the community's wishes, it can be forked.

As for the low-level stuff... while it is doable, shouldn't you rather spend your time and effort on higher-level problems relating to the CMS itself? There are plenty of challenges there already. If it's for educational purposes, it is nice to work at the basic level and get to know how things work under the hood; in the end it makes you appreciate existing solutions more.

To me it seems that any well-known general-purpose DB would be enough for your requirements. I personally like Postgres a lot, but really, MongoDB, MariaDB, or any other will easily handle hundreds of thousands of items on decent hardware. It only starts to get interesting in the millions. In my experience the "boring stuff" usually works out better in the end. And if there really are bottlenecks, optimize them later; that is a far better problem to have than spending half a year designing a new data store and then giving up on the project.

But in the end it is up to you and your goals - just experimenting, getting things done, exploring technology.

1 Like