Recommendations for cache-type database

John_Nagle · March 5, 2023, 5:10am

What's a recommendation for a database for a cache?

Will store items from 100 bytes to 4MB, most around 100KB-200KB.
Database size about 20-50GB.
Don't need ACID properties, but would like a guarantee that, after a crash, whatever is stored is either intact, or the database is known to be corrupted and must be rebuilt. (It's a cache, after all.)
It's just key/value. No joins or searches.
Will occasionally need to find the oldest items and delete them.

Would prefer it be entirely in safe Rust.

jofas · March 5, 2023, 10:18am

Interesting setup. Can you tell us a bit more about your use case? A 50GB cache makes me wonder: do you intend to store the cache on disk (e.g. because your database runs on a different machine and network latency is high enough that retrieving the cached value from disk is measurably faster)? Or do you have a machine with that much memory to spare for an in-memory cache? Also, why do you prefer a DB written in Rust over well-established key-value databases not written in Rust, like Redis, Memcached, etcd, Riak, etc.?

John_Nagle · March 5, 2023, 6:09pm

It's the content cache for a metaverse client. All the content is fetched from the servers, but has to be cached locally or it takes minutes for a user to log in and get a clear view of the world. Nothing is ever changed; keys are UUIDs and the associated contents never change. An item just gets discarded when stale.
It's definitely an on-disk cache, usually an SSD today.

The current implementation is a directory with tens of thousands of files in it, which means too much time is spent in the OS opening and closing files.

Redis is overkill. Memcached is client/server. Riak is distributed and in Erlang. I don't need anything that complicated. This is purely a local database. It's a relatively minor part of the system.

What I need is something like the storage engine inside a caching server such as Nginx or Varnish, minus the client/server stuff.

jbe · March 5, 2023, 7:14pm

I was also looking for solutions for key value stores in Rust a while ago. For my own uses, I created mmtkvdb (which uses LMDB as a backend), but it's using memory mapped files and can exhibit UB if the storage is corrupted (which is why it also requires using unsafe when opening a database). Moreover, it hasn't been thoroughly reviewed.

John_Nagle · March 5, 2023, 7:43pm

Something that crashes on bad data is not helpful here.

I already have two crates on which I rely that are causing crashes. I'm not going to name names in this topic, but currently I spend more time on bugs in lower level crates than I do on my own code.

jbe · March 5, 2023, 7:44pm

Yeah, hence my disclaimer. I think this also rules out a lot of other non-Rust DBs.

I'd be interested in a safe-Rust key value store as well.

alice · March 5, 2023, 8:15pm

Have you tried using several levels of directories so that no single directory has a lot of files? This might perform better.
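A minimal sketch of such a layout, assuming the keys are UUIDs in their usual hex text form. The function name `sharded_path`, the two-level depth, and the two-hex-digits-per-level split are illustrative choices, not something taken from the existing cache:

```rust
use std::path::{Path, PathBuf};

/// Map a UUID-like hex key to `root/<first 2 hex digits>/<next 2>/<full hex>`.
/// Returns `None` if the key is too short or contains non-hex characters.
/// (Hypothetical helper; the depth and fan-out are assumptions.)
pub fn sharded_path(root: &Path, key: &str) -> Option<PathBuf> {
    // Drop hyphens so the hyphenated UUID form shards on hex digits only.
    let hex: String = key.chars().filter(|c| *c != '-').collect();
    if hex.len() < 4 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Normalize case so the same UUID always lands in the same directory.
    let hex = hex.to_ascii_lowercase();
    // Byte slicing is safe here because the string is pure ASCII.
    Some(root.join(&hex[0..2]).join(&hex[2..4]).join(&hex))
}
```

With two hex digits per level, each directory has at most 256 subdirectories. For tens of thousands of files, a single level may already be enough.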

John_Nagle · March 5, 2023, 8:19pm

There's always that approach. There's still a lot of file opening and closing involved, but the lookup times improve.

I was hoping there was something simple I could use. This is just key-value of blobs; the database never looks at the contents of the data.

jbe · March 5, 2023, 9:15pm

Thinking about this again, I don't think most databases exhibit UB because of corrupted storage (but I'm not certain). In many cases they would just abort, I guess (which is still better than UB). But maybe there are also databases that provide even better error handling for a corrupted state and/or improper API usage.

However, I feel like when you have C APIs, it's rarely documented what happens when certain prerequisites are not met. So I do understand the wish for a pure (safe) Rust solution.

H2CO3 · March 6, 2023, 8:33am

Would SQLite be an option? I know it's not specifically just a key-value store, but it's trivial to use as such. It's in-process, has a de facto Rust crate, it's highly concurrency-safe and hard to corrupt, and it's usually faster than the filesystem for small BLOBs, exactly because it avoids re-opening files for every lookup.

Hyeonu · March 6, 2023, 8:41am

You can run Redis on a machine with enough swap space. It would be slow as heck if you needed an in-memory cache, but that doesn't seem to be the case.

jbe · March 6, 2023, 9:31am

Do you know how SQLite behaves when the on-disk storage is corrupted? Will its API return errors or will it abort the process? I see there is an SQLITE_CORRUPT error code, but I wonder whether it's guaranteed that this code or other codes will be returned for all sorts of corruption, and that no state of the database exists that will lead to an abort of the process, an endless loop, a deadlock, etc. (I believe ideally that should be the case, but this seems difficult to judge, I guess?)

H2CO3 · March 6, 2023, 9:40am

It is documented to be guaranteed, or at least the authors intend to guarantee it. SQLite is one of the best-tested pieces of free software in the world right now, and it's tested (including fuzzing) to ensure that corrupt database files and user errors do not cause random crashes but are reported deterministically as errors.

Naturally, there are kinds of corruption that it can't protect against. For example, if the DB file is directly overwritten in just the right place so that a value is changed but it is otherwise valid/looks "correct", then this is impossible to notice in the absence of some other explicit redundancy mechanism (e.g. value/row hashes).
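As a rough sketch of such a redundancy mechanism, a checksum can be stored in front of each value and verified on every read. The helper names `seal` and `open` are hypothetical, and FNV-1a is used here only because it fits in a few lines of std-only code. It is not cryptographic, so it catches accidental corruption, not deliberate tampering:

```rust
/// 64-bit FNV-1a hash of a byte slice.
pub fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Prefix `value` with its 8-byte little-endian checksum before storing it.
pub fn seal(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 8);
    out.extend_from_slice(&fnv1a64(value).to_le_bytes());
    out.extend_from_slice(value);
    out
}

/// Verify a stored record; returns the value only if the checksum matches.
pub fn open(stored: &[u8]) -> Option<&[u8]> {
    if stored.len() < 8 {
        return None; // too short to even hold the checksum
    }
    let (head, value) = stored.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    if fnv1a64(value) == u64::from_le_bytes(buf) {
        Some(value)
    } else {
        None
    }
}
```

For a cache, a failed check can simply be treated as a miss: delete the entry and fetch it again from the server.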

jofas · March 6, 2023, 9:48am

Also an interesting link concerning how you can corrupt an sqlite file: How To Corrupt An SQLite Database File

jbe · March 6, 2023, 9:58am

So then this might be very much suitable for the purposes of the OP.

Which Rust wrapper would you recommend for that? I see the Rust Nursery lists rusqlite. Not sure how up to date that site is, though (the last commit to the master branch of the Rust Nursery was in 2021).

H2CO3 · March 6, 2023, 10:01am

Rusqlite is in fact the de-facto standard crate I was referring to. To my knowledge, it is the most popular and best-maintained SQLite wrapper in Rust. (I don't fully agree with all of its design choices, though.)

jofas · March 6, 2023, 10:53am

I like the whole sqlx framework, which also comes with a built-in sqlite3 driver. It is very comprehensible, async and does not require much overhead IMO.

H2CO3 · March 6, 2023, 1:07pm

Here's a minimal example of using SQLite as a key-value store. It has the following features:

It provides upsertion, retrieval (including optional retrieval if the key does not exist), and deletion for arbitrary serializable key and value types.
Keys always have an Eq bound to ensure they are well-behaved.
The API supports the same Borrow-based pattern for keys and values that std's map types apply. Thus, a Collection<String, Vec<u16>> can be created and accessed using &str and &[u16] as well, for example.
Entries can expire; an explicit expiry date can be set via the chrono crate, and a None expiry date means that the given entry never expires.
It re-uses serialization/deserialization buffers and creates prepared statements for maximal performance.
Currently, keys and values are serialized to JSON. While serde_json is hand-optimized and very fast, encoding can certainly be improved further by means of a binary, compact serialization format, such as bincode, MessagePack, BSON, or CBOR. (These are not used in the example because none of these crates seems to be available in the Playground.)
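As a rough illustration of two of these points only (Borrow-based key lookup and the rule that a None expiry never expires), here is an in-memory sketch. It is a stand-in, not the SQLite-backed code: the `HashMap` backing, the method names, the explicit `now` parameter, and the use of `std::time::SystemTime` in place of chrono are all assumptions made to keep it std-only.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;
use std::time::SystemTime;

/// Hypothetical in-memory stand-in for the collection described above.
pub struct Collection<K, V> {
    entries: HashMap<K, (V, Option<SystemTime>)>,
}

impl<K: Eq + Hash, V> Collection<K, V> {
    pub fn new() -> Self {
        Collection { entries: HashMap::new() }
    }

    /// Insert or overwrite an entry. `expiry == None` means "never expires".
    pub fn upsert(&mut self, key: K, value: V, expiry: Option<SystemTime>) {
        self.entries.insert(key, (value, expiry));
    }

    /// Look up by any borrowed form of the key (e.g. `&str` for `String`).
    /// Entries whose expiry is at or before `now` are treated as absent.
    pub fn get<Q>(&self, key: &Q, now: SystemTime) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        match self.entries.get(key) {
            Some((value, expiry)) if expiry.map_or(true, |t| now < t) => Some(value),
            _ => None,
        }
    }

    /// Delete an entry; returns whether the key was present.
    pub fn remove<Q>(&mut self, key: &Q) -> bool
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.entries.remove(key).is_some()
    }
}
```

A `Collection<String, Vec<u16>>` built this way can be queried with `get("some key", SystemTime::now())` without allocating a `String` for the lookup.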

John_Nagle · March 6, 2023, 5:15pm

It looks like there's some consensus for SQLite. It does have a good reputation for stability and a huge number of users.

This Stack Overflow article indicates it's possible to put blobs into SQLite without JSON encoding them.
Does that work in Rust? Applying JSON encoding to gigabytes of images would slow things down.

H2CO3 · March 6, 2023, 6:22pm

My implementation already stores keys and values as BLOBs, which should be clear from the included SQL. The serialization/deserialization layer is only there to allow arbitrary serializable types in the interface. You don't have to perform the serialization if all you ever have is raw bytes.
