What's a recommendation for a database for a cache?
Will store items from 100 bytes to 4MB, most around 100KB-200KB.
Database size about 20-50GB.
Don't need ACID properties, but would like a guarantee that, after a crash, whatever is stored is either
intact, or the database is known to be corrupted and must be rebuilt. (It's a cache, after all.)
It's just key/value. No joins or searches.
Will occasionally need to find the oldest items and delete them.
Interesting setup. Can you tell us a bit more about your use case? A 50GB cache makes me wonder: do you intend to store the cache on disk (e.g. because your database runs on a different machine and network latency is high enough that retrieving the cached value from disk is measurably faster)? Or do you have a machine with that much memory to spare for an in-memory cache? Also, why do you prefer a DB written in Rust over well-established key-value databases not written in Rust, like Redis, Memcached, etcd, Riak, etc.?
It's the content cache for a metaverse client. All the content is fetched from the servers, but has to be cached locally or it takes minutes for a user to log in and get a clear view of the world. Nothing is ever changed; keys are UUIDs and the associated content never changes. It just gets discarded when stale.
It's definitely an on-disk cache, usually an SSD today.
The current implementation is a directory with tens of thousands of files in it, which means too much time is spent in the OS opening and closing files.
Redis is overkill. Memcached is client/server. Riak is distributed and in Erlang. I don't need anything that complicated. This is purely a local database. It's a relatively minor part of the system.
What I need is something like the storage engine inside a caching server such as Nginx or Varnish, minus the client/server stuff.
I was also looking for solutions for key value stores in Rust a while ago. For my own uses, I created mmtkvdb (which uses LMDB as a backend), but it's using memory mapped files and can exhibit UB if the storage is corrupted (which is why it also requires using unsafe when opening a database). Moreover, it hasn't been thoroughly reviewed.
Thinking about this again, I don't think that most databases exhibit UB because of corrupted storage (but I'm not certain). In many cases, they would just abort, I guess (which is still better than UB). But maybe there are also databases around that provide even better error handling for a corrupted state and/or improper API usage.
However, I feel like when you have C APIs, it's rarely documented what happens when certain prerequisites are not met. So I do understand the wish for a pure (safe) Rust solution.
Would SQLite be an option? I know it's not specifically just a key-value store, but it's trivial to use as such. It's in-process, has a de facto standard Rust crate, it's highly concurrency-safe and hard to corrupt, and it's usually faster than the filesystem for small BLOBs, precisely because it avoids re-opening files for every lookup.
Do you know how SQLite behaves when the on-disk storage is corrupted? Will its API return errors, or will it abort the process? I see there is an SQLITE_CORRUPT error code, but I wonder whether it's guaranteed that this or another error code is returned for all sorts of corruption, and that no state of the database can lead to a process abort, an endless loop, a deadlock, etc. (Ideally that should be the case, but I guess it's difficult to judge.)
It is documented to be guaranteed, or at least the authors intend to guarantee it. SQLite is one of the best-tested pieces of free software in the world right now, and it's tested (including fuzzing) to ensure that corrupt database files and user errors do not cause random crashes but are reported deterministically as errors.
Naturally, there are kinds of corruption that it can't protect against. For example, if the DB file is directly overwritten in just the right place so that a value is changed but it is otherwise valid/looks "correct", then this is impossible to notice in the absence of some other explicit redundancy mechanism (e.g. value/row hashes).
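To make the "explicit redundancy" idea concrete, here is a minimal sketch of a per-value checksum using only std. Everything in it is an assumption for illustration: the record layout (8-byte checksum followed by the payload), the function names `seal` and `open`, and the choice of FNV-1a, which I picked only because it fits in a few lines. A real cache would more likely use CRC32 or a cryptographic hash.

```rust
/// FNV-1a 64-bit hash. Used here only because it is tiny and needs no crates.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical record layout: 8-byte little-endian checksum, then the payload.
/// The returned bytes are what you would store as the BLOB.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut record = Vec::with_capacity(8 + payload.len());
    record.extend_from_slice(&fnv1a64(payload).to_le_bytes());
    record.extend_from_slice(payload);
    record
}

/// Returns the payload if the stored checksum matches it, or `None` if the
/// record is too short or its contents were altered after sealing.
fn open(record: &[u8]) -> Option<&[u8]> {
    if record.len() < 8 {
        return None;
    }
    let (head, payload) = record.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    if u64::from_le_bytes(buf) == fnv1a64(payload) {
        Some(payload)
    } else {
        None
    }
}
```

You would call `seal` before writing a value and `open` after reading it; a `None` result means the entry should be treated as a cache miss and re-fetched. Since the keys here map to content that never changes, a mismatch can only mean corruption.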
Rusqlite is in fact the de-facto standard crate I was referring to. To my knowledge, it is the most popular and best-maintained SQLite wrapper in Rust. (I don't fully agree with all of its design choices, though.)
It provides upsertion, retrieval (including optional retrieval if the key does not exist), and deletion for arbitrary serializable key and value types.
Keys always have an Eq bound to ensure they are well-behaved.
The API supports the same Borrow-based pattern for keys and values that std's map types apply. Thus, a Collection<String, Vec<u16>> can be created and accessed using &str and &[u16] as well, for example.
Entries can expire; an explicit expiry date can be set via the chrono crate, and a None expiry date means that the given entry never expires.
Re-uses serialization/deserialization buffers and creates prepared statements for maximal performance.
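For readers who don't want to open the full example, the shape of the interface listed above can be sketched with std alone. This is not the SQLite-backed implementation: it is an in-memory stand-in over a HashMap, it uses std's SystemTime where the real code uses chrono, and the method names (`upsert`, `get`, `delete`) and the explicit `now` parameter are my assumptions for illustration. It only shows the Borrow-based key lookup and the "None means never expires" rule.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;
use std::time::SystemTime;

/// In-memory stand-in for the collection described above (hypothetical).
pub struct Collection<K, V> {
    // Each value carries an optional expiry; `None` means "never expires".
    entries: HashMap<K, (V, Option<SystemTime>)>,
}

impl<K: Eq + Hash, V> Collection<K, V> {
    pub fn new() -> Self {
        Collection { entries: HashMap::new() }
    }

    /// Insert or overwrite an entry; returns the previous value, if any.
    pub fn upsert(&mut self, key: K, value: V, expires: Option<SystemTime>) -> Option<V> {
        self.entries.insert(key, (value, expires)).map(|(old, _)| old)
    }

    /// Look up by any borrowed form of the key (e.g. `&str` for `String`).
    /// Entries whose expiry is at or before `now` are treated as absent.
    pub fn get<Q>(&self, key: &Q, now: SystemTime) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        match self.entries.get(key) {
            Some((value, None)) => Some(value),
            Some((value, Some(deadline))) if *deadline > now => Some(value),
            _ => None,
        }
    }

    /// Remove an entry, returning its value if it existed.
    pub fn delete<Q>(&mut self, key: &Q) -> Option<V>
    where
        K: Borrow<Q>,
        Q: Eq + Hash + ?Sized,
    {
        self.entries.remove(key).map(|(value, _)| value)
    }
}
```

With a `Collection<String, Vec<u16>>`, a call like `c.get("some-key", SystemTime::now())` works with a plain `&str` and no allocation for the key. Passing `now` explicitly is just a choice that keeps the sketch easy to test; the sketch covers only the key side of the Borrow pattern, not the value side.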
Currently, keys and values are serialized to JSON. While serde_json is hand-optimized and very fast, encoding can certainly be improved further by means of a binary, compact serialization format, such as bincode, MessagePack, BSON, or CBOR. (These are not used in the example because none of these crates seems to be available in the Playground.)
My implementation already stores keys and values as BLOBs, which should be clear from the included SQL. The serialization/deserialization layer is only there to allow arbitrary serializable types in the interface. You don't have to perform the serialization if all you ever have is raw bytes.