Recommendations for cache-type database

You can run Redis on machine with enough swap space. It would be slow as heck if you need in memory cache but it doesn't seems so.

Do you know how SQLite behaves when the on-disks storage is corrupted? Will its API return errors or will it abort the process? I see there is an SQLITE_CORRUPT error code, but I wonder if it's guaranteed that this code or other codes will be returned for all sort of corruption, and that there exists no state of the database, which will lead to an abort of the process or an endless loop, deadlock, etc. (I believe ideally that should be the case, but this seems to be difficult to judge about, I guess?)

It is documened to be guaranteed, or at least the authors intend to guarantee it. SQLite is one of the best-tested pieces of free software in the world right now, and it's tested (including fuzzing) to ensure that corrupt database files and user errors do not cause random crashes but reported deterministically as errors.

Naturally, there are kinds of corruption that it can't protect against. For example, if the DB file is directly overwritten in just the right place so that a value is changed but it is otherwise valid/looks "correct", then this is impossible to notice in the absence of some other explicit redundancy mechanism (e.g. value/row hashes).

3 Likes

Also an interesting link concerning how you can corrupt an sqlite file: How To Corrupt An SQLite Database File

3 Likes

So then this might be very much suitable for the purposes of the OP.

Which Rust wrapper would you recommend on that matter? I see the Rust Nursery lists rusqlite. Not sure how much up-to-date that site is, though.[1] Which would you recommend?


  1. The last commit to the master branch of the Rust Nursery has been 2021. ↩︎

1 Like

Rusqlite is in fact the de-facto standard crate I was referring to. To my knowledge, it is the most popular and best-maintained SQLite wrapper in Rust. (I don't fully agree with all of its design choices, though.)

1 Like

I like the whole sqlx framework, which also comes with a built-in sqlite3 driver. It is very comprehensible, async and does not require much overhead IMO.

1 Like

Here's a minimal example of using SQLite as a key-value store. It has the following features:

  • It provides upsertion, retrieval (including optional retrieval if the key does not exist), and deletion for arbitrary serializable key and value types.
  • Keys always have an Eq bound to ensure they are well-behaved.
  • The API supports the same Borrow-based pattern for keys and values that std's map types apply. Thus, a Collection<String, Vec<u16>> can be created and accessed using &str and &[u16] as well, for example.
  • Entries can expire; an explicit expiry date can be set via the chrono crate, and a None expiry date means that the given entry never expires
  • Re-uses serialization/deserialization buffers and creates prepared statements for maximal performance
  • Currently, keys and values are serialized to JSON. While serde_json is hand-optimized and very fast, encoding can certainly be improved further by means of a binary, compact serialization format, such as bincode, MessagePack, BSON, or CBOR. (These are not used in the example because none of these crates seems to be available in the Playground.)
3 Likes

It looks like there's some consensus for SQLite. It does have a good reputation for stability and a huge number of users.

This Stack Overflow article indicates it's possible to put blobs into SQLite without JSON encoding them.
Does that work in Rust? Applying JSON encoding to gigabytes of images would slow things down.

2 Likes

My implementation already stores keys and values as BLOBs, which should be clear from the included SQL. The serialization/deserialization layer is only there to allow arbitrary serializable types in the interface. You don't have to perform the serialization if all you ever have is raw bytes.

By the way, if you are satisfied with something like a persistent BTreeMap<&[u8], &[u8]>, then you can try sled. It's concurrency-safe, ACID-transactional, and IIUC a pure Rust implementation.

1 Like

I had a look at sled. I wonder how its speed would compare to mmtkvdb / LMDB. One potential disadvantage that I see with sled is that transactions require providing an Fn closure which performs the modifications (so that sled can retry the operation in case of conflicts), see example. But since sled is intentionally lock free, I guess there is no way around that.

I'm considering to use sled for some of my use cases, but the Fn closure requirement might be a blocker for me.

That's about the level of functionality I need. I basically need UUID->BLOB, and not much else.

1 Like

One case where SQLite Db can be corrupted is when a hard drive device drivers reports data has being committed, when it is not, in order to improve benchmark. This breaks the "journaling" logic and can cause corruption.
There is nothing much that can be done against this except buy good hardware.

2 Likes

Sqlite is basically a big on-disk b-tree, in theory this shouldn't really be faster than a filesystem that's also a big on-disk b-tree but alas everyone does everything through the filesystem and it's common for something to be slower than you'd like.

You probably want hashes instead of UUIDs btw to allow some deduplication.

You could look at how stuff like casync or ostree is designed.

1 Like

I would like to note that even if database is a pure Rust implementation that doesn't use unsafe at all (or where all unsafe use is thoroughly reviewed and sound), then a corrupted database might be similarly severe and difficult to debug as UB in practice (depending on the use-case and the particular implementation of the database), even though UB is formally worse.

This brings me to the following:

I finally remember that I considered sled as well in the past. However, the README says:

  • if reliability is your primary constraint, use SQLite. sled is beta.
  • quite young, should be considered unstable for the time being.
  • the on-disk format is going to change in ways that require manual migrations before the 1.0.0 release!

So maybe SQLite is the better choice for the OP, even if it's not written in Rust. I would expect it to be slower though.

What is that expectation based on? I'm genuinely curious, as there's nothing standing out in SQLite that would make it inherently slower for this particular use case. It's an indexed B-tree either way, isn't it?

I haven't used SQLite much, but isn't the interface SQL?[1] I would expect that it has to parse SQL. Even if that can be done beforehand using prepared statements, the engine would still have to interpret the prepared statement in some way, I guess.


  1. I haven't looked through all its API, maybe there are some lower-level interfaces. ↩︎

The interpretation's cost is most likely negligible compared to even a single disk access. It compiles SQL to a bytecode-driven state machine and goes through that bytecode. It uses fairly high-level instructions, and for a key-value store, basically only a couple of equality comparisons, index seeks, and jumps will be involved.

Yeah, maybe the "command interpretation" overhead is neglectible, but not sure. It might depend on how fast the access really is.

I just found some old benchmarks from LMDB.


One more thing I forgot: SQLite does consistency checks and avoids crashing on corrupted databases as you pointed out previously. So this functionality may come with some overhead of course, though it's not really "overhead" as it's part of the desired functionality.