You can run Redis on a machine with enough swap space. It would be slow as heck if you needed an in-memory cache, but that doesn't seem to be the case here.
Do you know how SQLite behaves when the on-disk storage is corrupted? Will its API return errors, or will it abort the process? I see there is an SQLITE_CORRUPT error code, but I wonder whether it's guaranteed that this code or other codes will be returned for all sorts of corruption, and that there exists no database state that leads to an abort of the process, an endless loop, a deadlock, etc. (I believe ideally that should be the case, but this seems difficult to judge, I guess?)
It is documented to be guaranteed, or at least the authors intend to guarantee it. SQLite is one of the best-tested pieces of free software in the world right now, and it's tested (including fuzzing) to ensure that corrupt database files and user errors do not cause random crashes but are reported deterministically as errors.
Naturally, there are kinds of corruption that it can't protect against. For example, if the DB file is directly overwritten in just the right place so that a value is changed but it is otherwise valid/looks "correct", then this is impossible to notice in the absence of some other explicit redundancy mechanism (e.g. value/row hashes).
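To illustrate the "errors, not crashes" behavior, here's a minimal sketch that deliberately clobbers a database header and observes the result. It uses Python's built-in `sqlite3` module, since the engine behavior is the same for any language binding; the file name and the choice of what to overwrite are arbitrary for the demo:

```python
import os
import sqlite3
import tempfile

# Create a small, valid database file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")
con.execute("INSERT INTO kv VALUES (?, ?)", (b"key", b"value"))
con.commit()
con.close()

# Deliberately corrupt it: overwrite the 16-byte magic header
# ("SQLite format 3\0") with garbage.
with open(path, "r+b") as f:
    f.write(b"\xff" * 16)

# SQLite reports the corruption as a deterministic error
# instead of crashing the process.
try:
    sqlite3.connect(path).execute("SELECT v FROM kv").fetchone()
    outcome = "no error"
except sqlite3.DatabaseError as exc:
    outcome = f"error: {exc}"
print(outcome)  # e.g. "error: file is not a database"
```

Note that more subtle corruption (flipping bytes deep inside a page) may only surface later, e.g. during a scan or via `PRAGMA integrity_check`, but it is still surfaced as an error code rather than a crash.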
Also, an interesting link concerning how you can corrupt an SQLite file: How To Corrupt An SQLite Database File
So then this might be very much suitable for the purposes of the OP.
Which Rust wrapper would you recommend on that matter? I see the Rust Nursery lists `rusqlite`. Not sure how up-to-date that site is, though.[1] Which would you recommend?
-
The last commit to the master branch of the Rust Nursery was in 2021. ↩︎
`rusqlite` is in fact the de-facto standard crate I was referring to. To my knowledge, it is the most popular and best-maintained SQLite wrapper in Rust. (I don't fully agree with all of its design choices, though.)
I like the whole `sqlx` framework, which also comes with a built-in SQLite driver. It is very comprehensible, async, and does not require much overhead IMO.
Here's a minimal example of using SQLite as a key-value store. It has the following features:
- It provides upsertion, retrieval (including optional retrieval if the key does not exist), and deletion for arbitrary serializable key and value types.
- Keys always have an `Eq` bound to ensure they are well-behaved.
- The API supports the same `Borrow`-based pattern for keys and values that `std`'s map types apply. Thus, a `Collection<String, Vec<u16>>` can be created and accessed using `&str` and `&[u16]` as well, for example.
- Entries can expire; an explicit expiry date can be set via the `chrono` crate, and a `None` expiry date means that the given entry never expires.
- It re-uses serialization/deserialization buffers and creates prepared statements for maximal performance.
- Currently, keys and values are serialized to JSON. While `serde_json` is hand-optimized and very fast, encoding can certainly be improved further by means of a compact binary serialization format, such as bincode, MessagePack, BSON, or CBOR. (These are not used in the example because none of these crates seems to be available in the Playground.)
It looks like there's some consensus for SQLite. It does have a good reputation for stability and a huge number of users.
This Stack Overflow article indicates it's possible to put blobs into SQLite without JSON encoding them.
Does that work in Rust? Applying JSON encoding to gigabytes of images would slow things down.
My implementation already stores keys and values as BLOBs, which should be clear from the included SQL. The serialization/deserialization layer is only there to allow arbitrary serializable types in the interface. You don't have to perform the serialization if all you ever have is raw bytes.
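As a quick engine-level sanity check (using Python's built-in `sqlite3` module, since the behavior is independent of the language binding): BLOBs round-trip byte-for-byte with no JSON involved, so gigabytes of image data never pass through a text encoding step:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE images (id BLOB PRIMARY KEY, data BLOB NOT NULL)")

payload = bytes(range(256)) * 16  # arbitrary binary data, no JSON anywhere
con.execute("INSERT INTO images VALUES (?, ?)", (b"some-uuid", payload))

(stored,) = con.execute(
    "SELECT data FROM images WHERE id = ?", (b"some-uuid",)).fetchone()
assert stored == payload  # round-trips byte-for-byte
```

Rust bindings pass `&[u8]`/`Vec<u8>` through in the same way.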
By the way, if you are satisfied with something like a persistent `BTreeMap<&[u8], &[u8]>`, then you can try sled. It's concurrency-safe, ACID-transactional, and IIUC a pure-Rust implementation.
I had a look at `sled`. I wonder how its speed would compare to `mmtkvdb`/LMDB. One potential disadvantage that I see with `sled` is that transactions require providing an `Fn` closure which performs the modifications (so that `sled` can retry the operation in case of conflicts); see the example. But since `sled` is intentionally lock-free, I guess there is no way around that.

I'm considering using `sled` for some of my use cases, but the `Fn` closure requirement might be a blocker for me.
That's about the level of functionality I need. I basically need UUID->BLOB, and not much else.
One case where an SQLite DB can be corrupted is when a hard drive's device driver reports data as committed when it is not, in order to improve benchmark results. This breaks the journaling logic and can cause corruption.

There is not much that can be done against this except buying good hardware.
SQLite is basically a big on-disk B-tree. In theory this shouldn't really be faster than a filesystem, which is also a big on-disk B-tree, but alas, everyone does everything through the filesystem, and it's common for something to be slower than you'd like.
You probably want hashes instead of UUIDs btw to allow some deduplication.
You could look at how stuff like casync or ostree is designed.
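A minimal sketch of the content-addressing idea behind those tools (assuming SHA-256 here; any collision-resistant hash works):

```python
import hashlib

def content_key(data: bytes) -> bytes:
    # Identical blobs hash to the same key, so storing blobs under
    # their hash deduplicates repeated content for free.
    return hashlib.sha256(data).digest()

a = content_key(b"same bytes")
b = content_key(b"same bytes")
c = content_key(b"different bytes")
print(a == b, a == c)  # True False
```

The trade-off versus UUIDs is that you must hash the content on every store, and mutating a blob changes its key.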
I would like to note that even if a database is a pure Rust implementation that doesn't use `unsafe` at all (or where all `unsafe` use is thoroughly reviewed and sound), a corrupted database might be similarly severe and difficult to debug as UB in practice (depending on the use case and the particular implementation of the database), even though UB is formally worse.
This brings me to the following:
I just remembered that I considered `sled` as well in the past. However, the README says:

- if reliability is your primary constraint, use SQLite. sled is beta.
- quite young, should be considered unstable for the time being.
- the on-disk format is going to change in ways that require manual migrations before the `1.0.0` release!
So maybe SQLite is the better choice for the OP, even if it's not written in Rust. I would expect it to be slower though.
What is that expectation based on? I'm genuinely curious, as there's nothing standing out in SQLite that would make it inherently slower for this particular use case. It's an indexed B-tree either way, isn't it?
I haven't used SQLite much, but isn't the interface SQL?[1] I would expect that it has to parse SQL. Even if that can be done beforehand using prepared statements, the engine would still have to interpret the prepared statement in some way, I guess.
-
I haven't looked through all its API, maybe there are some lower-level interfaces. ↩︎
The interpretation's cost is most likely negligible compared to even a single disk access. It compiles SQL to a bytecode-driven state machine and goes through that bytecode. It uses fairly high-level instructions, and for a key-value store, basically only a couple of equality comparisons, index seeks, and jumps will be involved.
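For the curious, you can inspect that bytecode yourself with `EXPLAIN`. A quick sketch using Python's built-in `sqlite3` module (the opcode names are VDBE internals and can vary between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")

# EXPLAIN returns the VDBE program the query was compiled to:
# one row per instruction (addr, opcode, operands, ...).
program = con.execute(
    "EXPLAIN SELECT v FROM kv WHERE k = ?", (b"key",)).fetchall()
for addr, opcode, *operands in program:
    print(addr, opcode)
```

For a simple keyed lookup, the program is just a handful of instructions (open cursor, seek, fetch column, halt), which supports the point that interpretation cost is dwarfed by I/O.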
Yeah, maybe the "command interpretation" overhead is negligible, but I'm not sure. It might depend on how fast the access really is.
I just found some old benchmarks from LMDB.
One more thing I forgot: SQLite does consistency checks and avoids crashing on corrupted databases, as you pointed out previously. This functionality may come with some cost, of course, though it's not really "overhead", as it's part of the desired functionality.