Recommendations for cache-type database

H2CO3 · March 8, 2023, 10:58am

By the way, if you are satisfied with something like a persistent BTreeMap<&[u8], &[u8]>, then you can try sled. It's concurrency-safe, ACID-transactional, and IIUC a pure Rust implementation.

jbe · March 8, 2023, 5:41pm

I had a look at sled. I wonder how its speed would compare to mmtkvdb / LMDB. One potential disadvantage that I see with sled is that transactions require providing an Fn closure which performs the modifications (so that sled can retry the operation in case of conflicts), see example. But since sled is intentionally lock-free, I guess there is no way around that.

I'm considering using sled for some of my use cases, but the Fn closure requirement might be a blocker for me.
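
To illustrate why a retrying, lock-free design asks for an Fn closure rather than an FnOnce, here is a minimal sketch using only the standard library. It is not sled's API: `OptimisticCell` and its `transaction` method are hypothetical names, and a single `AtomicU64` stands in for the store. The closure may run several times, once per attempt, which is exactly what FnOnce would forbid.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a lock-free store that holds a single value.
pub struct OptimisticCell {
    value: AtomicU64,
}

impl OptimisticCell {
    pub fn new(initial: u64) -> Self {
        Self { value: AtomicU64::new(initial) }
    }

    pub fn get(&self) -> u64 {
        self.value.load(Ordering::Acquire)
    }

    /// Runs `update` optimistically and retries on conflict.
    /// Returns the committed value and the number of attempts.
    /// `update` must be `Fn` because it is called once per attempt.
    pub fn transaction<F>(&self, update: F) -> (u64, u32)
    where
        F: Fn(u64) -> u64,
    {
        let mut attempts = 0;
        loop {
            attempts += 1;
            // Read a snapshot and compute the new value from it.
            let seen = self.value.load(Ordering::Acquire);
            let next = update(seen);
            // Commit only if nobody changed the value in the meantime.
            if self
                .value
                .compare_exchange(seen, next, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return (next, attempts);
            }
            // Conflict: another writer won, so call `update` again.
        }
    }
}
```

A consequence of this shape is that the closure should be free of side effects that must happen exactly once, since a conflicting writer can cause it to be re-run.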

John_Nagle · March 8, 2023, 6:00pm

That's about the level of functionality I need. I basically need UUID->BLOB, and not much else.

jslarochelle · March 8, 2023, 6:38pm

One case where an SQLite database can be corrupted is when a hard drive's device driver reports data as having been committed when it has not, in order to improve benchmark results. This breaks the "journaling" logic and can cause corruption.
There is not much that can be done about this except to buy good hardware.

barcharcraz · March 10, 2023, 12:45am

SQLite is basically a big on-disk B-tree. In theory, this shouldn't really be faster than a filesystem that's also a big on-disk B-tree, but alas, everyone does everything through the filesystem, and it's common for something to be slower than you'd like.

You probably want hashes instead of UUIDs btw to allow some deduplication.

You could look at how stuff like casync or ostree is designed.
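
As a rough sketch of the hashes-instead-of-UUIDs idea: if the key is derived from the blob's content, storing the same bytes twice yields the same key and only one copy. Everything here is hypothetical and in-memory (`BlobStore`, `content_key`), and `DefaultHasher` is used only to keep the example dependency-free; it is not collision-resistant, so a real cache would presumably use a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derives a key from the blob's content.
/// Assumption: DefaultHasher is a placeholder for a cryptographic hash.
pub fn content_key(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory content-addressed blob store.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Stores the blob and returns its content-derived key.
    /// Identical content maps to the same key, so it is stored once.
    pub fn put(&mut self, data: &[u8]) -> u64 {
        let key = content_key(data);
        self.blobs.entry(key).or_insert_with(|| data.to_vec());
        key
    }

    /// Looks up a blob by its key.
    pub fn get(&self, key: u64) -> Option<&[u8]> {
        self.blobs.get(&key).map(Vec::as_slice)
    }

    /// Number of distinct blobs currently stored.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

With a UUID key, the two identical uploads would occupy two entries; here the second `put` is a no-op apart from returning the existing key.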

jbe · March 10, 2023, 4:29pm

I would like to note that even if a database is a pure Rust implementation that doesn't use unsafe at all (or where all unsafe use is thoroughly reviewed and sound), a corrupted database might be about as severe and as difficult to debug in practice as UB (depending on the use case and the particular implementation of the database), even though UB is formally worse.

This brings me to the following:

I finally remember that I considered sled as well in the past. However, the README says:

if reliability is your primary constraint, use SQLite. sled is beta.

quite young, should be considered unstable for the time being.

the on-disk format is going to change in ways that require manual migrations before the 1.0.0 release!

So maybe SQLite is the better choice for the OP, even if it's not written in Rust. I would expect it to be slower though.

H2CO3 · March 10, 2023, 4:50pm

What is that expectation based on? I'm genuinely curious, as there's nothing standing out in SQLite that would make it inherently slower for this particular use case. It's an indexed B-tree either way, isn't it?

jbe · March 10, 2023, 4:55pm

I haven't used SQLite much, but isn't the interface SQL? [1] I would expect that it has to parse SQL. Even if that can be done beforehand using prepared statements, the engine would still have to interpret the prepared statement in some way, I guess.

[1] I haven't looked through all of its API; maybe there are some lower-level interfaces.

H2CO3 · March 10, 2023, 5:03pm

The interpretation's cost is most likely negligible compared to even a single disk access. It compiles SQL to a bytecode-driven state machine and goes through that bytecode. It uses fairly high-level instructions, and for a key-value store, basically only a couple of equality comparisons, index seeks, and jumps will be involved.

jbe · March 10, 2023, 5:07pm

Yeah, maybe the "command interpretation" overhead is negligible, but I'm not sure. It might depend on how fast the access really is.

I just found some old benchmarks from LMDB.

One more thing I forgot: SQLite does consistency checks and avoids crashing on corrupted databases as you pointed out previously. So this functionality may come with some overhead of course, though it's not really "overhead" as it's part of the desired functionality.

blonk · March 15, 2023, 6:32am

Can you expand on this?

H2CO3 · March 15, 2023, 7:04am

There are a couple of things, some off the top of my head:

- They bind the statement and the connection together using a lifetime, rather than using an Arc internally to have all statements keep the connection alive. This makes it impossible to put the connection and statements into the same UDT.
- The ToSql/FromSql impls are lacking, e.g. there is no impl FromSql for Cow (which makes some types impossible to use in abstractions built around the library), and they do not always provide as many static guarantees as would have been possible (e.g. f64::NAN implicitly becomes SQL NULL when inserted; this flaw could have been mitigated in the Rust wrapper).
- Date and time are unconditionally stored as text, which may be an unnecessary burden on storage (especially because these data types tend to be indexed and, as such, duplicated). This could have been configurable.
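
To make the NaN point concrete, here is one way a wrapper could provide such a guarantee. This is a sketch, not the library's actual API: `NonNanF64`, `NanError`, and `SqlValue` are hypothetical types invented for illustration. The idea is that only a checked, non-NaN float can be converted into a REAL, so NULL has to be requested explicitly through `Option`.

```rust
/// Hypothetical newtype: a float that is guaranteed not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct NonNanF64(f64);

/// Error returned when a NaN is rejected.
#[derive(Debug, PartialEq, Eq)]
pub struct NanError;

impl NonNanF64 {
    /// The only way to construct the type; rejects NaN up front.
    pub fn new(value: f64) -> Result<Self, NanError> {
        if value.is_nan() {
            Err(NanError)
        } else {
            Ok(NonNanF64(value))
        }
    }

    pub fn get(self) -> f64 {
        self.0
    }
}

/// Hypothetical stand-in for a bound SQL parameter.
#[derive(Debug, PartialEq)]
pub enum SqlValue {
    Null,
    Real(f64),
}

impl From<NonNanF64> for SqlValue {
    fn from(v: NonNanF64) -> Self {
        // Never NULL: the NaN case was ruled out at construction.
        SqlValue::Real(v.get())
    }
}

impl From<Option<NonNanF64>> for SqlValue {
    fn from(v: Option<NonNanF64>) -> Self {
        // NULL only when the caller asked for it with `None`.
        match v {
            Some(x) => SqlValue::Real(x.get()),
            None => SqlValue::Null,
        }
    }
}
```

The trade-off is an extra fallible conversion at the call site, in exchange for never silently turning a float into NULL.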

system · June 13, 2023, 7:05am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
