By the way, if you are satisfied with something like a persistent BTreeMap<&[u8], &[u8]>, then you can try sled. It's concurrency-safe, ACID-transactional, and IIUC a pure Rust implementation.
I had a look at sled. I wonder how its speed would compare to mmtkvdb / LMDB. One potential disadvantage that I see with sled is that transactions require providing an Fn closure which performs the modifications (so that sled can retry the operation in case of conflicts), see example. But since sled is intentionally lock-free, I guess there is no way around that.
I'm considering using sled for some of my use cases, but the Fn closure requirement might be a blocker for me.
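To illustrate why a retrying transaction wants an Fn closure rather than an FnOnce, here is a minimal std-only sketch of an optimistic update loop. It is not sled's API: optimistic_update is a hypothetical helper, and a single AtomicU64 stands in for the database state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper (not part of sled): applies `update` optimistically
/// and retries if another thread changed the value in the meantime.
/// Returns the value that was stored and the number of attempts it took.
fn optimistic_update<F>(cell: &AtomicU64, update: F) -> (u64, u32)
where
    F: Fn(u64) -> u64, // `Fn`, because the closure may run more than once
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let seen = cell.load(Ordering::Acquire);
        let next = update(seen);
        // Commit only if nobody else wrote since we read `seen`.
        if cell
            .compare_exchange(seen, next, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return (next, attempts);
        }
        // Conflict: loop and call `update` again on the fresh value.
    }
}
```

Since the closure can be called several times, it cannot consume captured values, and any side effects inside it may happen more than once. That is presumably the part that can become a blocker.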
That's about the level of functionality I need. I basically need UUID->BLOB, and not much else.
One case where an SQLite database can be corrupted is when a hard drive's device driver reports that data has been committed when it has not, in order to improve benchmark results. This breaks the "journaling" logic and can cause corruption.
There is nothing much that can be done against this except buy good hardware.
SQLite is basically a big on-disk B-tree. In theory this shouldn't really be faster than a filesystem that's also a big on-disk B-tree, but alas, everyone does everything through the filesystem, and it's common for something to be slower than you'd like.
You probably want hashes instead of UUIDs btw to allow some deduplication.
You could look at how stuff like casync or ostree is designed.
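As a rough sketch of the hash-as-key idea: if the key is derived from the blob's content, storing the same bytes twice yields the same key and only one copy. BlobStore is a hypothetical in-memory toy, and DefaultHasher is only a stand-in here; a real store would want a cryptographic hash, since a 64-bit non-cryptographic hash can collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Toy content-addressed blob store: the key is derived from the bytes.
/// ASSUMPTION: `DefaultHasher` (64-bit, non-cryptographic) stands in for a
/// real cryptographic hash; colliding blobs would be conflated here.
#[derive(Default)]
struct BlobStore {
    blobs: BTreeMap<u64, Vec<u8>>,
}

impl BlobStore {
    /// Stores `data` and returns its key; identical data is stored once.
    fn put(&mut self, data: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let key = hasher.finish();
        // Only copy the bytes if this content has not been seen before.
        self.blobs.entry(key).or_insert_with(|| data.to_vec());
        key
    }

    /// Looks up a blob by the key returned from `put`.
    fn get(&self, key: u64) -> Option<&[u8]> {
        self.blobs.get(&key).map(Vec::as_slice)
    }

    /// Number of distinct blobs stored.
    fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

The trade-off compared to UUIDs is that the caller no longer chooses the key, so anything that refers to a blob has to store the hash returned by put.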
I would like to note that even if a database is a pure Rust implementation that doesn't use unsafe at all (or where all unsafe use is thoroughly reviewed and sound), a corrupted database might be similarly severe and difficult to debug as UB in practice (depending on the use-case and the particular implementation of the database), even though UB is formally worse.
This brings me to the following:
I finally remember that I considered sled as well in the past. However, the README says:
- if reliability is your primary constraint, use SQLite. sled is beta.
- quite young, should be considered unstable for the time being.
- the on-disk format is going to change in ways that require manual migrations before the 1.0.0 release!
So maybe SQLite is the better choice for the OP, even if it's not written in Rust. I would expect it to be slower though.
What is that expectation based on? I'm genuinely curious, as there's nothing standing out in SQLite that would make it inherently slower for this particular use case. It's an indexed B-tree either way, isn't it?
I haven't used SQLite much, but isn't the interface SQL?[1] I would expect that it has to parse SQL. Even if that can be done beforehand using prepared statements, the engine would still have to interpret the prepared statement in some way, I guess.
[1] I haven't looked through all its API; maybe there are some lower-level interfaces.
The interpretation's cost is most likely negligible compared to even a single disk access. It compiles SQL to a bytecode-driven state machine and goes through that bytecode. It uses fairly high-level instructions, and for a key-value store, basically only a couple of equality comparisons, index seeks, and jumps will be involved.
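To make the "bytecode-driven state machine" idea a bit more concrete, here is a toy sketch of how a key lookup might run as a handful of instructions. The Op enum, run, and compile_lookup are all made up for illustration; these are not SQLite's real opcodes, and an in-memory BTreeMap stands in for the on-disk B-tree.

```rust
use std::collections::BTreeMap;

/// Hypothetical instruction set, loosely modeled on the idea of a
/// bytecode-driven state machine; these are NOT SQLite's real opcodes.
enum Op {
    /// Position the cursor on `key`; jump to `miss` if it is absent.
    SeekEq { key: u64, miss: usize },
    /// Copy the value under the cursor into the result.
    EmitValue,
    /// Stop execution.
    Halt,
}

/// Runs `program` against `table` and returns the emitted value, if any.
fn run(program: &[Op], table: &BTreeMap<u64, Vec<u8>>) -> Option<Vec<u8>> {
    let mut pc = 0; // program counter
    let mut cursor: Option<&Vec<u8>> = None;
    let mut result = None;
    while let Some(op) = program.get(pc) {
        match op {
            Op::SeekEq { key, miss } => match table.get(key) {
                Some(value) => {
                    cursor = Some(value);
                    pc += 1;
                }
                None => pc = *miss, // key not found: skip the emit
            },
            Op::EmitValue => {
                result = cursor.cloned();
                pc += 1;
            }
            Op::Halt => break,
        }
    }
    result
}

/// Roughly what a lookup by key could compile to in this toy VM.
fn compile_lookup(key: u64) -> Vec<Op> {
    vec![Op::SeekEq { key, miss: 2 }, Op::EmitValue, Op::Halt]
}
```

In this sketch the interpreter does three match dispatches per lookup, while the seek itself is where the actual tree traversal (and, on disk, the I/O) would happen.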
Yeah, maybe the "command interpretation" overhead is negligible, but I'm not sure. It might depend on how fast the access really is.
I just found some old benchmarks from LMDB.
One more thing I forgot: SQLite does consistency checks and avoids crashing on corrupted databases as you pointed out previously. So this functionality may come with some overhead of course, though it's not really "overhead" as it's part of the desired functionality.
Can you expand on this?
There are a couple of things, some off the top of my head:
- they bind the statement and the connection together using a lifetime, rather than using an Arc internally to have all statements keep the connection alive. This makes it impossible to put the connection and statements into the same UDT.
- the ToSql/FromSql impls are lacking, e.g. no impl FromSql for Cow (which makes some types impossible to use in abstractions built around the library), and they do not always provide as many static guarantees as would have been possible (e.g. f64::NAN implicitly becomes SQL NULL when inserted; this flaw could have been mitigated in the Rust wrapper)
- Date and time are unconditionally stored as text, which may be an unnecessary burden on storage (especially because these data types tend to be indexed and as such, duplicated). This could have been configurable.
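As an illustration of the kind of static guarantee meant in the NaN point, a wrapper could accept only a checked newtype instead of a bare f64. This is just a sketch: NonNanF64 and NanError are hypothetical names, not types from any existing crate.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype (not part of any existing crate): an `f64` that is
/// guaranteed not to be NaN, so it can never silently turn into SQL NULL.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct NonNanF64(f64);

/// Error returned when a NaN is rejected.
#[derive(Debug, PartialEq)]
struct NanError;

impl TryFrom<f64> for NonNanF64 {
    type Error = NanError;

    fn try_from(value: f64) -> Result<Self, Self::Error> {
        if value.is_nan() {
            Err(NanError)
        } else {
            Ok(NonNanF64(value))
        }
    }
}

impl NonNanF64 {
    /// Returns the wrapped value, which is known not to be NaN.
    fn get(self) -> f64 {
        self.0
    }
}
```

With something like this, the NaN check happens once at the conversion boundary, and a column declared NOT NULL could only ever be bound to a value that has passed it.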