I have been writing a crate (mmtkvdb, memory mapped typed key-value database) to provide a key-value store using the Symas LMDB library (technical docs here).
See also my previous thread: Current solutions for key value stores
LMDB uses memory-mapping, and my crate allows retrieving references to such data on disk (without copying) where alignment is not an issue. Of course, these operations must be unsafe (to not be unsound) when certain types are involved (e.g. bools or enums), because a Rust reference must not point to invalid data or uninitialized memory, and the data could have been modified by another program when it persists in a file.
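To illustrate why types like bool are the problem here, a minimal sketch follows. The function names are hypothetical and are not part of mmtkvdb. The first function copies and validates the byte, so it can be safe. The second hands out a reference into the (possibly externally modified) bytes without copying, so it has to push the validity requirement onto the caller:

```rust
/// Safe variant: copy the byte and reject bit patterns that are not a valid `bool`.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other value is not a valid `bool`
    }
}

/// Zero-copy variant: reinterpret the stored byte as `&bool`.
///
/// # Safety
/// `*byte` must be 0 or 1. If another program wrote a different value
/// into the file, producing this reference is undefined behavior.
unsafe fn bool_ref_unchecked(byte: &u8) -> &bool {
    // `bool` has size 1 and alignment 1, so only the validity of the
    // value is in question here, not the layout.
    unsafe { &*(byte as *const u8 as *const bool) }
}
```

For a type such as u8, where every bit pattern is valid, the zero-copy variant would not need this precondition.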
So if I keep the feature of directly referencing data types stored on disk, I will not get rid of unsafe. I don't see this as a no-go, but I documented that restriction:
Safety
Because of how memory-mapped I/O is being used and also because of certain assumptions of the underlying LMDB API, opening environments and databases requires unsafe Rust (i.e. the programmer must ensure that certain preconditions are met that cannot be enforced by the compiler to avoid undefined behavior). If you aim to program in safe Rust only, this Rust library is not suitable for you.
However, I would like to avoid unsafe in certain other cases. Let me get to an example:
LMDB offers registering a comparison function, which can be used to collate keys and values:
int mdb_set_compare(MDB_txn *txn, MDB_dbi dbi, MDB_cmp_func *cmp)
Set a custom key comparison function for a database.
The comparison function is called whenever it is necessary to compare a key specified by the application with a key currently stored in the database. If no comparison function is specified, and no special key flags were specified with mdb_dbi_open(), the keys are compared lexically, with shorter keys collating before longer keys.
Warning
This function must be called before any data access functions are used, otherwise data corruption may occur. The same comparison function must be used by every program accessing the database, every time the database is used.
Now I wonder what really happens if the comparison function changes in a wrong way (which can always happen without the program noticing it, e.g. because someone could replace the database files on the file system). In that case "data corruption" may occur. But would that lead to undefined behavior? I feel like the API reference of LMDB isn't specific enough to judge what "data corruption" means in this context. If it only meant that key-value pairs get lost, for example, this wouldn't need to be unsafe.
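To make the harmless kind of "corruption" concrete, here is a small sketch in safe Rust. It uses a sorted Vec with binary search as a stand-in for an ordered store. This is an assumption for illustration only and says nothing about how LMDB's B-tree actually reacts. The keys are inserted under one ordering and looked up under another:

```rust
use std::cmp::Ordering;

type Cmp = fn(&[u8], &[u8]) -> Ordering;

/// Lexical byte order; a shorter key sorts before a longer key with the same prefix.
fn lexical(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// A "wrong" comparison function: the exact reverse of `lexical`.
fn reversed(a: &[u8], b: &[u8]) -> Ordering {
    b.cmp(a)
}

/// Build the store: keys sorted according to `cmp`.
fn build_sorted(keys: &[&'static [u8]], cmp: Cmp) -> Vec<&'static [u8]> {
    let mut sorted = keys.to_vec();
    sorted.sort_by(|a, b| cmp(*a, *b));
    sorted
}

/// Look up `key` by binary search, trusting that `sorted` is ordered by `cmp`.
fn lookup(sorted: &[&[u8]], key: &[u8], cmp: Cmp) -> bool {
    sorted.binary_search_by(|probe| cmp(*probe, key)).is_ok()
}

/// Returns (found with the matching function, found with the mismatched one).
fn demo() -> (bool, bool) {
    let keys: [&'static [u8]; 5] = [b"c", b"a", b"e", b"b", b"d"];
    let store = build_sorted(&keys, lexical);
    (lookup(&store, b"a", lexical), lookup(&store, b"a", reversed))
}
```

With the mismatched function the search walks in the wrong direction and the key "a" appears to be missing, although it is still stored. The standard library documents the result of a binary search on a slice that is not sorted by the given comparator as unspecified, but there is no memory unsafety involved. Whether LMDB's failure mode is equally benign is exactly what the LMDB documentation leaves open.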
This problem is similar to changing the comparison function of the type stored in a HashSet:
It is a logic error for an item to be modified in such a way that the item’s hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the set. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified (it could include panics, incorrect results, aborts, memory leaks, or non-termination) but will not be undefined behavior.
As you can see, the Rust standard library specifically gives a guarantee(!) that the "not specified" behavior (see also unspecified behavior in Wikipedia) will not be "undefined behavior". But API specifications of C libraries might not always be that clear about what happens if you call functions in a wrong way (e.g. set a wrong comparison function in LMDB).
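The HashSet logic error from the quote can be triggered entirely in safe Rust. The following sketch uses a hypothetical Key type with interior mutability. What the lookup returns afterwards is not specified, so the code only reports it and does not rely on it:

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// A key whose hash and equality depend on a value behind a `Cell`.
#[derive(PartialEq, Eq)]
struct Key(Cell<u32>);

impl Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.get().hash(state);
    }
}

/// Insert a key, change it while it is in the set, then query the set.
/// Returns the set's length and whether the changed key was found.
fn mutate_key_in_set() -> (usize, bool) {
    let mut set = HashSet::new();
    set.insert(Key(Cell::new(1)));

    // Logic error: the hash of the stored item changes while it is in the set.
    for key in &set {
        key.0.set(2);
    }

    // The result of this lookup is unspecified, but it is not undefined behavior.
    let found = set.contains(&Key(Cell::new(2)));
    (set.len(), found)
}
```

The set still reports one element, while the lookup will most likely miss it. No unsafe is needed anywhere, which is the guarantee the standard library documentation gives and the LMDB documentation does not.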
Some questions:
- In the concrete case of LMDB: Would you assume that calling mdb_set_compare with a comparison function that does not match the underlying database will cause undefined behavior (UB)?
- Did you stumble upon other cases of APIs written in different languages where it's difficult to judge whether violating certain preconditions causes only certain errors or rather leads to undefined behavior (which then would require unsafe if you cannot ensure the precondition is always true)? How did you deal with these cases?