See also my previous thread: Current solutions for key value stores
LMDB uses memory-mapping, and my crate allows retrieving references to such data on disk (without copying) in cases where alignment is not an issue. Of course, these operations must be unsafe (to not be unsound) when certain types are involved (e.g. enums), because a Rust reference must not point to invalid data or uninitialized memory, and the data could have been modified by another program, since it persists in a file.
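The following minimal sketch shows why zero-copy access has to be unsafe for types like enums. The `Color` enum and both functions are hypothetical examples, not the actual API of my crate. A checked conversion can reject an invalid byte. Reinterpreting the byte in place cannot reject anything, so that function has to shift the burden to the caller.

```rust
/// Hypothetical example type that might be stored on disk as a single byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

/// Safe, checked conversion: inspects the byte and rejects invalid values.
/// Works for any input, but yields an owned value instead of a reference.
pub fn color_from_byte(b: u8) -> Option<Color> {
    match b {
        0 => Some(Color::Red),
        1 => Some(Color::Green),
        2 => Some(Color::Blue),
        _ => None,
    }
}

/// Zero-copy reinterpretation of a byte (e.g. inside a memory-mapped file) as `&Color`.
///
/// # Safety
/// The caller must guarantee that `*b` is 0, 1 or 2 and that the byte is not
/// modified while the returned reference is alive. Any other value (for example
/// one written by another program) would create an invalid `Color`, which is UB.
pub unsafe fn color_ref_unchecked(b: &u8) -> &Color {
    // `Color` is `repr(u8)`, so its size (1) and alignment (1) match those of `u8`.
    unsafe { &*(b as *const u8 as *const Color) }
}
```

For example, `color_from_byte(7)` returns `None`, but calling `color_ref_unchecked` on that same byte would be undefined behavior. The "not modified while borrowed" requirement applies even to types like `u32`, where every bit pattern is valid.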
So if I keep the feature of directly referencing data stored on disk, I will not get rid of unsafe. I don't see this as a no-go, but I have documented that restriction:
Because of how memory-mapped I/O is being used and also because of certain assumptions of the underlying LMDB API, opening environments and databases requires unsafe Rust (i.e. the programmer must ensure that certain preconditions, which cannot be enforced by the compiler, are met in order to avoid undefined behavior). If you aim to program in safe Rust only, this Rust library is not suitable for you.
However, I would like to avoid unsafe in certain other cases. Let me give an example:
LMDB allows registering a comparison function, which can be used to collate keys and values:
Set a custom key comparison function for a database.
The comparison function is called whenever it is necessary to compare a key specified by the application with a key currently stored in the database. If no comparison function is specified, and no special key flags were specified with mdb_dbi_open(), the keys are compared lexically, with shorter keys collating before longer keys.
This function must be called before any data access functions are used, otherwise data corruption may occur. The same comparison function must be used by every program accessing the database, every time the database is used.
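For reference, lmdb.h declares the callback type as `typedef int (MDB_cmp_func)(const MDB_val *a, const MDB_val *b);`. The following minimal sketch is a Rust comparator with that shape. It is intended to reproduce the default order described in the quote: byte-wise, with a key that is a prefix of another sorting first. `MdbVal` is a hand-written mirror of `MDB_val`, and `val`, `as_slice` and `lexical_cmp` are hypothetical names. A real binding would use its own generated types and register the function via mdb_set_compare.

```rust
use std::cmp::Ordering;
use std::os::raw::{c_int, c_void};

/// Hand-written mirror of LMDB's `MDB_val` (`size_t mv_size; void *mv_data;`).
#[repr(C)]
pub struct MdbVal {
    pub mv_size: usize,
    pub mv_data: *mut c_void,
}

/// Hypothetical helper: wrap a Rust byte slice as an `MdbVal` (borrowed, no copy).
pub fn val(bytes: &[u8]) -> MdbVal {
    MdbVal { mv_size: bytes.len(), mv_data: bytes.as_ptr() as *mut c_void }
}

/// View the bytes behind an `MdbVal` as a slice.
///
/// # Safety
/// `v` must point to a valid `MdbVal` whose `mv_data` points to `mv_size`
/// readable bytes that stay valid and unmodified for `'a`.
unsafe fn as_slice<'a>(v: *const MdbVal) -> &'a [u8] {
    let v = unsafe { &*v };
    if v.mv_size == 0 {
        &[] // `mv_data` may be null for empty values; `from_raw_parts` forbids null
    } else {
        unsafe { std::slice::from_raw_parts(v.mv_data as *const u8, v.mv_size) }
    }
}

/// Comparator with the shape of `MDB_cmp_func`: byte-wise lexicographic order,
/// where a key that is a prefix of another sorts first. It never panics, which
/// matters because a panic must not unwind into LMDB's C code.
pub extern "C" fn lexical_cmp(a: *const MdbVal, b: *const MdbVal) -> c_int {
    // Assumption: LMDB only passes pointers that satisfy `as_slice`'s contract.
    let (a, b) = unsafe { (as_slice(a), as_slice(b)) };
    match a.cmp(b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}
```

Swapping this for, say, a reversed comparator on an existing database is exactly the "wrong comparison function" case that the following paragraph asks about.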
Now I wonder what really happens if the comparison function is changed incorrectly (which can always happen without the program noticing it, e.g. because someone could replace the database files on the file system). In that case "data corruption" may occur. But would that lead to undefined behavior? I feel like the LMDB API reference isn't specific enough to judge what "data corruption" means in this context. If it only meant that key-value pairs get lost, for example, this wouldn't be
This problem is similar to changing the comparison function of the type stored in a
It is a logic error for an item to be modified in such a way that the item’s hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the set. This is normally only possible through RefCell, global state, I/O, or unsafe code. The behavior resulting from such a logic error is not specified (it could include panics, incorrect results, aborts, memory leaks, or non-termination) but will not be undefined behavior.
As you can see, the Rust standard library explicitly gives a guarantee(!) that the "not specified" behavior (see also unspecified behavior on Wikipedia) will not be "undefined behavior". But API specifications of C libraries are not always that clear about what happens if you call functions incorrectly (e.g. set a wrong comparison function in LMDB).
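To make the distinction concrete on the safe-Rust side, here is a minimal sketch of a sorted key list that is searched with a different comparator than the one used to sort it. This is not LMDB's B-tree code, and `lookup`, `lexical` and `reversed` are hypothetical. In safe Rust the mismatch can only produce wrong answers, such as a stored key not being found. It cannot cause UB. Whether LMDB's C implementation gives the same guarantee is exactly the open question, and this sketch does not answer it.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a sorted key store: binary search over `keys`
/// using a caller-supplied comparator. Returns the index of `key` if found.
pub fn lookup(keys: &[&[u8]], key: &[u8], cmp: impl Fn(&[u8], &[u8]) -> Ordering) -> Option<usize> {
    let (mut lo, mut hi) = (0, keys.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match cmp(keys[mid], key) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Some(mid),
        }
    }
    None
}

/// The order the keys were sorted with.
pub fn lexical(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// A mismatched order, standing in for a "wrong" comparison function.
pub fn reversed(a: &[u8], b: &[u8]) -> Ordering {
    b.cmp(a)
}
```

With the keys `a` through `e` stored in lexical order, `lookup` finds `b` using `lexical` but returns `None` using `reversed`, even though the key is present. The result is wrong but well-defined.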
- In the concrete case of LMDB: Would you assume that calling mdb_set_compare with a comparison function that does not match the underlying database will cause undefined behavior (UB)?
- Did you stumble upon other cases of APIs written in different languages where it's difficult to judge whether violating certain preconditions causes only certain errors or rather leads to undefined behavior (which would then require unsafe if you cannot ensure that the precondition always holds)? How did you deal with these cases?