I am trying to wrap the SQLite3 library in Rust. (I am aware of the many great bindings already available for SQLite; this is more of an exercise, but I also do have legitimate reasons for not using e.g. `rusqlite` in this particular project of mine.)
For anyone not familiar with SQLite, here's a brief description of its basic API.
You can open a database connection, `*mut sqlite3`, and then "prepare" (compile from SQL to optimized bytecode) one or more statements, `*mut sqlite3_stmt`. Finally, for executing such a prepared statement, one repeatedly invokes a stepper function, `sqlite3_step()`, on the prepared statement object, which retrieves the results of the query row-by-row, effectively invoking the fetch-decode-execute loop of an underlying virtual machine interpreter.
While stepping through the results, a prepared statement will mutate the data in the database connection it was created from (in addition to mutating its own fields, of course). For example, it will access and populate the page cache, set a per-connection error code and error message, and so forth. Since multiple prepared statements are allowed to exist and execute at the same time in the context of a given database connection, the library effectively exhibits shared interior mutability.
For now, I am only interested in dealing with the DB connection and its corresponding prepared statements. Now, it is quite straightforward to wrap both this database handle and its prepared statement objects into their own respective RAII structs. I am not sure, however, what the proper model would be for ensuring that the usage of the wrapper is race-free and in general, memory-safe.
The only hard-ish requirement I have is that I would like the prepared statement wrappers to be `Send`. Since the prepared statements themselves contain pointers to the database handle, I assume this means that the database handle itself has to be `Sync`.
Now, SQLite can be configured with two levels of thread-safety:
- A weaker "multi-threaded" mode, where a given DB connection can only be accessed by a single thread at a time. (This is only called multi-threaded because the library also manages some unrelated, global, internal state which is synchronized in this mode, so the functions that access that state are thread-safe.)
- A stronger "serialized" mode, where a DB connection can be accessed by many threads at the same time, because all calls are synchronized by a per-connection mutex internal to the library.
Therefore, one can assume that the DB connection can effectively be either `Send + !Sync` or `Send + Sync`.
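To make the two cases concrete, here is a minimal sketch of how they might be expressed as Rust types. The names `RawDb`, `MultiThreadedDb`, `SerializedDb` and `check_markers` are hypothetical stand-ins for illustration, not part of any real binding. A struct holding a raw pointer is neither `Send` nor `Sync` by default, so each marker has to be opted into explicitly, and each `unsafe impl` encodes an assumption about how the library was configured.

```rust
use std::sync::Mutex;

/// Hypothetical opaque stand-in for the C `sqlite3` struct.
#[repr(C)]
pub struct RawDb {
    _private: [u8; 0],
}

/// Handle for a library configured in "multi-threaded" mode.
/// The raw pointer field makes this type `!Send + !Sync` by default.
#[allow(dead_code)]
pub struct MultiThreadedDb {
    raw: *mut RawDb,
}

// Assumption: the connection may move between threads as long as only
// one thread uses it at any given time.
unsafe impl Send for MultiThreadedDb {}

/// Handle for a library configured in "serialized" mode.
#[allow(dead_code)]
pub struct SerializedDb {
    raw: *mut RawDb,
}

// Assumption: every call is guarded by the per-connection mutex that
// lives inside the library.
unsafe impl Send for SerializedDb {}
unsafe impl Sync for SerializedDb {}

// These helpers only compile when the bound holds for `T`.
pub fn assert_send<T: Send>() {}
pub fn assert_sync<T: Sync>() {}

/// Compile-time checks of the marker traits; always returns `true`.
pub fn check_markers() -> bool {
    assert_send::<MultiThreadedDb>();
    assert_send::<SerializedDb>();
    assert_sync::<SerializedDb>();
    // `Mutex<T>` is `Sync` whenever `T` is `Send`, so an external lock
    // can supply the `Sync` that the multi-threaded handle lacks.
    assert_sync::<Mutex<MultiThreadedDb>>();
    true
}
```

Adding `assert_sync::<MultiThreadedDb>()` to `check_markers` would fail to compile, which is the `Send + !Sync` case described above.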
There are three things I'm not sure about:
- How much synchronization do I actually need, and where? Which types can/should I mark as `Send`/`Sync`?
- Should my mutating methods take `&self` or `&mut self`? Precisely what assumptions does Rust make about the aliasing mutability of externally-observable data behind a raw pointer? When I write "externally-observable", I mean that there are getter functions in the FFI with which some of the aforementioned connection-local state, e.g. the error message or cached contents, can be publicly accessed.
- How does all of this interact with single-threaded concurrency, i.e. the RWLock pattern and borrowck?
The approach I was thinking of looks roughly like this:
- I would compile SQLite in "multi-threaded" mode, i.e. no locks on the database handle. As I understand, this would effectively make the database handle `Send + !Sync`.
- I would write `unsafe impl Send for Database {}`, accordingly.
- I would make all mutating methods on both the `Database` and the `Statement` object take `&mut self`.
- In the `Statement` object, I would store an `Arc<Mutex<Database>>`.
- I would also `unsafe impl Send for Statement {}`.
- Whenever I wanted to execute a prepared statement, I would acquire the lock on the database handle via the statement object, step through the results, and release the lock.
My ultimate question is: Is the approach described above memory-safe? I'm pretty sure the multi-threaded synchronization part of my plan is correct, although I still don't know:
- Whether it is a good idea (necessary or allowed) to make the mutating functions take `&mut self`, since there is aliasing going on in the internals of SQLite-the-C-library;
- Whether I am allowed to mark both types `Send`, and whether I am allowed to additionally mark `Statement` as `Sync`; and
- Whether there is a better overall way of wrapping prepared statements, in order to minimize the scope of unsafety, and maybe let the compiler infer `Send` and `Sync` impls in more cases?