Thread-safely wrapping C FFI library with effective interior mutability

I am trying to wrap the SQLite3 library in Rust. (I am aware of the many great bindings already available for SQLite – this is more of an exercise, but I also do have legitimate reasons for not using e.g. rusqlite in this particular project of mine.)

For anyone not familiar with SQLite, here's a brief description of its basic API.

You can open a database connection, *mut sqlite3, and then "prepare" (compile from SQL to optimized bytecode) one or more statements, *mut sqlite3_stmt. Finally, for executing such a prepared statement, one repeatedly invokes a stepper function, sqlite3_step(), on the prepared statement object, which retrieves the results of the query row-by-row, effectively invoking the fetch-decode-execute loop of an underlying virtual machine interpreter.

While stepping through the results, a prepared statement will mutate the data in the database connection it was created from (in addition to mutating its own fields, of course). For example, it will access and populate the page cache, set a per-connection error code and error message, and so forth. Since multiple prepared statements are allowed to exist and execute at the same time in the context of a given database connection, the library effectively exhibits shared interior mutability.

For now, I am only interested in dealing with the DB connection and its corresponding prepared statements. Now, it is quite straightforward to wrap both this database handle and its prepared statement objects into their own respective RAII structs. I am not sure, however, what the proper model would be for ensuring that the usage of the wrapper is race-free and in general, memory-safe.

The only hard-ish requirement I have is that I would like the prepared statement wrappers to be Send. Since the prepared statements themselves contain pointers to the database handle, I assume this means that the database handle itself has to be Sync.

Now, SQLite can be configured with two levels of thread-safety:

  • A weaker "multi-threaded" mode, where a given DB connection can only be accessed by a single thread at a time. (This is only called multi-threaded because the library also manages some unrelated, global, internal state which is synchronized in this mode, so their accessor functions are thread-safe.)
  • A stronger "serialized" mode, where a DB connection can be accessed by many threads at the same time, because all calls are synchronized by a per-connection mutex internal to the library.
    Therefore, one can assume that the DB connection can effectively be either Send + !Sync or Send + Sync.

What I'm not sure about is three things:

  1. How much synchronization do I actually need, and where? Which types can/should I mark as Send/Sync?
  2. Should my mutating methods take &self or &mut self? Precisely what assumptions does Rust make about the aliasing mutability of externally-observable data behind a raw pointer? When I write "externally-observable", I mean that there are getter functions in the FFI with which some of the aforementioned connection-local state, e.g. the error message or cached contents, can be publicly accessed.
  3. How does all of this interact with single-threaded concurrency, i.e. the RWLock pattern and borrowck?

The approach I was thinking of looks roughly like this:

  • I would compile SQLite in "multi-threaded" mode, i.e. no locks on the database handle. As I understand, this would effectively make the database handle Send + !Sync.
  • I would write unsafe impl Send for Database {}, accordingly.
  • I would make all mutating methods on both the Database and the Statement object take &mut self.
  • In the Statement object, I would store an Arc<Mutex<Database>>.
  • I would also unsafe impl Send for Statement {}.
  • Whenever I wanted to execute a prepared statement, I would acquire the lock on the database handle via the statement object, step through the results, and release the lock.

My ultimate question is: Is the approach described above memory-safe? I'm pretty sure the multi-threaded synchronization part of my plan is correct, although I still don't know:

  • Whether it is a good idea (necessary or allowed) to make the mutating functions take &mut self, since there is aliasing going on in the internals of SQLite-the-C-library;
  • Whether I am allowed to mark both types Send, and whether I am allowed to additionally mark Statement as Sync; and
  • Whether there is a better overall way of wrapping prepared statements, in order to minimize the scope of unsafety, and maybe let the compiler infer Send and Sync impls in more cases?
1 Like

A type is Sync if and only if immutable references to it are Send. Smart pointers like Arc that provide immutable access are similarly Send if and only if the underlying type is Sync. So you should mark the type Sync if you want an Arc containing that type to be Send.

Note that Mutex<T> is Sync if T is Send, so an Arc<Mutex<T>> is Send as long as T is Send.

If a type is Sync, then:

  1. Any method that takes &self must be safe to call from multiple threads in parallel.
  2. Any method that takes &mut self does not have to be safe to call from multiple threads in parallel.

If a type is not Sync, then neither kind of method can be called in parallel from multiple threads. In this case, the only difference between the two kinds of methods is that a &self method must correctly follow the aliasing rules described below. For example, a type like Cell is Send but not Sync because it is not safe to call some of its &self methods from multiple threads.

If you have an &mut *mut T, &mut *const T, &*mut T, or &*const T, then this makes no assumptions about the T whatsoever. Whenever you create a Rust reference to data behind the reference, it makes the following assumptions:

  • Immutable reference. Between any two uses of the same immutable reference, the data it points at is unchanged.
  • Mutable reference. Between any two uses of the same mutable reference, the data it points at has not been accessed by any other pointer or reference (besides pointers or references derived from the mutable reference).

There are a few more rules, but they only matter if you create a raw pointer from a Rust reference, and keep that raw pointer around after the Rust reference no longer exists. Note that the creation of a Rust reference counts as a use, but a reference going out of scope does not count as a use. Any pointer in C code behaves like a Rust raw pointer when it comes to these rules.

A note regarding UnsafeCell: The raw pointer serves the same purpose as an UnsafeCell, in the sense that if the data inside an UnsafeCell<T> is changed, then an immutable reference does not consider the UnsafeCell itself to have been modified. The same applies to raw pointers in that if you modify data behind a raw pointer, an immutable reference would not consider this to have been a change of the raw pointer.


I think the above should answer most of your questions. Please let me know which ones you are still unsure about, and I will be happy to elaborate.

2 Likes

Thanks, this was very helpful, especially the clarification of the aliasing rules and the equivalence between raw pointers and UnsafeCell. I did not realize that they behaved identically in terms of aliasing rules. In fact, I was going to ask "how do I communicate interior mutability to the compiler given that I can't put the opaque library type behind an UnsafeCell?", but in the end I forgot.

I'm assuming you mean "data behind the raw pointer" here?

Other than that, I'm only having trouble interpreting the following statement:

Doesn't taking &mut self protect against this automatically and statically? I thought that &mut references were necessarily unique (modulo mutable aliasing bugs), so if I had either a Sync or a !Sync type with a &mut method, the compiler would prevent me from calling it in multiple threads at the same time.

You are correct. Note that I said "neither" and not "either".

1 Like