There's a really powerful native library I want to use, but it looks like most of its APIs depend on mutating global state under the hood and executing functions in a specific order.
Obviously this makes the library not thread-safe, but what makes it particularly unsafe is that calling the wrong thing at the wrong time may lead to memory leaks or segfaults (e.g. because it uses a global resource and the resource hasn't been initialized yet).
What strategies would you use to create a safe interface to libraries written this way?
To make things more concrete, this snippet is fairly representative of the way you'd use the library.
use std::{
ffi::CStr,
os::raw::{c_char, c_int},
};
extern "C" {
fn start_item_list() -> c_int;
fn add_single_item(name: *const c_char, value: c_int) -> c_int;
fn end_item_list() -> c_int;
fn start_composite_item(name: *const c_char) -> c_int;
fn add_composite_item_member(name: *const c_char, value: c_int) -> c_int;
fn end_composite_item() -> c_int;
fn do_something_with_list() -> c_int;
fn start_reading_results() -> c_int;
fn get_next_result() -> *const Foo;
fn end_reading_results() -> c_int;
}
#[repr(C)]
struct Foo {
name: *const c_char,
}
const RET_OK: c_int = 0;
fn c_str(raw: &[u8]) -> *const c_char {
CStr::from_bytes_with_nul(raw).unwrap().as_ptr()
}
fn use_library() {
unsafe {
// start populating the list
assert_eq!(start_item_list(), RET_OK);
// add one item
assert_eq!(add_single_item(c_str(b"First\0"), 1), RET_OK);
// start constructing a composite item
assert_eq!(start_composite_item(c_str(b"composite\0")), RET_OK);
// add some items to it
assert_eq!(add_composite_item_member(c_str(b"nested_1\0"), 2), RET_OK);
assert_eq!(add_composite_item_member(c_str(b"nested_2\0"), 3), RET_OK);
// finish constructing the composite item and add it to the list
assert_eq!(end_composite_item(), RET_OK);
// finish populating the list
assert_eq!(end_item_list(), RET_OK);
// and finally we can use the populated list for something
assert_eq!(do_something_with_list(), RET_OK);
// now we can read the results
assert_eq!(start_reading_results(), RET_OK);
loop {
let got = get_next_result();
if got.is_null() {
break;
}
// do something with the result
}
assert_eq!(end_reading_results(), RET_OK);
}
}
If it mutates global state, does that mean you can only ever have one instance? If so, I would use a std::sync::Once for the initialization and for creating the struct. Your new would then return an Option, of course, in case it has already been instantiated.
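A minimal sketch of that one-shot constructor, assuming the library really can only be set up once per process. The `Library` type and the `INIT` static are hypothetical names made up for illustration; they are not part of the native library.

```rust
use std::sync::Once;

// Guards the one-time creation of the handle for the whole process.
static INIT: Once = Once::new();

/// Hypothetical handle type. The private field keeps code outside this
/// module from constructing it directly.
pub struct Library {
    _private: (),
}

impl Library {
    /// Returns `Some` for the first caller only and `None` for every later call.
    pub fn new() -> Option<Library> {
        let mut first = false;
        // The closure runs at most once per process, even across threads,
        // so only one caller ever sees `first == true`.
        INIT.call_once(|| first = true);
        if first {
            Some(Library { _private: () })
        } else {
            None
        }
    }
}
```

Note that with this approach a second handle can never be created, even after the first one has been dropped.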
Next you have to group the correct FFI calls together so they can't be used in the wrong order.
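One way to group paired calls is a closure-scoped helper, sketched below under some loud assumptions: the three `unsafe fn`s are pure-Rust stand-ins for the real `extern "C"` functions (they only record the call order so the sketch runs without the native library), and `ItemList`, `with_item_list`, `check` and `CALLS` are hypothetical names.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};
use std::sync::Mutex;

const RET_OK: c_int = 0;

// Stand-ins for the native functions (assumption): they just log call order.
static CALLS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

unsafe fn start_item_list() -> c_int {
    CALLS.lock().unwrap().push("start");
    RET_OK
}
unsafe fn add_single_item(_name: *const c_char, _value: c_int) -> c_int {
    CALLS.lock().unwrap().push("add");
    RET_OK
}
unsafe fn end_item_list() -> c_int {
    CALLS.lock().unwrap().push("end");
    RET_OK
}

/// Turns a return code into a `Result`.
fn check(code: c_int) -> Result<(), c_int> {
    if code == RET_OK { Ok(()) } else { Err(code) }
}

/// Only handed out between `start_item_list()` and `end_item_list()`.
pub struct ItemList {
    _private: (),
}

impl ItemList {
    pub fn add(&mut self, name: &CStr, value: c_int) -> Result<(), c_int> {
        check(unsafe { add_single_item(name.as_ptr(), value) })
    }
}

/// Starts the list, lets `fill` populate it, then ends the list.
pub fn with_item_list<F>(fill: F) -> Result<(), c_int>
where
    F: FnOnce(&mut ItemList) -> Result<(), c_int>,
{
    check(unsafe { start_item_list() })?;
    let mut list = ItemList { _private: () };
    let filled = fill(&mut list);
    // The list is closed even if `fill` returned an error.
    let ended = check(unsafe { end_item_list() });
    filled.and(ended)
}
```

Because an `ItemList` can only be reached through the closure argument, `add` cannot be called before the list is started or after it is ended. This sketch does not stop someone from nesting `with_item_list` inside the closure, and a panic inside `fill` would skip `end_item_list()`; both would need extra handling (for example a singleton token and a `Drop` guard).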
I'm not sure how you'd like to handle the object's lifetime, though. E.g. if it goes out of scope, is it safe for another instance to be created? If yes, the solution with Once won't work; instead you'd have to implement Drop yourself and update some global state (a bool, for example, that records whether an instance can currently be created).
I would wrap the library using a type-level state machine so that misuse simply does not compile. If necessary, you can also use the singleton pattern to rule out concurrency problems (which would be the only part checked at runtime).
mod private {
use ::std::sync::atomic::{self, AtomicBool};
pub
struct Singleton (
pub(self) (),
);
static EXISTS: AtomicBool = AtomicBool::new(false);
impl Singleton {
pub
fn new () -> Option<Self>
{
if EXISTS.swap(true, atomic::Ordering::Acquire) {
None
} else {
Some(Self(()))
}
}
}
impl Drop for Singleton {
fn drop (self: &'_ mut Self)
{
EXISTS.store(false, atomic::Ordering::Release);
}
}
pub trait LibState {}
}
pub use private::Singleton;
use private::LibState;
pub struct InitialState {
// ...
}
impl LibState for InitialState {}
// etc.
pub struct Token<State : LibState> {
singleton: Singleton,
state: State,
}
impl Token<InitialState> {
pub
fn new (initial_state: InitialState) // or enough params to construct InitialState
-> Option<Self>
{
Some(Self { singleton: Singleton::new()?, state: initial_state })
}
}
pub struct SecondState {
// ...
}
impl LibState for SecondState {}
impl Token<InitialState> {
pub
fn transition (self /* , plus whatever params are needed to go from InitialState to SecondState (FFI function params I assume) */)
-> Token<SecondState>
{
let Self { singleton, state } = self;
// `stuff` stands for the FFI call(s) that perform the transition
let second_state = stuff(state /* , params */);
Token { singleton, state: second_state }
}
}
Lol. Well, to start with, it's a proprietary library and that would break our license agreement... Plus that's more man-years' worth of effort than I'm willing to spend.
Imagine under the hood functions pass state to each other via static mut vectors of items. You might call start_item_list() to initialize a buffer (which is a static variable only accessible to the foreign code), then add_single_item() to push() items onto the end of the buffer, then end_item_list() finishes adding items and cleans up afterwards.
If you were to call add_single_item() when your static Vec<Item> hasn't been initialized, it'd probably crash. Likewise calling add_single_item() while midway through a start_composite_item()/end_composite_item() transaction would probably break things... They're the sorts of problems I want to avoid.
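The composite-item case could perhaps be handled by a guard that mutably borrows the list, so the borrow checker rejects `add_single_item()` while a composite is open. This is only a sketch: the `unsafe fn`s are pure-Rust stand-ins for the native functions (assumed to fail with a non-zero code when misused), and `ItemList`, `Composite` and `demo` are hypothetical names. In a real wrapper the `ItemList` would come from a successful `start_item_list()` call.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-ins for the native functions (assumption, for illustration only).
static IN_COMPOSITE: AtomicBool = AtomicBool::new(false);

unsafe fn add_single_item(_name: *const c_char, _value: c_int) -> c_int {
    // Pretend the library reports an error when called mid-composite.
    if IN_COMPOSITE.load(Ordering::SeqCst) { 1 } else { 0 }
}
unsafe fn start_composite_item(_name: *const c_char) -> c_int {
    IN_COMPOSITE.store(true, Ordering::SeqCst);
    0
}
unsafe fn add_composite_item_member(_name: *const c_char, _value: c_int) -> c_int {
    0
}
unsafe fn end_composite_item() -> c_int {
    IN_COMPOSITE.store(false, Ordering::SeqCst);
    0
}

pub struct ItemList {
    _private: (),
}

/// Exists only while a composite item is open; holds the list's `&mut` borrow.
pub struct Composite<'list> {
    _list: &'list mut ItemList,
}

impl ItemList {
    pub fn add(&mut self, name: &CStr, value: c_int) -> bool {
        unsafe { add_single_item(name.as_ptr(), value) == 0 }
    }

    pub fn composite(&mut self, name: &CStr) -> Option<Composite<'_>> {
        let code = unsafe { start_composite_item(name.as_ptr()) };
        if code == 0 { Some(Composite { _list: self }) } else { None }
    }
}

impl Composite<'_> {
    pub fn member(&mut self, name: &CStr, value: c_int) -> bool {
        unsafe { add_composite_item_member(name.as_ptr(), value) == 0 }
    }
}

impl Drop for Composite<'_> {
    fn drop(&mut self) {
        // Closes the composite; the return code is ignored in this sketch.
        unsafe { end_composite_item() };
    }
}

pub fn demo() -> bool {
    let single = CStr::from_bytes_with_nul(b"First\0").unwrap();
    let nested = CStr::from_bytes_with_nul(b"nested_1\0").unwrap();
    let mut list = ItemList { _private: () };
    let mut ok = list.add(single, 1);
    {
        let mut composite = list.composite(nested).expect("start_composite_item failed");
        ok &= composite.member(nested, 2);
        // list.add(single, 9); // would not compile: `list` is still mutably borrowed
    } // `composite` is dropped here, which calls end_composite_item()
    ok &= list.add(single, 3);
    ok
}
```

The check happens entirely at compile time; the `AtomicBool` exists only so the stand-ins can mimic the library's assumed behaviour.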
I like this approach. It works pretty well for my use case because the library has a very well defined sequence (i.e. first you do the setup, then you process the data, then you read the results), so using a state machine or some sort of Token locking mechanism could work pretty well.
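For the setup, process, read sequence specifically, a stripped-down sketch might look like the following. It assumes each phase can be represented by a zero-sized type that is consumed on transition. The `unsafe fn`s are pure-Rust stand-ins that track the phase the native library would be in, and `Populating`, `Populated`, `Reading`, `check` and `demo` are hypothetical names. Adding items and reading individual results are left out to keep it short.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicI32, Ordering};

// Stand-ins for the native functions (assumption). PHASE mimics the hidden
// global state: 0 idle, 1 populating, 2 populated, 3 processed, 4 reading.
static PHASE: AtomicI32 = AtomicI32::new(0);

/// Moves PHASE from `from` to `to`; returns non-zero if called out of order.
fn step(from: i32, to: i32) -> c_int {
    match PHASE.compare_exchange(from, to, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(_) => 0,
        Err(_) => 1,
    }
}
unsafe fn start_item_list() -> c_int { step(0, 1) }
unsafe fn end_item_list() -> c_int { step(1, 2) }
unsafe fn do_something_with_list() -> c_int { step(2, 3) }
unsafe fn start_reading_results() -> c_int { step(3, 4) }
unsafe fn end_reading_results() -> c_int { step(4, 0) }

fn check(code: c_int) -> Result<(), c_int> {
    if code == 0 { Ok(()) } else { Err(code) }
}

// One type per phase; the private `()` field blocks construction from
// outside this module.
pub struct Populating(());
pub struct Populated(());
pub struct Reading(());

impl Populating {
    pub fn start() -> Result<Self, c_int> {
        check(unsafe { start_item_list() })?;
        Ok(Populating(()))
    }
    /// Consumes the token, so no items can be added afterwards.
    pub fn finish(self) -> Result<Populated, c_int> {
        check(unsafe { end_item_list() })?;
        Ok(Populated(()))
    }
}

impl Populated {
    pub fn process(self) -> Result<Reading, c_int> {
        check(unsafe { do_something_with_list() })?;
        check(unsafe { start_reading_results() })?;
        Ok(Reading(()))
    }
}

impl Reading {
    pub fn finish(self) -> Result<(), c_int> {
        check(unsafe { end_reading_results() })
    }
}

pub fn demo() -> Result<(), c_int> {
    Populating::start()?.finish()?.process()?.finish()
}
```

Each method takes `self` by value, so calling, say, `process()` before `finish()` is a type error rather than a runtime crash. Combining this with the `Singleton` token from earlier in the thread would additionally keep two sequences from running at once.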