Common strategies for wrapping a library that uses globals everywhere?

There's a really powerful native library I want to use, but it looks like most of its APIs depend on mutating global state under the hood and executing functions in a specific order.

Obviously this makes the library not thread-safe, but the thing which makes it particularly unsafe is how calling the wrong thing at the wrong time may lead to memory leaks or segfaults (e.g. because it uses a global resource and the resource hasn't been initialized yet).

What strategies would you use to create a safe interface to libraries written this way?

To make things more concrete, this snippet is fairly representative of the way you'd use the library.

use std::{
    ffi::CStr,
    os::raw::{c_char, c_int},
};

extern "C" {
    fn start_item_list() -> c_int;
    fn add_single_item(name: *const c_char, value: c_int) -> c_int;
    fn end_item_list() -> c_int;
    fn start_composite_item(name: *const c_char) -> c_int;
    fn add_composite_item_member(name: *const c_char, value: c_int) -> c_int;
    fn end_composite_item() -> c_int;

    fn do_something_with_list() -> c_int;
    fn start_reading_results() -> c_int;
    fn get_next_result() -> *const Foo;
    fn end_reading_results() -> c_int;
}

#[repr(C)]
struct Foo {
    name: *const c_char,
}

const RET_OK: c_int = 0;

// Only accepts 'static byte strings, so the returned pointer can never dangle.
fn c_str(raw: &'static [u8]) -> *const c_char {
    CStr::from_bytes_with_nul(raw).unwrap().as_ptr()
}

fn use_library() {
    unsafe {
        // start populating the list
        assert_eq!(start_item_list(), RET_OK);

        // add one item
        assert_eq!(add_single_item(c_str(b"First\0"), 1), RET_OK);

        // start constructing a composite item
        assert_eq!(start_composite_item(c_str(b"composite\0")), RET_OK);
        // add some items to it
        assert_eq!(add_composite_item_member(c_str(b"nested_1\0"), 2), RET_OK);
        assert_eq!(add_composite_item_member(c_str(b"nested_2\0"), 3), RET_OK);
        // finish constructing the composite item and add it to the list
        assert_eq!(end_composite_item(), RET_OK);

        // finish populating the list
        assert_eq!(end_item_list(), RET_OK);

        // and finally we can use the populated list for something
        assert_eq!(do_something_with_list(), RET_OK);

        // now we can read the results
        assert_eq!(start_reading_results(), RET_OK);

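        loop {
            let got = get_next_result();
            if got.is_null() {
                // a null pointer means there are no more results
                break;
            }

            // do something with the result, e.g. borrow its name as a &CStr
            let _name = CStr::from_ptr((*got).name);
        }

        // finish reading the results
        assert_eq!(end_reading_results(), RET_OK);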
    }
}

Why not RIIR (Rewrite It In Rust)?


Seriously:

If it mutates global state, does that mean you can only ever create one instance? If so, I would use a std::sync::Once for initialization and for creating the struct. Your new() would of course return an Option, in case an instance has already been created.
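A minimal sketch of the Once idea, assuming the library may only be initialized once per process. `Library`, its private field and the spot marked for the init call are hypothetical names, not part of the real library:

```rust
use std::sync::Once;

/// Handle to the (hypothetical) global library state. The private field
/// means the only way to obtain one is through `Library::new`.
pub struct Library {
    _private: (),
}

// Guards the one-time initialization of the library's globals.
static INIT: Once = Once::new();

impl Library {
    /// Returns `Some` the first time it is called and `None` ever after,
    /// so at most one `Library` exists for the lifetime of the process.
    pub fn new() -> Option<Library> {
        let mut created = None;
        INIT.call_once(|| {
            // Hypothetical: the C library's global init call would go here.
            created = Some(Library { _private: () });
        });
        created
    }
}

fn main() {
    let first = Library::new();
    let second = Library::new();
    assert!(first.is_some());
    assert!(second.is_none());
    println!("first: {}, second: {}", first.is_some(), second.is_some());
}
```

Because a `Once` never resets, this only fits the "one instance for all time" case.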

Next you have to group the correct FFI calls together so they can't be used in the wrong order.
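One way to group the calls, sketched under some assumptions: only hand out builder objects inside closures, so the wrapper itself makes every matching start_*/end_* call. The FFI functions are replaced by a hypothetical `ffi_call` stand-in that just logs call names, so the example runs without the real library; `with_item_list`, `ItemList`, `Composite` and `add_member` are made-up names, and a real wrapper would call the extern "C" functions with CString arguments instead.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the C library: every "FFI call" is just recorded
// in a global log so the example runs without the real library.
static CALLS: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn ffi_call(call: String) {
    CALLS.lock().unwrap().push(call);
}

/// Proof that start_item_list() has been called and end_item_list() has not.
/// The private field stops code outside this module from creating one.
pub struct ItemList {
    _private: (),
}

/// Proof that we are between start_composite_item() and end_composite_item().
pub struct Composite {
    _private: (),
}

/// Opens the list, lets `f` fill it, then closes it again.
pub fn with_item_list<R>(f: impl FnOnce(&mut ItemList) -> R) -> R {
    ffi_call("start_item_list".to_string());
    let result = f(&mut ItemList { _private: () });
    ffi_call("end_item_list".to_string());
    result
}

impl ItemList {
    pub fn add_single_item(&mut self, name: &str, value: i32) {
        ffi_call(format!("add_single_item({name}, {value})"));
    }

    /// While `f` runs, the list is mutably borrowed, so add_single_item()
    /// cannot be called until the composite item has been closed.
    pub fn composite_item<R>(&mut self, name: &str, f: impl FnOnce(&mut Composite) -> R) -> R {
        ffi_call(format!("start_composite_item({name})"));
        let result = f(&mut Composite { _private: () });
        ffi_call("end_composite_item".to_string());
        result
    }
}

impl Composite {
    pub fn add_member(&mut self, name: &str, value: i32) {
        ffi_call(format!("add_composite_item_member({name}, {value})"));
    }
}

fn main() {
    with_item_list(|list| {
        list.add_single_item("First", 1);
        list.composite_item("composite", |c| {
            c.add_member("nested_1", 2);
            c.add_member("nested_2", 3);
        });
        // list.composite_item("x", |_| list.add_single_item("oops", 0));
        // ^ does not compile: `list` is already mutably borrowed
    });

    for call in CALLS.lock().unwrap().iter() {
        println!("{call}");
    }
}
```

Two caveats with this shape: a panic inside a closure skips the closing call (a guard type that makes the call in its Drop impl could cover that), and nothing here prevents nested with_item_list calls, which is where a runtime singleton check would come in.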

I'm also not sure how you want to handle the object's lifetime. E.g. if it goes out of scope, is it safe to create another instance? If yes, the Once solution won't work; instead you have to implement Drop yourself and update some global state (for example a bool that records whether an instance may be created).

Does that help you?


I would wrap the library using a type-level state machine so that misuse simply doesn't compile. If necessary, you can also use the singleton pattern to rule out concurrency problems (that would be the only part checked at runtime).

mod private {
    use ::std::sync::atomic::{self, AtomicBool};

    pub
    struct Singleton (
        pub(self) (),
    );

    static EXISTS: AtomicBool = AtomicBool::new(false);

    impl Singleton {
        pub
        fn new () -> Option<Self>
        {
            // swap() returns the previous value: true means a Singleton already exists
            if EXISTS.swap(true, atomic::Ordering::Acquire) {
                None
            } else {
                Some(Self(()))
            }
        }
    }

    impl Drop for Singleton {
        fn drop (self: &'_ mut Self)
        {
            EXISTS.store(false, atomic::Ordering::Release);
        }
    }
    pub trait LibState {}
}
pub use private::Singleton;
use private::LibState;

pub struct InitialState {
    // ...
}
impl LibState for InitialState {}

// etc.

pub struct Token<State : LibState> {
    singleton: Singleton,
    state: State,
}

impl Token<InitialState> {
    pub
    fn new (initial_state: InitialState) // or enough params to construct InitialState
      -> Option<Self>
    {
         Some(Self { singleton: Singleton::new()?, state: initial_state })
    }
}

pub struct SecondState {
    // ...
}
impl LibState for SecondState {}

impl Token<InitialState> {
    pub
    fn transition (self /* , params to go from InitialState to SecondState (FFI function params I assume) */)
      -> Token<SecondState>
    {
        let Self { singleton, state } = self;
        // `stuff` is a placeholder for the FFI calls that perform the transition
        let second_state = stuff(state /* , params */);
        Token { singleton, state: second_state }
    }
}

etc.


Nice article on this pattern here:

http://cliffle.com/blog/rust-typestate/

  • You could have Library<State>, where each function takes self and returns self converted to the new state.
    impl Library<Unfrobnicated> { 
       fn frobnicate(self) -> Library<Frobnicated> {…}
    }
    
  • You could add runtime state checks on the Rust side and either ignore wrong calls or return a Result.
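The second bullet could look roughly like this: the wrapper mirrors the library's hidden phase in a Rust enum and returns an error for out-of-order calls instead of forwarding them. `Phase`, `WrongPhase` and the phase names are made up for illustration, and the comments mark where the real FFI calls would go.

```rust
/// Mirrors the phase the (hypothetical) global library state is in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    BuildingList,
    BuildingComposite,
    ListReady,
}

/// Returned instead of making an out-of-order FFI call.
#[derive(Debug, PartialEq, Eq)]
pub struct WrongPhase {
    pub expected: Phase,
    pub actual: Phase,
}

pub struct Library {
    phase: Phase,
}

impl Library {
    pub fn new() -> Self {
        Library { phase: Phase::Idle }
    }

    pub fn phase(&self) -> Phase {
        self.phase
    }

    /// Checks the current phase and, only if it matches, moves to `next`.
    fn transition(&mut self, expected: Phase, next: Phase) -> Result<(), WrongPhase> {
        if self.phase != expected {
            return Err(WrongPhase { expected, actual: self.phase });
        }
        self.phase = next;
        Ok(())
    }

    pub fn start_item_list(&mut self) -> Result<(), WrongPhase> {
        self.transition(Phase::Idle, Phase::BuildingList)
        // real wrapper: call unsafe { start_item_list() } once the check passed
    }

    pub fn add_single_item(&mut self, _name: &str, _value: i32) -> Result<(), WrongPhase> {
        self.transition(Phase::BuildingList, Phase::BuildingList)
    }

    pub fn start_composite_item(&mut self, _name: &str) -> Result<(), WrongPhase> {
        self.transition(Phase::BuildingList, Phase::BuildingComposite)
    }

    pub fn add_composite_item_member(&mut self, _name: &str, _value: i32) -> Result<(), WrongPhase> {
        self.transition(Phase::BuildingComposite, Phase::BuildingComposite)
    }

    pub fn end_composite_item(&mut self) -> Result<(), WrongPhase> {
        self.transition(Phase::BuildingComposite, Phase::BuildingList)
    }

    pub fn end_item_list(&mut self) -> Result<(), WrongPhase> {
        self.transition(Phase::BuildingList, Phase::ListReady)
    }
}

fn main() {
    let mut lib = Library::new();
    // Adding an item before start_item_list() is rejected instead of crashing.
    assert!(lib.add_single_item("First", 1).is_err());

    lib.start_item_list().unwrap();
    lib.add_single_item("First", 1).unwrap();
    lib.start_composite_item("composite").unwrap();
    // add_single_item() in the middle of a composite item is rejected too.
    // prints: Err(WrongPhase { expected: BuildingList, actual: BuildingComposite })
    println!("{:?}", lib.add_single_item("oops", 0));
    lib.add_composite_item_member("nested_1", 2).unwrap();
    lib.end_composite_item().unwrap();
    lib.end_item_list().unwrap();
    assert_eq!(lib.phase(), Phase::ListReady);
}
```

On its own this only tracks a single wrapper value; pairing it with a singleton such as the Singleton type above keeps two wrappers from disagreeing about the library's real global state.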

Lol. Well, to start with, it's a proprietary library and rewriting it would break our license agreement... Plus that's more man-years' worth of effort than I'm willing to spend.

Imagine that, under the hood, the functions pass state to each other via static mut vectors of items. You might call start_item_list() to initialize a buffer (a static variable only accessible to the foreign code), then add_single_item() to push() items onto the end of the buffer, then end_item_list() to finish adding items and clean up afterwards.
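Given that call sequence, one way to keep it intact is to describe the whole list as an ordinary Rust value and let a single wrapper function replay it in exactly the order the library wants, stopping at the first non-zero return code. This is a sketch: `Item`, `build_item_list` and the logging stand-in functions are illustrative names, the stand-ins take &str where the real extern "C" functions take *const c_char, and a production version would also have to decide how to clean up after a mid-way failure.

```rust
use std::sync::Mutex;

const RET_OK: i32 = 0;

// Hypothetical stand-ins for the extern "C" functions: they just log the call.
static CALLS: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn record(call: String) -> i32 {
    CALLS.lock().unwrap().push(call);
    RET_OK
}
fn start_item_list() -> i32 { record("start_item_list".to_string()) }
fn add_single_item(name: &str, value: i32) -> i32 { record(format!("add_single_item({name}, {value})")) }
fn start_composite_item(name: &str) -> i32 { record(format!("start_composite_item({name})")) }
fn add_composite_item_member(name: &str, value: i32) -> i32 {
    record(format!("add_composite_item_member({name}, {value})"))
}
fn end_composite_item() -> i32 { record("end_composite_item".to_string()) }
fn end_item_list() -> i32 { record("end_item_list".to_string()) }

/// Rust-side description of one entry in the list.
pub enum Item<'a> {
    Single(&'a str, i32),
    Composite(&'a str, Vec<(&'a str, i32)>),
}

/// Turns a C-style return code into a Result.
fn check(call: &str, code: i32) -> Result<(), String> {
    if code == RET_OK { Ok(()) } else { Err(format!("{call} returned {code}")) }
}

/// The only public entry point: it emits the calls in the order the library
/// expects, so callers can never interleave them incorrectly.
pub fn build_item_list(items: &[Item]) -> Result<(), String> {
    check("start_item_list", start_item_list())?;
    for item in items {
        match item {
            Item::Single(name, value) => check("add_single_item", add_single_item(name, *value))?,
            Item::Composite(name, members) => {
                check("start_composite_item", start_composite_item(name))?;
                for (member, value) in members {
                    check("add_composite_item_member", add_composite_item_member(member, *value))?;
                }
                check("end_composite_item", end_composite_item())?;
            }
        }
    }
    check("end_item_list", end_item_list())
}

fn main() {
    let items = [
        Item::Single("First", 1),
        Item::Composite("composite", vec![("nested_1", 2), ("nested_2", 3)]),
    ];
    build_item_list(&items).unwrap();
    for call in CALLS.lock().unwrap().iter() {
        println!("{call}");
    }
}
```

The trade-off is that the whole list has to exist as a Rust value before any FFI call is made, which is usually cheap for setup data like this.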

If you were to call add_single_item() when the static Vec<Item> hasn't been initialized, it'd probably crash. Likewise, calling add_single_item() midway through a start_composite_item()/end_composite_item() transaction would probably break things... Those are the sorts of problems I want to avoid.

I like this approach. It suits my use case because the library has a very well-defined sequence (first you do the setup, then you process the data, then you read the results), so a state machine or some sort of token-based locking mechanism could work well.
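A minimal typestate sketch of that setup, process, read sequence, assuming the phases map onto the calls in the original snippet. The `Library<State>` type, the `Building`/`Processed` markers and the `fake_global_buffer` field are hypothetical: the field stands in for the library's hidden static buffers so the example runs without the real library, and a real wrapper would make the FFI calls inside each method instead.

```rust
use std::marker::PhantomData;

// Typestate markers: never instantiated, only used as type parameters.
pub enum Building {}
pub enum Processed {}

/// Wrapper whose type parameter records which phase the library is in.
pub struct Library<State> {
    // Hypothetical stand-in for the library's hidden global buffer.
    fake_global_buffer: Vec<String>,
    _state: PhantomData<State>,
}

impl Library<Building> {
    /// Stands in for start_item_list(); the only way to obtain a Library.
    pub fn start_item_list() -> Self {
        Library { fake_global_buffer: Vec::new(), _state: PhantomData }
    }

    /// Stands in for add_single_item(); only available while building.
    pub fn add_single_item(mut self, name: &str) -> Self {
        self.fake_global_buffer.push(name.to_string());
        self
    }

    /// Stands in for end_item_list() + do_something_with_list(). It consumes
    /// the builder, so no items can be added after processing.
    pub fn process(self) -> Library<Processed> {
        Library { fake_global_buffer: self.fake_global_buffer, _state: PhantomData }
    }
}

impl Library<Processed> {
    /// Stands in for the start_reading_results() / get_next_result() /
    /// end_reading_results() loop; only available after processing.
    pub fn read_results(self) -> Vec<String> {
        self.fake_global_buffer
    }
}

fn main() {
    let results = Library::start_item_list()
        .add_single_item("First")
        .add_single_item("Second")
        .process()
        .read_results();
    println!("{results:?}"); // prints ["First", "Second"]

    // Library::start_item_list().read_results();
    // ^ does not compile: read_results() only exists on Library<Processed>
}
```

Methods only exist on the state where they are valid, so calling them in the wrong order is a compile error rather than a segfault. On its own this does not stop two Library<Building> values from existing at once; pairing it with a runtime singleton, like the Singleton type in the typestate answer, covers that.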
