A Pragmatic Approach To Global State

The other day I needed to wrap a native library that uses global state everywhere... So I decided to do a write-up on the approach I took.

Hopefully future developers who find themselves in similar circumstances will be able to benefit from it :slightly_smiling_face:

13 Likes

Nice article, as usual :slightly_smiling_face:

Regarding the compilation fail tests, I suggest the following:

  • ::trybuild is a very complete library for this, "but" requires us to keep the error messages updated (this can be annoying, but at the same time it prevents the compile_fail having false negatives after, for instance, and edition change that makes some code never compile).

  • A macro for a more readable

    /// ```rust,compile_fail
    /// ...
    

I will present the second one:

macro_rules! compile_fail {
    (@hack
        #[doc = $expr:expr]
        $item:item
    ) => (
        const _: () = {
            #[doc = $expr]
            $item
        };
    );

    (
        $($input:tt)*
    ) => (
        compile_fail! { @hack
            #[doc = concat!(
                "```rust,compile_fail\n",
                stringify! {
                    use ::playground::*; // <-- LIBRARY NAME HERE
                    $($input)*
                },
                "\n",
                "```",
            )]
            pub mod compile_fail_test {}
        }
    );
}

So that you can use it as

compile_fail! {
    // No need to stringify!
    const _: () = 42;
}

There is an extra invariant you are relying on: that the C library does not call the received on_progress callback in parallel (fair assumption, but which ought to be noted). FWIW, to guard against a C library that would call this callback in parallel, since only shared access (&*) would be usable, it would be necessary to either downgrade FnMut to a Fn or to use a Mutex<F> instead of F and mut-upgrade via .lock()ing (and, by the way, when doubting between the two choices, the most ergonomic one for the users is the latter (they get to use FnMut!) but the most performant one is the former (locking happens only when necessary, in the body of the user-provided closure, so that they get to have a Fn closure despite the mutation).

Finally, since the C library may call the callback from within another thread (this is more plausible, as it may be a rendering / executor / whatever thread), there should be an added Send bound on F (and for the paranoid parallel scenario, a Sync bound instead).

3 Likes

Nice writeup!

If the underlying library might segfault when called in the wrong order, though, then I think there's a problem in the bindings. Particularly, if any of the structs like SettingParameters are leaked, then it's possible to just go on and do the next thing without calling the ..._end method. For instance, I could write:

std::mem::forget(library.set_parameters().boolean("x", true));
library.create_recipe();

and then it'll never call stateful_end_setting_parameters. If that can then cause undefined behavior in the underlying library, it's technically an unsound interface. Depending on who your users are, that might be OK - but for publicly released libraries this is generally pretty bad.

It's hard problem to solve in general, too! Rust had a number of std libraries which suffered from it pre-rust-1.0 before the leakpocalypse (the community realizing that leaking values must be accounted for). There are ways to fix the problem, but I'm not sure there's a single best one.

This is an article from around that time which covers what stdlib did to solve it. A similar approach for this library might be having a builder_in_use: bool flag in Library, and to have all Library methods panic if they're called while builder_in_use is set - that way any incorrect leaking is turned into a safe error.

Another option might be to have the builders take self rather than &mut self, so that if they're leaked, there is no way to access the underlying state.

Or, if this is an internal library and making it 100% sound isn't worth it, then documentation noting undefined behavior when leaking builders might be acceptable.


Regardless, I think this is a useful writeup! I like the pragmatic approach - global state in C libraries can be common, and rewriting everything is unreasonable.

4 Likes

One way to solve this (in this case) is to track the state in library with an enum,

enum State {
    Idle, BuildingGroup, BuildingRecipe, ...
}

struct Library {
    state: State,
    ...
}

The state can be checked and set on entry to the builder pattern, and unset on exit from each builder. There is a small cost, but mutating and checking a c-like enum isn't going to be a concern.

@daboross's idea of taking self by value is a bit better in this case because Library is zero-sized, but if the Library is large, then taking the tiny hit of an enum would be a better solution.

for example,

impl Library {
    ...

    pub fn create_recipe(self) -> RecipeBuilder {
        RecipeBuilder { _library: self }
    }
}

pub struct RecipeBuilder {
    _library: Library,
}

impl RecipeBuilder {
    pub fn add_item(self, _name: &str, _value: i32) -> Self {
        ...
    }
    
    pub fn add_group(self, _name: &str) -> GroupBuilder {
        GroupBuilder {
            _recipe_builder: self,
        }
    }
}

pub struct GroupBuilder {
    _recipe_builder: RecipeBuilder,
}

impl GroupBuilder {
    pub fn add_item(self, _name: &str, _value: i32) -> Self {
        ...
    }

    pub fn finish(self) -> RecipeBuilder {
        self._recipe_builder
    }
}

This way you are protected against leaks. The cost is complexity, it is a little harder to make this work with an arbitrary number of items/groups.

let lib = Library::new().unwrap();

let mut rb = lib.create_recipe();

for (item, value) in config.items {
    rb = rb.add_item(item, value); // this is a bit inconvinient
}

for group in config.groups {
    let mut gb = rb.add_group(group.name);
    
    for (item, value) in group.items {
        gb = gb.add_item(item, value); // this is a bit inconvinient
    }
    
    rb = gb.finish(); // this is a bit inconvinient
}

But if you have a known config, then this is fairly nice

Library::new().unwrap()
    .add_item(foo1, bar1)
    .add_item(foo2, bar2)
    .add_group(name1)
        .add_item(foo3, bar3)
        .add_item(foo4, bar4)
        .finish()
    .add_group(name2)
        .add_item(foo5, bar5)
        .add_item(foo6, bar6)
        .finish()
2 Likes

Well that's annoying. I'd forgotten about the whole leakopalypse thing and how you can't really rely on destructors to enforce safety invariants :disappointed:

For my purposes (internal wrapper around a C library) we can make sure mem::forget() never gets called, but if it were a publicly released library I'd probably introduce some sort of poisoning mechanism (builder_in_use) or track the state with an enum and sprinkle assertions throughout the code.

I'd also prefer to panic!() instead of returning a Result because breaking these invariants indicate a programming error which we can't really recover from (there's a good chance the underlying library will be FUBAR). In that case it's better to blow up loudly, forcing the developer to fix their code.

Thanks!

It's fun to rewrite something from scratch, but I've found you tend to actually get things done when building on existing solutions... Which can also be fun in its own right.

2 Likes

A third option that does not require poisoning nor passing a big self by value is to offer an API using a closure and have the cleanup code be part of the "call the closure epilogue" rather than a Drop impl:

Library::new().unwrap()
    .add_item(...)
    .with_group(name1, |group| { // group: &'_ mut Group so no forget interaction whatsoever
        group.add_item(...);
        ...
    })

where

impl Thingy {
    pub(self) // private (or make it public and `unsafe`, requiring the `.finish()` call)
    fn add_group (&mut self, <args>) -> GroupThingy // thing with safety-critical .finish() call
    { ... }

    pub
    fn with_group (
        self: &'_ mut Self,
        <args>,
        f: impl FnOnce(&'_ mut GroupThingy),
    ) -> &'_ mut Self
    {
        use ::std::panic::{self, AssertUnwindSafe}; // closure may panic

        let mut group_thingy = self.add_group(<args>);
        let mb_panic = panic::catch_unwind(AssertUnwindSafe(|| f(&mut group_thingy)));
        let ret = group_thingy.finish();
        if let Err(panic) = mb_panic { panic::resume_unwind(panic); }
        ret
    }
}
3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.