Soundly dealing with FFI, recursive callbacks, and mutable state

I'm using the FFI to access an API that requires us to supply a callback. However, many of the API functions I want to use inside the callback also trigger the callback (luckily, this is all single-threaded). While doing this, we also need to manage mutable state. This, however, means we risk violating the aliasing rules if, for example, both the top level and the recursive call create a mutable reference to the mutable state. I've included a simplified example below. How can I use this API from Rust without UB?

use std::sync::atomic::{AtomicPtr, Ordering};

static HANDLER: AtomicPtr<i32> = AtomicPtr::new(std::ptr::null_mut());

enum Op {
    Inc,
    IncRecursive,
}

fn f(x: &mut i32, op: Op) {
    match op {
        Op::Inc => *x += 1,
        Op::IncRecursive => {
            // Uncommenting this (which should be a no-op) makes it pass MIRI:
            // HANDLER.store(x, Ordering::Relaxed);
            g(Op::Inc);
        }
    }
}

// This would be the callback handler passed to the API.
fn g(op: Op) {
    let x = HANDLER.load(Ordering::Relaxed);
    f(unsafe { &mut *x }, op);
}

fn main() {
    let mut x = 0;
    HANDLER.store(&mut x, Ordering::Relaxed);
    // Simulate a callback invocation.
    g(Op::IncRecursive);
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.43s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: not granting access to tag <2855> because that would remove [Unique for <2880>] which is protected because it is an argument of call 836
  --> src/main.rs:23:16
   |
23 |     f(unsafe { &mut *x }, op);
   |                ^^^^^^^ not granting access to tag <2855> because that would remove [Unique for <2880>] which is protected because it is an argument of call 836
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2855> was created by a SharedReadWrite retag at offsets [0x0..0x4]
  --> src/main.rs:28:19
   |
28 |     HANDLER.store(&mut x, Ordering::Relaxed);
   |                   ^^^^^^
help: <2880> is this argument
  --> src/main.rs:10:6
   |
10 | fn f(x: &mut i32, op: Op) {
   |      ^
   = note: BACKTRACE:
   = note: inside `g` at src/main.rs:23:16
note: inside `f` at src/main.rs:16:13
  --> src/main.rs:16:13
   |
16 |             g(Op::Inc);
   |             ^^^^^^^^^^
note: inside `g` at src/main.rs:23:5
  --> src/main.rs:23:5
   |
23 |     f(unsafe { &mut *x }, op);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `main` at src/main.rs:29:5
  --> src/main.rs:29:5
   |
29 |     g(Op::IncRecursive);
   |     ^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

I would pass the mutable state down in the handler. e.g. change g to fn g(x: &mut i32, op: Op).

2 Likes

If the mutable state is opaque to the foreign code, you can use something with interior mutability instead of &mut ..., like &Cell<...> or &RefCell<...>

1 Like

g is the callback I'm passing to the API. I can't change its signature. (I've added some comments to the example to clarify this.)

The state is indeed opaque to the foreign code. But Cell won't work since the mutable state isn't Copy or Default, and RefCell will just cause the recursive call to panic. I'm unsure if using UnsafeCell buys anything here at all.

Only if you hold the lock when you make the recursive call. If you hold the lock only briefly when actually working with the guarded data directly, you should be fine.

2 Likes

That sounds very clunky. There are lots of calls happening to all sorts of functions, and this would require this unlock-lock pattern around each of them for what should really be a no-op.

The problem your original code is running into is that miri is concerned with enforcing permissions on pointers in a way that doesn't really perfectly line up with intuitions about how the actual hardware behaves. Your commented out code is sort of a no-op in terms of the actual runtime behavior of the hardware instructions since no other values are ever written to it, but it isn't a no-op in terms of miri's (experimental) memory model, Stacked Borrows.

When you perform that store back to the AtomicPtr you are implicitly converting it to a raw pointer which is "reborrowed" from the reference and storing it back in the AtomicPtr. That means the callback has permission to access the value again, because you've created a new (temporary) permission.

If you instead do

fn f(x: &mut i32, op: Op) {
    match op {
        Op::Inc => *x += 1,
        Op::IncRecursive => {
            // Uncommenting this (which should be a no-op) makes it pass MIRI:
            HANDLER.store(x, Ordering::Relaxed);
            *x = 12;
            g(Op::Inc);
        }
    }
}

It will fail miri again, because using that reference to mutate the value has to invalidate the raw pointer permission that the store pushed on to the borrow stack.

The root problem here is that an &mut strongly asserts that no other memory access can occur at that memory location, and since the reference is passed to the function it has to be valid the whole time the nested callback runs. If you use a raw pointer the problem goes away.

You can either hold a &mut to your state XOR call a function that invokes the callback again at any given time[1]. You can try to use the borrow checker to help enforce that requirement.

Playground

use std::sync::atomic::{AtomicPtr, Ordering};

static HANDLER: AtomicPtr<i32> = AtomicPtr::new(std::ptr::null_mut());

enum Op {
    Inc,
    IncRecursive,
}

struct State(*mut i32);

impl State {
    fn get(&mut self) -> &mut i32 {
        unsafe { &mut *self.0 }
    }

    fn f(&mut self, op: Op) {
        match op {
            Op::Inc => unsafe { *self.0 += 1 },
            Op::IncRecursive => {
                // Uncommenting this (which should be a no-op) makes it pass MIRI:
                // HANDLER.store(x, Ordering::Relaxed);
                g(Op::Inc);
            }
        }
    }
}

// This would be the callback handler passed to the API.
fn g(op: Op) {
    let x = HANDLER.load(Ordering::Relaxed);
    let mut state = State(x);
    state.f(op);
}

fn main() {
    let mut x = 0;
    HANDLER.store(&mut x, Ordering::Relaxed);
    // Simulate a callback invocation.
    g(Op::IncRecursive);
}

As long as you never construct a second copy of State in a single callback, the borrow checker won't let you hold a reference from get while you call f. It notably doesn't prevent you from calling get inside f in a way that would cause problems though. It might be a place to start though!


  1. assuming you don't do anything to create a temporary permission that the callback can access â†Šī¸Ž

2 Likes

Ah, I see. This is unfortunate, especially since it complicates interacting with the Windows message loop. I don't suppose there is any hope that this sort of access might become legal? That said, I came up with something almost identical to your approach. (It might even be somewhat ergonomic with Deref.)

mod perms {
    pub struct Permission<T>(*mut T);

    impl<T> Permission<T> {
        // Safety invariant: safe code must not have have mutable access to two of these pointing to the same T at the same time.
        pub unsafe fn new(x: *mut T) -> Self {
            Permission(x)
        }

        pub fn get(&mut self) -> &mut T {
            unsafe { &mut *self.0 }
        }
    }
}

use perms::Permission;
use std::sync::atomic::{AtomicPtr, Ordering};

enum Op {
    Inc,
    IncRecursive,
}

fn safe_callback(mut x: Permission<i32>, op: Op) {
    let y = x.get();
    match op {
        Op::Inc => *y += 1,
        Op::IncRecursive => {
            *y = 3;
            drop(y);
            safe_recursive_trigger(&mut x, Op::Inc);
            let y = x.get();
            println!("{}", *y);
            *y = 10;
            println!("{}", *y);
        }
    }
}

// We require a mutable reference to the permission to ensure there is no mutable reference to the shared state.
fn safe_recursive_trigger(_perm: &mut Permission<i32>, op: Op) {
    raw_callback(op);
}

static HANDLER: AtomicPtr<i32> = AtomicPtr::new(std::ptr::null_mut());

fn raw_callback(op: Op) {
    let x = HANDLER.load(Ordering::Relaxed);
    safe_callback(unsafe { Permission::new(x) }, op);
}

fn main() {
    let mut x = 0;
    HANDLER.store(&mut x, Ordering::Relaxed);
    raw_callback(Op::IncRecursive);
}

Based on my understanding, your original case isn't just a theoretical violation of Stacked Borrows but an actual case of UB in stable Rust. So I don't believe there's any chance the code as written would be valid in the future. The Stacked Borrows rules just provided a useful model for explaining exactly what's happening, particularly the part about the store causing your code to pass.

1 Like

There's two main ways to handle Windows message loops I've used, which are basically equivalent to most other callback systems:

  • Use thread_local!{ let mut STATE: Option<&mut State> = None; }, (or a mut static with careful unsafe access), so you are always reborrowing the real reference.
  • Use Box<State>::into_raw() on creating a window and stashing it in SetWindowLongPtr(GWLP_USERDATA), then only reading it via the window proc, including a drop(Box::from_raw()) in the WM_NCDESTROY. Remember to handle it not being set though!
1 Like

The problem here is not so much finding a good place to put a pointer to the state, but upholding Rust's aliasing rules when calling Win32 windowing API functions inside the window procedure that might themselves call the window procedure again. (In particular, since you can't legally have two mutable references to the state at once, the compiler is free to assume that the state in the initial call cannot change as a result of any FFI calls.)

Yes? Both those rules implement that. The first because you can only reborrow the mut ref which is fine, and the second because you are being careful to only access the state from the window proc. There is a risk with reentrant access in the second case, you can verify you're not hitting that by using RefCell.

The root of the issue that I want to be allowed to mutate the state freely, even if I am recursing into the window procedure through another API call, as if I had passed a mutable reference to the state into that call. Ideally, I would be able to fake that somehow or inform the compiler appropriately.

The first approach doesn't work as written since thread_local! only works with non-mut statics. It would need to be something like thread_local!{ static STATE: Cell<Option<&'static mut State>> = Cell::new(None); }. You mentioned that reborrowing the original reference is fine, but from what I understand, Stacked Borrows (and possibly current Rust?) say that it is not. That is, if y is a reborrow of x, then you cannot create another reborrow of x again until y expires.

The second one is basically what I'm currently doing. My problem with using RefCell is that you have to make sure to release the borrow before any call that might recurse into the window procedure, and then reacquire it if you want mutable access again.

Ok, if you're worried about reentrancy issues, then yes, you have a problem. But then you already would without FFI being involved at all: calling out to foreign code that can call back to you is already a Rust anti pattern for this reason, so you should be trying to think about it in those terms rather than it being about unsafe code soundness.

You have a few approaches that can work well, finer grained interior mutability is an easy one: simply hand out a &State instead of a &mut State and make it the problem of whatever needs to mutate to add the appropriate cell or whatever.

A nuclear option is to just use channels to remove any chance of reentrancy at all, but it means you effectively can't give a synchronous result. Still, this is completely safe and very easy to understand if it works.

An additional option is a trampoline: this can depend on the specific processing you're doing for a message, but you essentially use a thread local to store that you're inside a message for the first level, then during any reentrant calls instead notice that you're already doing handling and just push onto that the additional work that needs doing, then have the first level notice it has more work and do it then. The details are very specific to what you're doing though.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.