FFI: extern "Rust" fn pointer representation?

I'm looking for a way to call a Rust function from assembly, with one wrinkle: I've got 31 of these functions to call, and I was interested in trying to invent a single ABI shim rather than 31 of them. For context, the "brute force" way would look something like this:

unsafe extern "C" fn _start_trap1_rust(tf: &mut TrapFrame) {
    extern "Rust" {
        fn interrupt1(tf: &mut TrapFrame);
    }
    interrupt1(tf)
}

// ... repeat 30 times

Instead, I was hoping to use something like:

extern "C" fn _trap_rust_abi_shim(tf: *mut TrapFrame, f: fn(*mut TrapFrame)) {
    f(tf)
}

#[naked]
unsafe extern "C" fn _start_trap1() {
    extern "Rust" {
        fn interrupt1(tf: &mut TrapFrame);
    }
    // in reality this does a bunch of register spilling 
    // and sets up the trap frame at the top of the stack,
    // omitted here for clarity
    asm!(
        "
        // ...
        mv   a0, sp  
        la   a1, {irq}
        call {shim}
        ",
        shim = sym crate::_trap_rust_abi_shim,
        irq = sym interrupt1,
        options(noreturn)
    )
}

Why am I OK with having 30 repetitions of _start_trapN here? I have some half-baked ideas for how to deal with that later, but for now since it's #[naked] I don't believe I can call interrupt1 directly and I'd still need 31 _start_trapN_rust functions if I went that route. It's that latter question I'm looking to address.

This works, and compiles nicely to (almost) the desired assembly since the ABIs currently happen to match:

4200043c <esp_riscv_rt::_trap_rust_abi_shim>:
    f(tf)
4200043c:       8582                    jr      a1

But, rustc complains that I oughtn't be using an extern "Rust" fn(...) pointer in my extern "C" type signature:

warning: `extern` fn uses type `fn(*mut TrapFrame)`, which is not FFI-safe
 --> src/lib.rs:9:58
  |
9 | extern "C" fn _trap_rust_abi_shim(tf: *mut TrapFrame, f: fn(*mut TrapFrame)) {
  |                                                          ^^^^^^^^^^^^^^^^^^ not FFI-safe
  |
  = help: consider using an `extern fn(...) -> ...` function pointer instead
  = note: this function pointer has Rust-specific calling convention
  = note: `#[warn(improper_ctypes_definitions)]` on by default

I've come up with two possible interpretations of that warning:

  • I am violating a core assumption that rustc will always be the source of all extern "Rust" function pointers and can therefore do whatever it likes with their representation. In other words, this works today because rustc happened to pick "usize address" as the representation for both extern "C" fn(...) and extern "Rust" fn(...), but it expects to be able to change the latter to "4-bit aligned pointer with safety tags in the lower bits" or "objc_msgSend" (when we all switch to the Smalltalk machine).
  • Or, the warning is spurious here: it's useful in an extern "<anything but Rust>" block because then I'd be declaring "someone, who may or may not be rustc, is going to invoke a function with the Rust ABI" which would indeed cause trouble when the representation of the parameters or return type changes.

I lean towards the latter, as the unsafe code guidelines currently seem to indicate at least this part of the Rust ABI has already stabilized and all function pointers themselves have the same representation regardless of the type (including ABI, parameters, return value, etc.). But, my curiosity was piqued: is this warning telling me I've got something new to learn here?
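A quick way to check the "same representation" part on a given target is to compare sizes. This is only a sketch: the TrapFrame field is a made-up placeholder, and equal sizes say nothing about whether the calling conventions behind the pointers agree.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the real trap frame; only its existence matters here.
#[repr(C)]
pub struct TrapFrame {
    pub a0: usize,
}

/// Sizes of a Rust-ABI fn pointer, a C-ABI fn pointer, and a thin raw pointer.
pub fn fn_pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<fn(*mut TrapFrame)>(),
        size_of::<extern "C" fn(*mut TrapFrame)>(),
        size_of::<*const ()>(),
    )
}
```

On common platforms all three numbers come out equal, which matches the guideline text about layout but leaves the argument-passing question open.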

Also, bonus points if someone can point me towards how I could make the shim call site transparent to LLVM so that it can be inlined; I suspect this is impossible to pierce through the current various codegen layers, but it seems a shame that we'd have an extra whole indirection even when the Rust and C calling conventions are identical.

I wouldn't know about the stability of the function pointer repr, but have you considered doing the brute force approach but using a declarative macro to cut down on repetition?
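For reference, a minimal sketch of what that macro approach might look like. The macro name trap_shims, the TrapFrame field, and the handler bodies are all invented for illustration; in the real crate the handlers would presumably be declared in an extern "Rust" block and the shims would carry a no_mangle attribute so the assembly can find them.

```rust
// Hypothetical trap frame with a single register slot.
#[repr(C)]
pub struct TrapFrame {
    pub a0: usize,
}

// Generates one extern "C" shim per `shim => handler` pair.
macro_rules! trap_shims {
    ($($shim:ident => $handler:ident),* $(,)?) => {
        $(
            /// Safety: `tf` must point to a valid, exclusively owned TrapFrame.
            pub unsafe extern "C" fn $shim(tf: *mut TrapFrame) {
                unsafe { $handler(&mut *tf) }
            }
        )*
    };
}

// Stand-ins for the real interrupt handlers (defined locally so this compiles on a host).
fn interrupt1(tf: &mut TrapFrame) {
    tf.a0 += 1;
}

fn interrupt2(tf: &mut TrapFrame) {
    tf.a0 += 2;
}

trap_shims! {
    _start_trap1_rust => interrupt1,
    _start_trap2_rust => interrupt2,
}
```

This only cuts down on the source repetition; each pair still produces its own shim function in the binary.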

I'm no expert on this topic, but it seems like the warning is telling you to add an extern in front of the parameter's fn type, i.e. to use a C-ABI function pointer and not a Rust one.

extern "C" fn _trap_rust_abi_shim(tf: *mut TrapFrame, f: extern fn(*mut TrapFrame)) {

If you do this in the playground, it compiles successfully but I have no idea if it actually does what you want.

Thanks for the suggestion! It's a good idea, and one I'm already playing around with: in fact I'm not using asm! directly, but have a wrapper that adds the common prelude/epilogue. I should clarify that I don't see the downside of the brute force method as having to write out the code (it's only written once, after all) but that there'd then be 31 nearly identical shim functions in the final product. The cost for each one is small, about 6 bytes, but it's that cost I'm looking to optimize more so than the mechanical input side.

Thanks for giving it a try! You're right that the compiler is suggesting that I switch the parameter type from "a rust function pointer" to something else, but I can't do that without changing the type of the callee as well (which in this example is in a different crate).

As I understand it, fn(..) is short for extern "Rust" fn(...), and extern fn(...) is short for extern "C" fn(...). So when you changed the signature to extern fn(...) you implicitly made the claim that interrupt uses the stable C ABI calling convention: but if its definition is fn interrupt(...) { ... } then any difference between the Rust ABI and the C ABI will cause Very Fun Debugging to occur.
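To make the shorthand concrete, here is a small sketch (function names are invented) showing that fn(...) and extern "Rust" fn(...) are the same type, while extern "C" fn(...) is a different one that only accepts functions actually defined with the C ABI.

```rust
// A function with the default (Rust) ABI.
fn rust_double(x: u32) -> u32 {
    x * 2
}

// The same logic, but defined with the C ABI.
extern "C" fn c_double(x: u32) -> u32 {
    x * 2
}

pub fn call_both() -> (u32, u32) {
    // `fn(u32) -> u32` is shorthand for `extern "Rust" fn(u32) -> u32`,
    // so this assignment needs no conversion.
    let short: fn(u32) -> u32 = rust_double;
    let long: extern "Rust" fn(u32) -> u32 = short;

    // A C-ABI pointer type only accepts a C-ABI function.
    let c_ptr: extern "C" fn(u32) -> u32 = c_double;
    // let wrong: extern "C" fn(u32) -> u32 = rust_double; // rejected: mismatched ABI

    (long(21), c_ptr(21))
}
```

The commented-out line is the type-checked version of the mistake; going through a raw address in assembly skips that check entirely.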

My favorite example is u128: I'm not exactly sure what the difference is, but at least on 32-bit RISC-V, calling a Rust function that takes a u128 from an extern "C" function that takes a u128 results in some register/stack shuffling. At a guess: in the C convention it's a single contiguous value on the stack, but in Rust it's effectively a pair of 64-bit integers stored with separate, distinct endianness. If you'd like, we could work through the details to make it precise, but the upshot is that, without an ABI shim to do that shuffling, some u128 values will change when the extern "Rust" function is called as if it were an extern "C" function.
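One way to sidestep the question, sketched here under the assumption that you control both sides of the boundary, is to never pass a u128 across it directly and instead split it into two u64 halves in a repr(C) struct. The names U128Parts, rust_handler, and handler_ffi are hypothetical.

```rust
// Two explicitly ordered 64-bit halves with a C-compatible layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U128Parts {
    pub lo: u64,
    pub hi: u64,
}

impl From<u128> for U128Parts {
    fn from(v: u128) -> Self {
        U128Parts { lo: v as u64, hi: (v >> 64) as u64 }
    }
}

impl From<U128Parts> for u128 {
    fn from(p: U128Parts) -> Self {
        ((p.hi as u128) << 64) | p.lo as u128
    }
}

// Stand-in for a Rust-ABI function that works on a u128.
fn rust_handler(v: u128) -> u128 {
    v.wrapping_add(1)
}

// C-ABI entry point: reassembles the value, calls the Rust function, splits the result.
pub extern "C" fn handler_ffi(parts: U128Parts) -> U128Parts {
    rust_handler(parts.into()).into()
}
```

The conversion happens in Rust on both ends, so the result does not depend on how either ABI chooses to pass a 128-bit integer.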

Interesting. My understanding is that Rust doesn't have a stable ABI because (among other things) it would preclude certain compiler optimizations, including the freedom to customize memory layout of datatypes on a per use-case basis.

Also, defining and popularizing an ABI is a lot of work, which hasn't been done yet, so to this day if you want a stable ABI for your Rust program, the only widely-used solution is to use the C ABI instead, with all the limitations that that entails.

Edit: Add a link to current progress on a Rust ABI:

oh my, you're saying that Rust's ABI varies on a function-by-function basis? extern "Rust" doesn't just mean "word-sized arguments go in registers a0,a1 etc. but we reserve the right to change it to a1,a2" but that specifically extern "Rust" { fn interrupt(...) } might expose an entirely different ABI than a different extern "Rust" function?

If so, then it sure seems like today I'm learning something new: I was assuming that extern "Rust" was parameterized only by function signature, like with extern "C". But if it's tied to function identity and each fn might have a different ABI, then when I'm telling rustc "hey, your arguments are in registers a0, a1 (via extern "C") but you're going to call the function at the other end of this pointer" it doesn't have enough information to know whether to e.g. swap a1 for a2 or not.

Well, at least without inspecting the value of the pointer and doing some kind of lookup on a runtime ABI mapping table, which I'm guessing doesn't exist. Though, if it doesn't, I'm left wondering how dynamic function pointers work at all: does the act of taking the address of a function "fix" the ABI, then? Or is there some other mechanism that rustc uses to ensure all the receivers have a cross-compatible shape?

there's more to "ffi safe" than merely the function pointer size. most notably, certain data types are not ffi safe, and if such a data type is used as an argument or return type by a function type, that function type is not ffi safe either.

also, there's the consideration of unwinding safety, because the semantics might be different across an ffi boundary.

nevertheless, for your use case, I think it should be safe, as the "ffi boundary" is actually within control of yourself.
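If you go with that reading, a sketch of the shim with the lint silenced locally might look like this. It is a host-runnable approximation: the TrapFrame field and the handler body are placeholders, and the allow attribute only hides the warning, it does not make the Rust ABI any more stable.

```rust
// Hypothetical trap frame; the real one holds the spilled registers.
#[repr(C)]
pub struct TrapFrame {
    pub a0: usize,
}

// The lint is allowed on purpose: both the caller (our own assembly) and the
// callee (a Rust fn) are under our control, which is the assumption being made.
#[allow(improper_ctypes_definitions)]
pub extern "C" fn _trap_rust_abi_shim(tf: *mut TrapFrame, f: fn(*mut TrapFrame)) {
    f(tf)
}

// Stand-in handler with the Rust ABI.
fn interrupt1(tf: *mut TrapFrame) {
    // Safety: this sketch only ever passes a valid pointer.
    unsafe { (*tf).a0 = 1 }
}
```

Called from Rust, as in a test, the fn pointer is produced by rustc itself; the assembly path in the original post is the part that relies on the address from `la` being a valid fn pointer value.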


that's my thought too: llvm inlining happens at the IR level, the codegen backend isn't involved, and the inline assembly is essentially a black box to the IR optimizer. I think your best bet is to get rid of the indirection of the shim and instead generate (e.g. using macros) an ffi shim for each rust function

I think functions with the same signature have to have the same ABI, but the point is that function (pointers) with different signatures could have radically different calling conventions regardless of the surface level similarity of their signatures.
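That property is what makes ordinary tables of function pointers work. A small sketch, with invented handler bodies and a made-up dispatch helper: every entry has the same signature, so the compiler can call any of them through the same pointer type without knowing which function is behind it.

```rust
// Hypothetical trap frame.
#[repr(C)]
pub struct TrapFrame {
    pub a0: usize,
}

fn interrupt1(tf: &mut TrapFrame) {
    tf.a0 = 1;
}

fn interrupt2(tf: &mut TrapFrame) {
    tf.a0 = 2;
}

// All entries share one signature, hence one pointer type and one calling convention.
static HANDLERS: [fn(&mut TrapFrame); 2] = [interrupt1, interrupt2];

/// Calls the handler at `index`; returns false if there is no such handler.
pub fn dispatch(index: usize, tf: &mut TrapFrame) -> bool {
    match HANDLERS.get(index) {
        Some(handler) => {
            handler(tf);
            true
        }
        None => false,
    }
}
```

A table like this could also replace the per-interrupt symbols entirely if the trap cause is available as an index, though that is a different design from the one in the original post.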

Well, based on the link in your original post, it does seem like function arguments won't get shuffled around like that:

The ABI and layout of (unsafe)? (extern "ABI")? fn(Args...) -> Ret is exactly that of the corresponding C type -- the lack of a null value does not change this. On common platforms, this means that *const () and fn(Args...) -> Ret have the same ABI and layout. This is, in fact, guaranteed by POSIX and Windows.

However, there's no guarantee in Rust that your individual function args have a stable representation, as you discovered with the u128 example. The Rust compiler is free to lay out variables as it sees fit.

The solution is typically to repr(C) every data structure that you pass over an FFI boundary, which forces the Rust compiler to use a C ABI-compatible layout for datatypes, at the cost of restricting compiler optimizations and not supporting some richer Rust concepts (e.g. closures that capture their environments).
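As an illustration of what repr(C) buys you, here is a sketch with a hypothetical three-register TrapFrame. With repr(C) the fields sit in declaration order at predictable offsets, which is what hand-written assembly that spills registers into the frame would rely on.

```rust
// Hypothetical frame layout; the real one would list every saved register.
#[repr(C)]
pub struct TrapFrame {
    pub ra: usize,
    pub sp: usize,
    pub a0: usize,
}

/// Byte offsets of `ra`, `sp`, and `a0` from the start of the struct.
pub fn field_offsets() -> (usize, usize, usize) {
    let tf = TrapFrame { ra: 0, sp: 0, a0: 0 };
    let base = &tf as *const TrapFrame as usize;
    (
        &tf.ra as *const usize as usize - base,
        &tf.sp as *const usize as usize - base,
        &tf.a0 as *const usize as usize - base,
    )
}
```

Without the attribute, rustc is allowed to reorder the fields, so the same offsets would not be guaranteed.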

Typically using the C ABI, or re-compiling from source and re-linking every time. This does have some advantages, as it enables global code optimizations and avoids some of the headaches of managing memory across the FFI boundary.

Edit: this discussion may also be pertinent to the u128 situation.