Choose FFI signature at runtime

I'm working on a Rust program for the target aarch64-unknown-linux-gnu. The program runs on different devices and detects at runtime which specific device it is running on.

The Rust program calls some Init routine from a C library some_c_lib that is available on the device. My issue is that Init's signature is device dependent, and thus bindgen (correctly) creates different FFI bindings.

On some devices there is:

// C header: unsigned int Init(int GpioId);
extern "C" {
    pub fn Init(GpioId: ::std::os::raw::c_int) -> ::std::os::raw::c_uint;
}

While on other devices there is:

// C header: unsigned int Init();
extern "C" {
    pub fn Init() -> ::std::os::raw::c_uint;
}

For now, I'm using conditional compilation to choose between the two function signatures, but that means keeping track of which device needs which binary, a burden I'd like to avoid.

Is there a way to choose at runtime whether Init is called with an argument or not?

In a similar situation, I've used an enum and a wrapper:

enum PlatformInit {
    Foo,
    Bar(c_int),
    Baz,
    Quux(c_int),
}

impl PlatformInit {
    fn init(self) -> c_uint {
        match self {
            Self::Foo => FooPlatform::Init(),
            Self::Bar(arg) => BarPlatform::Init(arg),
            Self::Baz => BazPlatform::Init(),
            Self::Quux(arg) => QuuxPlatform::Init(arg),
        }
    }
}

You construct PlatformInit with the right variant for this platform at runtime, and can then call PlatformInit::init() to start it running.
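A self-contained variant of the same idea, as a minimal sketch: each variant stores a typed function pointer instead of naming a platform module. Where the pointers come from is left open here, and the variant names are assumptions made for illustration, not part of any vendor API.

```rust
use std::os::raw::{c_int, c_uint};

/// One variant per known `Init` signature. Each variant carries a function
/// pointer of exactly that signature, plus the argument when one is needed.
#[derive(Clone, Copy)]
pub enum PlatformInit {
    /// Devices where the C function is `unsigned int Init()`.
    NoArg(unsafe extern "C" fn() -> c_uint),
    /// Devices where the C function is `unsigned int Init(int GpioId)`.
    WithGpio(unsafe extern "C" fn(c_int) -> c_uint, c_int),
}

impl PlatformInit {
    /// Calls the stored pointer through the signature recorded in the variant.
    ///
    /// # Safety
    /// The pointer must refer to a function whose real signature is the one
    /// named by the variant it was stored in.
    pub unsafe fn init(self) -> c_uint {
        match self {
            Self::NoArg(f) => unsafe { f() },
            Self::WithGpio(f, gpio_id) => unsafe { f(gpio_id) },
        }
    }
}
```

The runtime device detection then only has to build the right variant once; every later call goes through a pointer whose type matches the function it points to.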

That's nice, but it's not clear to me if it really solves the problem: FooPlatform::Init() and BarPlatform::Init(arg) both have to call the same extern "C" function, but with different signatures. How do I achieve that?


I still got inspired by your answer (thanks!), and came up with this code that seems to work.

// main.rs
use std::env;

fn init_void() {
    println!("init_void");
    #[link(name = "mine")]
    extern "C" {
        fn init();
    }
    unsafe { init(); }
}

fn init_int(n: i32) {
    println!("init_int");
    #[link(name = "mine")]
    extern "C" {
        fn init(n: ::std::os::raw::c_int);
    }
    unsafe { init(n); }
}

fn main() {
    let mut args = env::args();
    args.next();

    if let Some(arg) = args.next() {
        let n: i32 = arg.parse().unwrap();
        init_int(n);
    } else {
        init_void();
    }
}

// mine.c
#include <stdio.h>

#ifdef WITH_INT
void init(int n) {
	printf("With number! It was %d\n", n);
}
#else
void init() {
	printf("No number, then it's 42!\n");
}
#endif

Basically, I declare the FFI function twice, once with and once without the argument, each inside its own dedicated Rust function.

This seems to work as expected:

$ gcc --shared mine.c -o libmine.so
$ LD_LIBRARY_PATH=. target/debug/choose_ffi
init_void
No number, then it's 42!
$ gcc --shared mine.c -o libmine.so -DWITH_INT
$ LD_LIBRARY_PATH=. target/debug/choose_ffi 24
init_int
With number! It was 24

However I'm not 100% convinced this is sound, and cargo/rustc does throw some warning at me:

warning: `init` redeclared with a different signature
  --> src/main.rs:16:9
   |
7  |         fn init();
   |         ---------- `init` previously declared here
...
16 |         fn init(n: ::std::os::raw::c_int);
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this signature doesn't match the previous declaration
   |
   = note: `#[warn(clashing_extern_declarations)]` on by default
   = note: expected `unsafe extern "C" fn()`
              found `unsafe extern "C" fn(i32)`

Any idea whether this code is sound or not? Can I just go ahead and silence the warning?

Perhaps it would be easier to just abuse the fact that it's safe, on aarch64-unknown-linux-gnu, to call a zero-argument "C" function with one extra, useless argument? And just always pass 0 in the cases where the function accepts no arguments?

If you have control of the C code you could define your own wrapper function which has a consistent signature and uses the same #ifdef to select the right Init() signature.

That way instead of trying to switch between function signatures in your Rust code, the C provides an API which is actually portable.

That is interesting! How does it work exactly?

I've searched a bit and came across AArch64 - Procedure Call Standard. It says that if a function takes up to 8 arguments, then the registers X0-X7 will be used to pass such arguments (W0-W7 in my case as I'm passing u32). I had a look at the generated assembly (with debug profile), and indeed the two function calls look the same apart from an ldr into W0:

// inside init_void()
    76d0:       97fff4f4        bl      4aa0 <init@plt>
// ...
// inside init_int(n: i32)
    771c:       b94007e0        ldr     w0, [sp, #4]
    7720:       97fff4e0        bl      4aa0 <init@plt>

So always passing the argument even for the zero-argument case should just load 0 into W0, but then it would be ignored. Is that right? What bugs me is: can I be sure that the X0/W0 register won't be used for something else (especially with optimizations turned on), and writing 0 to it won't break some other operation?

"Fixing" the C code would indeed be the cleanest solution, but I'd say there's close-to-zero chances that the vendor will do that. The source code is pulled in from the vendor's repository during build of the device's root filesystem, so modifying the C code directly would be quite messy.

But it might be possible/easy to add a patch to be applied after pulling the code. Then I could add a new function InitGeneric(int) that always accepts an argument, and it will then call Init with the correct signature based on the #ifdef. I'll have a look at how it integrates with the workflow. Thanks for the tip!

They could be used for anything, but not when you call another "C" function. Even if X0-X8 are not used for parameter passing, the called function can clobber them (directly under the table that you found: The function foo() can use registers X0 to X15 without needing to preserve their values. However, if foo() wants to use X19 to X28 it must save them to stack first, and then restore from the stack before returning.)

"Without needing to preserve their values" means Rust cannot rely on them being unchanged after the call.

So, to start with, let's assume we have a way to write the following.

extern "C" {
    fn Init(FakeSignature) -> FakeRet;
}

In that case, you'd then be able to say:

type Signature1 = unsafe extern "C" fn(GpioId: c_int) -> c_uint;
type Signature2 = unsafe extern "C" fn() -> c_uint;

and then do:

if sig_1 {
    let init = unsafe {
        ::core::mem::transmute::<
             unsafe extern "C" fn(_) -> _, // FakeSignature,
             Signature1,
        >(Init)
    };
    let gpio_id = …;
    unsafe { init(gpio_id) }
} else {
    let init = unsafe {
        ::core::mem::transmute::<
             unsafe extern "C" fn(_) -> _, // FakeSignature,
             Signature2,
        >(Init)
    };
    unsafe { init() }
}

Now, the issue is a bit more subtle than that, since:

extern "C" {
    fn Init(FakeSignature) -> FakeRet;
}

cannot really be written: if you write it, then LLVM will make an assumption, before any usage, regarding the signature of that function, and thus transmuting it would technically be UB.

This is a typical situation of "we've been using Rust's higher-level pointers too much, and we should go back to just using raw pointers to avoid UB".

So let's use raw pointers, shall we? (Note: since we don't really have "raw function pointers", we'll have to use raw data pointers, and assume we're on a platform where those two have the same size)

extern {
    #[linkage = "external"] // <- with this `Init` becomes a pointer (an address)
                            // rather than a place (so we don't have to take its address afterwards).
    static Init: *const [u8; 0];
}

And then the code above is the same, but swapping the FakeSignature / the unsafe extern "C" fn(_) -> _ with *const [u8; 0], for instance:

if sig_1 {
    let init = unsafe {
        ::core::mem::transmute::<
-            unsafe extern "C" fn(_) -> _, // FakeSignature,
+            *const [u8; 0],
             Signature1,
        >(Init)
    };
    let gpio_id = …;
    unsafe { init(gpio_id) }
} else {
    let init = unsafe {
        ::core::mem::transmute::<
-            unsafe extern "C" fn(_) -> _, // FakeSignature,
+            *const [u8; 0],
             Signature2,
        >(Init)
    };
    unsafe { init() }
}

This would be the least UB way to achieve this, to my knowledge, but sadly it requires the unstable #![feature(linkage)]: Playground


The better way to do this

I think all the trouble here stems from mixing link time with runtime: if you want to perform such runtime if branches, then you might be better off going with full dynamic loading altogether, i.e., with no extern { … } blocks.
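A minimal sketch of what such a fully dynamic lookup could look like, assuming a Linux/glibc target. It declares the POSIX dlopen/dlsym functions by hand so that it needs nothing beyond std; real code would more likely use a crate such as libloading. The helper names (lookup, call_init) and the two type aliases are made up for illustration. The only extern block left describes the loader API, whose signature is fixed, and Init itself is never declared.

```rust
use std::ffi::{c_void, CStr};
use std::os::raw::{c_char, c_int, c_uint};

// POSIX dynamic-loading API. The `unsafe extern` spelling needs Rust 1.82 or
// later; older toolchains write plain `extern "C"`.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

/// Value of RTLD_NOW on Linux (assumption: glibc target).
const RTLD_NOW: c_int = 2;

pub type InitNoArg = unsafe extern "C" fn() -> c_uint;
pub type InitWithGpio = unsafe extern "C" fn(c_int) -> c_uint;

/// Looks up `symbol` in `library`, or in the already-loaded program and its
/// dependencies when `library` is `None`. Returns `None` on any failure.
/// The library handle is never closed, so the address stays valid for the
/// rest of the process.
pub fn lookup(library: Option<&CStr>, symbol: &CStr) -> Option<*mut c_void> {
    let name = library.map_or(std::ptr::null(), |l| l.as_ptr());
    // SAFETY: both strings are NUL-terminated, and both functions report
    // failure by returning a null pointer.
    unsafe {
        let handle = dlopen(name, RTLD_NOW);
        if handle.is_null() {
            return None;
        }
        let addr = dlsym(handle, symbol.as_ptr());
        if addr.is_null() { None } else { Some(addr) }
    }
}

/// Calls the function at `addr` through the signature selected at runtime.
///
/// # Safety
/// `addr` must be a non-null function address whose real signature takes an
/// `int` exactly when `gpio_id` is `Some`.
pub unsafe fn call_init(addr: *mut c_void, gpio_id: Option<c_int>) -> c_uint {
    unsafe {
        match gpio_id {
            Some(id) => std::mem::transmute::<*mut c_void, InitWithGpio>(addr)(id),
            None => std::mem::transmute::<*mut c_void, InitNoArg>(addr)(),
        }
    }
}
```

With this shape, the device detection decides whether to pass `Some(gpio_id)` or `None`, and the compiler never sees a declaration of Init that could disagree with the library actually present on the device.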

There is no “least UB” or “most UB” way. You either have UB or you don't.

Calling functions with extra arguments is a perfectly supported thing in C (you hit one such function in the very first example of any C book, where printf is called), and the linker wouldn't care either (it doesn't know anything about functions or arguments, just about symbols), thus calling it is safe, too.

Your solution with transmute is probably fine, too, but it's hard to say for sure whether it's OK because it uses unstable features, which are, well, not stable (and thus don't have stable documentation either).

There's a big difference though. printf() is explicitly declared as variadic (so the compiler has to be prepared for calling it with a varying number of arguments), and indeed, variadic functions have their own separate ABI, which is very different from that of non-variadic functions. It is most definitely UB to call a non-variadic function through a pointer that doesn't match the signature with which it was defined.

Yes, but it's not an error to call it in two or three (or a dozen) different ways, as long as the actual call matches the proper signature (unknown to the compiler!).

I think you forgot that we are talking about "C", not "C++". There are, actually, not two possibilities (variadic signature, non-variadic signature), but three:

  1. Nonvariadic signature.
  2. Variadic signature.
  3. No signature at all.

And that's what we are struggling with. In C it's perfectly legal to do the following:

int foo();

if (bar) {
  foo();
} else {
  foo(1);
}

Of course one has to ensure that foo actually has one of these two signatures and that bar correctly signifies which one is used, but the whole program is, bizarrely enough, fully supported and valid if everything matches.

Look for "6.5.2.2 Function calls", part 6 "If the expression that denotes the called function has a type that does not include a prototype".

P.S. That state is, by itself, a compromise: K&R C had no such limitations, and open, e.g., was initially just a regular function with 2 parameters that later gained a 3rd, optional one. Later POSIX turned it into a variadic function, but that curious loophole still remains even in C18.

"Only a Sith deals in absolutes." (Obi-Wan, Star Wars)

I'm semi-joking, here, but also semi-serious. Issues such as:

showcase that it is not yet completely clear what the formal semantics of Rust w.r.t. validity invariants involving extern "…" { … } declarations are, and this makes most of the affirmations here speculative. Such speculation can be made with a certain level of confidence and plausibility, which means we are definitely not in a binary / only-absolutes domain, but rather in a more qualitative and practical one. It's disappointing, but that's the reality of a young language and of a messy inherited compiler/linker/LLVM-backend state.

Hence why I've been describing the different approaches, and have avoided dealing in absolutes.


In this example, it may turn out that both signatures may happen to be compatible thanks to a variadic signature (fn Init(...) -> c_uint), so it would be a nice way to XY around the problem altogether. Truth be told, I don't know all the details involving variadic signatures and ABIs to make such a statement, so this may not even hold.

But even assuming such a variadic signature were able to XY the problem here, I think it's important to consider the general case mentioned in this thread's title:

Choose FFI signature at runtime

In the case of two fully incompatible function signatures (e.g., distinct return values), I believe my remarks in the previous post to be quite legitimate: ideally (guaranteed UB-free), one ought to use a fully dynamic approach, with no extern declarations whatsoever. But if somebody still wanted to use an approach involving extern { … } blocks, then they'd reduce the chances of their approach being declared UB by using raw pointers exclusively, rather than some arbitrary and potentially incorrect function signature.

No. I am fully aware of the declaration rettype foo(); having a different meaning from that of rettype foo(void);. I am also fully aware of C and C++ being different languages.

Here it is in my copy of C99 (more precisely, the Committee Draft WG14/N1256 as of 7 Sept 2007), section 6.5.2.2.6:

  1. If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined.

From this, it follows that exactly one of the subsequent, mutually exclusive cases must hold:

  1. The function is defined as variadic and it is called when a correct, variadic prototype is in scope. This is defined behavior.
  2. The function is defined as variadic and it is called with an incorrect (non-matching) prototype in scope. This is explicitly UB by the excerpt above.
  3. The function is defined as variadic and it is called without a prototype. This is explicitly UB by the excerpt above.
  4. The function is defined as non-variadic, and it is called with a matching prototype. This is defined behavior.
  5. The function is defined as non-variadic, and it is called with an incorrect (non-matching) prototype. This is explicitly UB by the excerpt above.
  6. The function is defined as non-variadic, and it is called without a prototype, and the argument list matches the definition. This is defined behavior.
  7. The function is defined as non-variadic, and it is called without a prototype, but the argument list does not match the definition. This is explicitly UB by the excerpt above.
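The seven cases above can be restated as a small decision function. This is only a sketch of the case analysis as given in this post, not of the C standard itself; the type and function names are invented for illustration.

```rust
/// How the called function was defined.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Definition {
    Variadic,
    NonVariadic,
}

/// What declaration is visible at the call site.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CallSite {
    MatchingPrototype,
    MismatchedPrototype,
    /// Declared without a prototype, e.g. `int foo();`.
    NoPrototype { args_match_definition: bool },
}

/// Returns true when the list above classifies the call as defined behavior.
pub fn is_defined_behavior(def: Definition, call: CallSite) -> bool {
    match (def, call) {
        // Cases 1 and 4: a correct prototype is in scope.
        (_, CallSite::MatchingPrototype) => true,
        // Cases 2 and 5: an incorrect prototype is in scope.
        (_, CallSite::MismatchedPrototype) => false,
        // Case 3: a variadic function called without a prototype.
        (Definition::Variadic, CallSite::NoPrototype { .. }) => false,
        // Cases 6 and 7: the outcome depends on the argument list.
        (Definition::NonVariadic, CallSite::NoPrototype { args_match_definition }) => {
            args_match_definition
        }
    }
}
```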

Now, let's see which of the non-UB cases allow one to call a function with two argument lists of different types where one of them is empty:

  • #1. No. A variadic signature must include at least one declared parameter, and the implementation of a variadic function would have to realistically use the last non-variadic argument (for va_start() etc.) in order to gather information about the rest of the arguments anyway, unless it always ignores all of its arguments (in which case OP's question wouldn't have been a problem in the first place).
  • #4. No. A prototyped function can't have several different signatures.
  • #6. No. For the same reason: a single function definition can't have several different signatures at the same time, so one of the inferred signatures necessarily won't match. I.e., only one of the calls will actually qualify as case #6, while the other one will necessarily become case #7, which is UB.

From this, it is concluded that your proposed solution is UB.

Thank you all for the different suggestions, this is turning out to be quite a learning experience!

Just as a clarification (probably not needed), the C function I need to call is not variadic, it has the same return type in both cases, and in one of the cases it has no arguments specified (not even void):

/* some.h */
#if defined DEVICE_SPECIFIC_FLAG
unsigned int Init();
#else
unsigned int Init(int GpioId);
#endif

/* some.c */
#if defined DEVICE_SPECIFIC_FLAG
unsigned int Init()
{
#else
unsigned int Init(int GpioId)
{
#endif
    /* ... */
    return 0;
}

Here are some thoughts and questions on the different approaches.


Dynamic loading

With help from the libloading crate, this approach is actually quite simple to implement. I don't know much about dynamic loading and whether it has any impact on performance. But in my case the usage of the dynamically loaded library will be confined to a non-performance-sensitive part of the application.

So I might very well end up using this approach!


AArch64 procedure call standard

As I wrote, I find this approach very interesting.

I often find myself drawn to both very low-level solutions where you get the feeling you are telling a piece of silicon what to do, and to very high-level solutions where you can see the magic power of abstraction. This would end up in the first category.

The advantage is that the Rust code doesn't really need to do anything smart: always call the function with an argument, and the argument will be ignored if the function doesn't expect it. In my case, the u32 I have to pass in some cases is actually a constant, so basically all runtime checks would be gone. And looking at the AArch64 reference, it does seem like it should work. I've also tested this and it seems to work fine, but my simple test is no guarantee of soundness.

On the other hand, I find the concerns raised by @H2CO3 quite convincing. The architecture might support calling a zero-argument function with one argument, but if the C (or Rust) language considers that UB then AFAIK the compiler and/or linker (perhaps during LTO?) might end up doing optimizations that would break my code.


Raw pointers

The linkage feature seems to be perma-unstable, and I'd like to stay on the stable toolchain if possible. On the other hand, I don't need the linkage feature if I know the exact signature (which I do know) as @Yandros' playground link shows. Or am I missing something?


clashing_extern_declarations

I'm still confused on whether this is allowed (despite the warning by rustc) or not, and how it differs from the raw pointer approach with respect to UB.

In Rust, I would create two extern signatures, one of which would match the signature in the available library while the other would not. If I make sure to only call through the matching signature, that should fall under case 4 and be UB-free, right?

Or is merely declaring a wrong signature in Rust (without calling it) a source of UB? I guess this is a corollary of my question on whether the warning about clashing_extern_declarations can be safely silenced.

And that is where you are wrong. A single function can't have several different signatures at the same time in a single program. But the compilation of a single C translation unit can't rely on that.

The compiler must process any valid C translation unit in such a way that it can be combined with any other C translation unit to form a valid program.

Consider the following two programs with three translation units:

foo1.c:
int selector = 0;

int foo() {
  return 42;
}

foo2.c:
int selector = 1;

int foo(int x) {
  return x;
}

bar.c:
#include <stdio.h>

extern int selector;
extern int foo();

int main() {
  if (selector) {
    printf("%d\n", foo(57));
  } else {
    printf("%d\n", foo());
  }
}

Consider the first program: foo1.c and bar.c. Does it have UB? No: only the second call to foo() is executed, and it's correct. The first call is never executed, thus the fact that it doesn't match the definition of foo() in foo1.c is irrelevant.

Consider the second program: foo2.c and bar.c. Does it have UB? No: only the first call to foo() is executed, and it's correct. The second call is never executed, thus the fact that it doesn't match the definition of foo() in foo2.c is irrelevant.

Now, let's consider bar.c in isolation (it's a C translation unit, remember? It's processed by the C compiler in isolation, remember?). The compiler must compile it correctly and produce an object file suitable for both the foo1+bar program and the foo2+bar program. Because both are correct.

UB is not a static property. UB is triggered when certain “bad” things happen during execution. And nothing “bad” happens during the execution of that program. Ergo: “no UB”.

P.S. Note that selector doesn't need to reside in the 2nd object file. It may even be requested from the user! As long as the user makes the correct choice every time, the program works correctly and never triggers UB.

You are bringing translation units into this, but I don't see how that is relevant. Your actual, stronger claim is that the compiler is not allowed to assume a single signature even within the same TU (if the declaration lacks a prototype). I don't see any trace of that claim in 6.5.2.2. Where'd you get it from? The wording of the standard seems pretty static-centric. It implies that there is such a thing as the signature, and it doesn't say that it is allowed to change dynamically.


In any case: I don't even care whether it is defined or not. It's bad practice. It's horrible, terrible, unacceptable, do-not-ever-do-this kind of bad practice. I think the alternatives are pretty clear, easy to implement, and much, much harder to abuse or get wrong by accident. I don't think we should argue for the "mixed-prototype" approach, even if at the end of this debate it turns out to be technically "correct".

From the initial purpose of all that hoopla. It was needed to handle different versions of Unix, which had situations similar to what the original poster encountered: some had prototypes of certain functions from BSD, some from System V, and they were different.

Naturally, people wanted to write a single codebase for both, so compiler writers had to adjust.

The compiler can assume anything it wants. But the rules are simple: foo1+bar is valid, foo2+bar is valid, please compile bar.c in a way that ensures that, thanks…

If code is not executed it can't produce UB, and if it's executed according to the standard then everything should work.

Indeed. What the whole industry uses is a much simpler approach, which I recommended before: just pass a parameter that will be ignored.

That's what the Android SDK recommends, e.g.:

Starting with API level 30, ifunc resolvers on arm64 are passed two arguments… code that wishes to be compatible with prior API levels should not accept any arguments in the resolver.

It's not about some ancient library from ancient times. API level 30 is Android 11; supporting Android 10, released in 2019, doesn't sound like something you would absolutely refuse to do. And they explicitly recommend declaring your function as having no arguments despite the fact that it will be called with arguments.

Extra arguments are just ignored and that's how bionic, glibc, gcc, clang and other related tools are designed to behave.

What the standard does or doesn't say about that practice is not very relevant: when there are two solutions, one used on billions of devices by billions of people and the other recommended by some technical manuals but not used in the real world, I tend to stick with the solution that is tested and works.

And that mindset is exactly why I think we live in the age of CVEs, sloppy C code, and misinformed university classes/students. Programming against gut instincts instead of the standard of the language, and saying "it doesn't crash on my machine so it must be correct" is extremely naïve, and has been proven to be harmful and detrimental to software quality as well, according to the industrial experience of the past couple of decades.

In this case, others have proposed a clearly, obviously right solution (that doesn't even necessitate scrutinizing the standard): fully dynamic loading. It's much easier to get right, and doesn't require assumptions of a particular platform/ABI, which will be forgotten about quickly by the time someone needs to port it over to a different platform.

Programming against standards and saying that it's not your fault that the program doesn't work is even more naïve.

Compilers have bugs, and using some obscure feature of a standard that no one, not even the compiler writers, has any idea about tends to be a less robust approach than using something that other well-known projects use, even if said approach doesn't conform to the standard 100%.

Why is that?

Both solutions go outside the scope of the standard (you cannot cast void * to a function pointer in standard C). Yet only one is “good” (it doesn't conform to the standard but is used often, thus is “safe”) while the other is “bad” (it doesn't conform to the standard, and the fact that it's used very often is, somehow, irrelevant).

Actually it does.

I agree that using dlopen is safe, but its safety is guaranteed by the very same thing: it's used on billions of devices by billions of people, ergo it's safe.

But the C standard doesn't support dynamic loading and doesn't even support conversion between void* and function pointers, which means you couldn't, actually, use dlopen in fully compliant C code.