What's The Overhead of Calling Function Pointers Over FFI?

Is there any extra overhead when calling a function pointer compared to just calling a method on a reference?

fn main() {
    let s = String::from("hello");
    
    // Is this
    s.len();
    
    // Faster than this
    let len_func = String::len;
    len_func(&s);
}

I ask because I'm considering a scripting solution where I'd pass function pointers over the FFI, and I want to know whether calling a function pointer over FFI is slower than calling an extern "C" function over FFI.

s.len() is sugar for str::len(&*s), which is "sugar" for (str::len)(&*s), which is what let len_func = str::len; len_func(&*s); does.

Your question is rather to compare it to doing:

let len_func = str::len as fn(&'_ str) -> usize;
//                      ^^^^^^^^^^^^^^^^^^^^^^^
//               coerce compile-time constant to a function pointer
len_func(&s)

And even in that case, the compiler will notice that in this example the function pointer is a constant, and optimize the call back into a direct one, i.e. back into our initial situation.

So the two things that are accurate to compare would be:

#[inline(never)]
fn foo (s: &'_ str) -> usize
{
    str::len(s)
}

// _vs._,

#[inline(never)]
fn bar (s: &'_ str, len_func: fn(&'_ str) -> usize) -> usize
{
    len_func(s)
}

And to compare calling foo(s) to calling bar(..., str::len as _):

example::foo:
  movq %rsi, %rax    # read-copy a `usize` field
  retq

example::bar:
  jmpq *%rdx

And basically, as you can see for example::bar, besides the tail-call optimization (jmp *code_addr instead of call *code_addr; ret), it is performing an "indirect jump": there is a jmp instruction targeting an address unknown at compile time, which happens to be a bit slower than calling a function whose address is already known.
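
If you want to get a feel for the magnitude on your own machine, here is a rough micro-benchmark sketch (it re-declares the foo/bar pair from above so it is self-contained, and assumes a toolchain with std::hint::black_box, i.e. Rust 1.66+; the iteration count is arbitrary):

use std::hint::black_box;
use std::time::Instant;

#[inline(never)]
fn foo(s: &str) -> usize {
    str::len(s)
}

#[inline(never)]
fn bar(s: &str, len_func: fn(&str) -> usize) -> usize {
    len_func(s)
}

fn main() {
    let s = "hello";
    let iterations = 100_000_000u64;

    // Direct call: the callee address is hard-coded at the call site.
    let start = Instant::now();
    let mut acc = 0usize;
    for _ in 0..iterations {
        acc = acc.wrapping_add(foo(black_box(s)));
    }
    println!("direct:   {:?} (acc = {})", start.elapsed(), acc);

    // Indirect call: the callee address is loaded from a function pointer.
    let start = Instant::now();
    let mut acc = 0usize;
    for _ in 0..iterations {
        acc = acc.wrapping_add(bar(black_box(s), black_box(str::len as fn(&str) -> usize)));
    }
    println!("indirect: {:?} (acc = {})", start.elapsed(), acc);
}

Keep in mind that with a constant target like this the branch predictor learns the destination almost immediately, so the measured gap tends to be tiny; the indirect call hurts more when the target keeps changing.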

That being said, the whole "problem" with FFI is that neither language gets to see "the whole picture" (unless cross-language LTO is involved), so this kind of "a constant value is treated as a dynamic value, with the minor branch-related performance cost that it incurs" problem is unavoidable.


Aside: extern "C"

In the case of FFI, the function pointer types involved are likely to be extern "C", whereas in my example they aren't (they are thus extern "Rust"), which just means that there may be some differences w.r.t. the actual CPU registers used to pass values around (parameters, return value), and w.r.t. which of the caller or the callee ends up doing any stack-memory cleanup. It doesn't change the points raised in my post.
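
Concretely, the ABI string is part of the function pointer's type, so the two kinds of pointers don't mix; a minimal sketch (names are made up for illustration):

fn rust_double(x: usize) -> usize {
    x * 2
}

extern "C" fn c_double(x: usize) -> usize {
    x * 2
}

fn main() {
    let _rust_ptr: fn(usize) -> usize = rust_double;           // OK: both use the "Rust" ABI
    let _c_ptr: extern "C" fn(usize) -> usize = c_double;      // OK: both use the "C" ABI
    // let _nope: extern "C" fn(usize) -> usize = rust_double; // error: mismatched ABI
}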


So essentially there's a little bit of overhead, but that overhead is inherent to crossing FFI boundaries anyway.

Thanks for the breakdown. :slight_smile: :+1:


w.r.t. that very point, know that if you are using dynamic linking (shared libraries) and have:

extern "C"{
    fn foo ();
}

it would be almost the same as having:

static foo: unsafe extern "C" fn() = linker_loader_magic(b"foo\0");
More precisely (for Linux dynamic linking with lazy loading):
static mut foo: unsafe extern "C" fn() = { // GOT entry.
    unsafe extern "C" fn find_foo () // <- PLT helper.
    {
        foo = dynamic_linking_loader_magic_find_symbol(b"foo\0");
        foo()
    }
    find_foo
};
  • or a more Rusty-idiomatic approximation:

    static foo: Lazy<unsafe extern "C" fn()> = Lazy::new(|| {
        dynamic_linking_loader_magic_find_symbol(b"foo\0")
    });
    

So whenever you call an extern-declared function in that scenario (shared library), you are already paying the cost of an indirect call.
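
For instance, even this innocent-looking call goes through the PLT, i.e. through an indirect jump, on a typical Linux build where libc is dynamically linked (a minimal sketch):

use std::os::raw::c_char;

extern "C" {
    // Declared here, but the actual code lives in the dynamically linked libc.
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let msg = b"hello\0";
    // Looks like a direct call in the source, but with lazy dynamic linking
    // it is dispatched through the PLT/GOT machinery described above.
    let n = unsafe { strlen(msg.as_ptr().cast()) };
    assert_eq!(n, 5);
}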


In the case of static linkage, this does not happen, since this time the "info with the location of the external symbol" is available at "compile" time (link time), and can thus be hard-coded within the binary: the jump ceases to be "indirect" / virtual.

  • w.r.t. "virtual calls" (which is what these calls through function pointers are), know that the cost is similar to calling a method on a dyn Trait object (well, to be precise, a dyn Trait method call involves an indirect read (of the vtable contents) followed by an indirect call (to the address those contents provide)); a small sketch follows below.
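
A small sketch of that comparison (the trait and names are made up for illustration): both helpers below end in an indirect call, but the dyn Trait one has to fetch the target address out of the vtable first.

trait GetLen {
    fn get_len(&self) -> usize;
}

impl GetLen for String {
    fn get_len(&self) -> usize {
        self.len()
    }
}

#[inline(never)]
fn via_fn_ptr(s: &str, f: fn(&str) -> usize) -> usize {
    f(s) // indirect call through the function pointer
}

#[inline(never)]
fn via_dyn(obj: &dyn GetLen) -> usize {
    obj.get_len() // indirect read of the vtable slot, then indirect call
}

fn main() {
    let s = String::from("hello");
    assert_eq!(via_fn_ptr(&s, str::len), 5);
    assert_eq!(via_dyn(&s), 5);
}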

Awesome, that's exactly the info I was looking for!


Actually the call jumps directly to the PLT entry. The first instruction of the PLT entry is a jump to an address stored in a GOT entry for the function. The first time around, that GOT entry points to the next instruction, which calls your "dynamic_linking_loader_magic_find_symbol". This function locates the symbol and updates the GOT entry so that, the next time, the PLT jumps directly to the right function; it then finally jumps to the right function itself. I recently investigated the exact implementation as part of trying to implement quick lazy compilation for cg_clif.

To a first approximation they should be identical. The optimiser has no knowledge of the function being called, so it can't do any optimisation (e.g. inlining) to make the call faster, and you are forced to incur the cost of a dynamic jump either way. It's just that in the function-pointer case the exact function isn't known until run time, while in the FFI case the function isn't known until link time (for static linking) or load time (for dynamic linking).

You may see a slight performance difference because a normal function pointer (e.g. fn(&str) -> usize) uses the "Rust" calling convention while FFI functions usually use the "C" calling convention. The calling convention specifies how arguments are passed and whose job it is to clean up the stack, so they'll generate slightly different machine code.


OK. That should be fine, then.

I'm going to be working on a scripting solution for Rust, and I was trying to find out whether I should, for performance reasons, prefer static extern "C" functions for everything that would be called from the script, or whether passing function pointers that the script could call would be roughly as efficient. Sounds like I should be OK using pointers, which is good because I think that will simplify the implementation/make it possible at all with the design I'm thinking of. :slight_smile:

Yes, that is a common question that arises when dealing with plugins: the idea is that the plugin will be providing some behavior, but often the plugin itself would like to internally use some behavior that the "host binary" can offer.

When linking, there is the --export-dynamic linker flag that can help with this (that is, when code inside the plugin calls dynamic_linking_loader_magic_find_symbol(...), the dynamic linking loader will be able to see the functions exported by the "host binary"). But relying on such a specific linker flag can hinder portability, so the other approach is to have the plugin offer a registration facility to the "host binary":

fn init_host_binary_functions (
    log: extern "C" fn(&&str),
    sleep: extern "C" fn(usize),
    ...,
);

The host can expect the plugin to provide such a function:

extern "C" { /* Or manually using dlopen & friends */
    fn init_host_binary_functions (
        log: extern "C" fn(&&str),
        sleep: extern "C" fn(usize),
        ...,
    );
}

and the plugin can have a global struct to be overwritten with that:

struct HostFunctions {
    log: extern "C" fn(&&str),
    sleep: extern "C" fn(usize),
    ...,
}

static HOST_FUNCTIONS: OnceCell<HostFunctions> = OnceCell::new(); // e.g. once_cell::sync::OnceCell

#[no_mangle] pub extern "C"
fn init_host_binary_functions (
    log: extern "C" fn(&&str),
    sleep: extern "C" fn(usize),
    ...,
)
{
    HOST_FUNCTIONS.set(HostFunctions { log, sleep, ... }).ok();
}

// so that it can now access such functions:

fn log (s: &'_ str)
{
    (HOST_FUNCTIONS.get().unwrap().log)(&s)
}
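
For completeness, a rough sketch of what the host side could look like, assuming the libloading crate and just the two log/sleep pointers; the library path and the host_log/host_sleep functions are made up for illustration:

use libloading::{Library, Symbol};

extern "C" fn host_log(s: &&str) {
    println!("[plugin] {}", s);
}

extern "C" fn host_sleep(millis: usize) {
    std::thread::sleep(std::time::Duration::from_millis(millis as u64));
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // Load the plugin and look up its registration entry point.
        let plugin = Library::new("./libplugin.so")?;
        let init: Symbol<unsafe extern "C" fn(extern "C" fn(&&str), extern "C" fn(usize))> =
            plugin.get(b"init_host_binary_functions\0")?;
        // Hand the host's function pointers over to the plugin.
        init(host_log, host_sleep);
    }
    Ok(())
}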

Yeah yeah, the hard-coded assembly implementation is hard to describe completely with the higher-level abstractions of programming languages, so I did mislabel things by simplifying a bit :sweat_smile:

The closest I can get to describing that mechanism in a higher-level language is that find_foo is not a PLT entry but a PLT stub, and the real PLT entry indeed wraps the global:

unsafe extern "C" fn foo (...) -> _ // PLT
{
    static mut foo_ptr: unsafe extern "C" fn(...) -> _ = { // GOT
        unsafe extern "C" fn find_foo (...) -> _ // PLT stub
        {
            foo_ptr = dynamic_linking_loader_magic_find_symbol(b"foo\0");
            foo_ptr(...)
        }

        find_foo
    };
    foo_ptr(...)
}

EDIT

- log: fn (&'_ str)
+ log: extern "C" fn (&&str)

to make sure the passed function pointers have a stable ABI, so that the host binary and the plugin may be compiled using different versions of the Rust compiler.


Yes, exactly along the lines of what I was thinking! Having that more concrete example will help me a lot.

Question, though, does the struct HostFunctions in the plugin need to be #[repr(C)] or do function pointer types like fn(usize) work fine over FFI if all the other types in the arguments and such are all #[repr(C)]?

If you use a "variadic" API (init_host_binary_functions takes many function pointers as parameters, which may hinder readability if there are too many), then the HostFunctions struct does not cross the FFI boundary and so does not need to be #[repr(C)]. But obviously it would be nicer to actually have init_host_binary_functions take an instance of HostFunctions, in which case it would have to be #[repr(C)] (and Copy, because it doesn't hurt); a sketch of that variant follows below.
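
A minimal sketch of that second variant, reusing the names from the example above (assuming the same once_cell-based storage; only the registration function's signature changes):

use once_cell::sync::OnceCell;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct HostFunctions {
    pub log: extern "C" fn(&&str),
    pub sleep: extern "C" fn(usize),
}

static HOST_FUNCTIONS: OnceCell<HostFunctions> = OnceCell::new();

#[no_mangle] pub extern "C"
fn init_host_binary_functions (host_functions: HostFunctions)
{
    // The struct itself now crosses the FFI boundary, hence #[repr(C)].
    HOST_FUNCTIONS.set(host_functions).ok();
}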

Good point regarding the function pointers: they would need to be extern "C" themselves. Now that I think of it, I'll update that.


If you are passing data across the FFI boundary then it must be #[repr(C)]. In @Yandros's example the HostFunctions struct was just used on the plugin side as a place to store function pointers provided by the host.

If the host passed in a pointer to a HostFunctions type from its own memory, then it would need to mark HostFunctions as #[repr(C)].

Also, any function pointer passed between host and plugin must specify a calling convention. So you would end up storing log: unsafe extern "C" fn(&str). I've also marked it as unsafe because the plugin has no way of knowing what the host function actually does.
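
A minimal sketch of what that looks like on the plugin side once the pointer type is unsafe (HOST_LOG is a made-up static, analogous to HOST_FUNCTIONS above, and again assumes once_cell):

use once_cell::sync::OnceCell;

static HOST_LOG: OnceCell<unsafe extern "C" fn(&&str)> = OnceCell::new();

// Safe wrapper: the `unsafe` is confined to this one call site.
fn log (s: &'_ str)
{
    if let Some(&host_log) = HOST_LOG.get() {
        // SAFETY: relies on the host having registered a function that is
        // sound to call with any `&&str`.
        unsafe { host_log(&s) }
    }
}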


Gotcha, so this is a Rust function pointer:

type RustFnPointerExample = fn(usize) -> usize;

And this is a C compatible function pointer:

type CCompatFnPointerExample = extern "C" fn(usize) -> usize; // `unsafe` optional, but more accurate

@zicklag exactly. The keywords you are looking for here are Calling Convention and ABI.

Not having the extern "whatever" bit means it'll use the extern "Rust" calling convention, which is... whatever rustc feels like using at the time.

The Rust Reference has a section on Extern function qualifiers if you want to go down that rabbit hole.
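
A tiny sketch of that default (illustrative names): leaving the qualifier off is the same as writing extern "Rust" explicitly, so the two items below coerce to the same fn-pointer type.

fn implicit(x: usize) -> usize {
    x + 1
}

extern "Rust" fn explicit(x: usize) -> usize {
    x + 1
}

fn main() {
    let mut f: fn(usize) -> usize = implicit;
    f = explicit; // same (default) "Rust" ABI, so this type-checks
    assert_eq!(f(41), 42);
}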

1 Like