Fastest way to call function object?


#1

I’m working on some code that calls a function which is either defined as a function associated with a trait or is provided as a callback by the user. The code paths that call this function are very cheap, so the function call overhead is significant (in fact, when I switched from a trait-only implementation (in which the functions are known at compile time) to a Box-based implementation, my time/op increased by ~10%). My question is: is there a mechanism for doing this that is faster than using Box? Unsafe code is acceptable here.

Here’s a toy version of what I’m doing:

struct Foo<'a, T> {
    init: Box<dyn Fn() -> T + 'a>,
}

impl<'a, T> Foo<'a, T> {
    fn new<F: Fn() -> T + 'a>(init: F) -> Self {
        Foo { init: Box::new(init) }
    }

    fn init(&self) -> T {
        (self.init)() // parentheses needed: `self.init()` would recurse into this method
    }
}

impl<'a, T: Default + 'a> Foo<'a, T> {
    fn new_initializable() -> Self {
        Foo { init: Box::new(T::default) }
    }
}

#2

A trait object consists of a pointer to the object and a pointer to the vtable; calling a method of one loads a function pointer from the vtable and calls that. If you store the function pointer directly instead, you can save the load. You can use a wrapper function like

fn invoke<F: Fn() -> blah>(f: *const u8) -> blah {
  let f = unsafe { &*(f as *const F) }; // can't cast a raw pointer straight to a reference with `as`
  f() // this call will be inlined
}

…then store the function pointer invoke::<F> as fn(*const u8) -> blah, along with the object pointer cast to *const u8.

Alternately, if the user callback doesn’t need context/a self argument and can be a plain function pointer, just do that.

Warning: Any speedup from this approach is likely to be very small (it’s probably a better idea to refactor the code, but that’s not what you asked). Also, be mindful of destructors.


#3

OK that’s awesome, thanks! Will this also work for closures?


#4

Are you sure it’s the vtable load that’s contributing to the difference? My guess would be that it’s the lack of inlining, and calling through a (direct) pointer won’t help that.


#5

Yeah, looks like you’re right. The big gains come from inlining. My guess is, for example, that the compiler can often elide copies. E.g., if T::default just returns a constant, then something like this:

std::ptr::write(ptr, T::default());

can be trivially optimized to elide copying the return value of T::default and just writing it directly into ptr.


#6

Yes, definitely. Inlining expands the optimization horizon and can cause significant simplification if constants fall out. There’s also a lot more value tracking available once a call is inlined, which can simplify control flow and remove dead branches; code motion and scheduling open up; and so on (not going into all the details of inlining here). Suffice to say, there’s a reason inlining is called the Mother of All Optimizations :slight_smile:.