Fastest way to call function object?


I’m working on some code that calls a function which is either defined as a function associated with a trait or is provided as a callback by the user. The code paths that call this function are very cheap, so the function call overhead is significant (in fact, when I switched from a trait-only implementation (in which the functions are known at compile time) to a Box-based implementation, my time/op increased by ~10%). My question is: is there a mechanism for doing this that is faster than using Box? Unsafe code is acceptable here.

Here’s a toy version of what I’m doing:

struct Foo<'a, T> {
    init: Box<dyn Fn() -> T + 'a>,
}

impl<'a, T> Foo<'a, T> {
    fn new<F: Fn() -> T + 'a>(init: F) -> Self {
        Foo { init: Box::new(init) }
    }

    fn init(&self) -> T {
        (self.init)()
    }
}

impl<'a, T: Default + 'a> Foo<'a, T> {
    fn new_initializable() -> Self {
        Foo { init: Box::new(T::default) }
    }
}

A trait object consists of a pointer to the object and a pointer to the vtable; calling a method of one loads a function pointer from the vtable and calls that. If you store the function pointer directly instead, you can save the load. You can use a wrapper function like

fn invoke<T, F: Fn() -> T>(f: *const u8) -> T {
    let f = unsafe { &*(f as *const F) };
    f() // this call will be inlined
}

…then store the function pointer invoke::<T, F> as fn(*const u8) -> T, along with the object pointer cast to *const u8.
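A fuller sketch of that technique (the name ErasedFn and the 'static bound are mine, for the sake of a self-contained example): the closure is boxed once, and every call goes through a stored fn pointer rather than a vtable, saving the vtable load. A second stored fn pointer handles the destructor.

```rust
struct ErasedFn<T> {
    data: *mut u8,             // type-erased pointer to the boxed closure
    call: fn(*const u8) -> T,  // monomorphized trampoline for F
    drop_fn: fn(*mut u8),      // frees the boxed closure on drop
}

impl<T> ErasedFn<T> {
    fn new<F: Fn() -> T + 'static>(f: F) -> Self {
        // Inner fn items carry their own generic parameters; each
        // instantiation coerces to a plain function pointer.
        fn call_impl<T, F: Fn() -> T>(p: *const u8) -> T {
            let f = unsafe { &*(p as *const F) };
            f() // calls the concrete closure
        }
        fn drop_impl<F>(p: *mut u8) {
            unsafe { drop(Box::from_raw(p as *mut F)) }
        }
        ErasedFn {
            data: Box::into_raw(Box::new(f)) as *mut u8,
            call: call_impl::<T, F>,
            drop_fn: drop_impl::<F>,
        }
    }

    fn call(&self) -> T {
        (self.call)(self.data)
    }
}

impl<T> Drop for ErasedFn<T> {
    fn drop(&mut self) {
        (self.drop_fn)(self.data)
    }
}

fn main() {
    let x = 41;
    let f = ErasedFn::new(move || x + 1);
    assert_eq!(f.call(), 42);
}
```

This is only a sketch; a production version would also need to think about Send/Sync and non-'static captures.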

Alternately, if the user callback doesn’t need context/a self argument and can be a plain function pointer, just do that.
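For the no-context case, a minimal sketch (PlainFoo is an illustrative name): a bare fn pointer needs no allocation and no vtable at all, and a fn item like T::default coerces to one directly.

```rust
struct PlainFoo<T> {
    init: fn() -> T, // plain function pointer, no captured state
}

impl<T: Default> PlainFoo<T> {
    fn new_initializable() -> Self {
        // The fn item `T::default` coerces to `fn() -> T`.
        PlainFoo { init: T::default }
    }
}

fn main() {
    let foo = PlainFoo::<u32>::new_initializable();
    assert_eq!((foo.init)(), 0);
}
```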

Warning: Any speedup from this approach is likely to be very small (it’s probably a better idea to refactor the code, but that’s not what you asked). Also, be mindful of destructors.


OK that’s awesome, thanks! Will this also work for closures?


Are you sure it’s the vtable load that’s contributing to the difference? My guess would be that it’s the lack of inlining, and calling through a (direct) pointer won’t help that.


Yeah, looks like you’re right. The big gains come from inlining. My guess is that, for example, the compiler can often elide copies. E.g., if T::default just returns a constant, then something like this:

std::ptr::write(ptr, T::default());

can be trivially optimized to skip copying the return value of T::default and write it directly into ptr.
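As a runnable version of that pattern (init_in_place is a hypothetical helper, not from the code above): the value is written straight into uninitialized storage, and with T::default inlined the compiler can construct it in place rather than building a temporary and copying it.

```rust
use std::mem::MaybeUninit;

// Sketch: construct T::default() directly into uninitialized storage.
fn init_in_place<T: Default>() -> T {
    let mut slot = MaybeUninit::<T>::uninit();
    unsafe {
        // Writes the value without reading or dropping the old contents.
        std::ptr::write(slot.as_mut_ptr(), T::default());
        slot.assume_init()
    }
}

fn main() {
    assert_eq!(init_in_place::<u64>(), 0);
    assert_eq!(init_in_place::<String>(), "");
}
```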


Yes, definitely. Inlining expands the optimization horizon and can produce significant simplification when constants fall out. There’s also much more value tracking available once a call is inlined, which can simplify control flow and remove dead branches; code motion and scheduling open up as well (not going into all the details of inlining here). Suffice it to say, there’s a reason inlining is called the Mother of All Optimizations :slight_smile:.