Closure `impl Fn` performance

I'm aware that function pointers with dynamic dispatch can reduce the performance of programs. Whilst the details of this are processor-specific, it seems generally agreed that these calls usually incur at least some extra cost.

Conversely, Rust's (relatively) new impl Fn(...) return types seem to be implemented using static dispatch, so I'm interested in whether they incur the same cost (or indeed any cost at all compared to regular function calls). As a simple but contrived example, let's say I use a function that returns a closure...

fn foo(x: i32) -> impl Fn(i32) -> i32 {
    move |y: i32| {
        x + y
    }
}

fn main() {
    println!("{}", foo(3)(4));
}

...will I suffer worse performance than doing something like...

fn bar(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    println!("{}", bar(3, 4));
}

...and if not, does this apply in the general case? Thanks!

impl Fn is essentially the same as a generic T: Fn() parameter, except that it can also appear in return position. So we can look at the general case of T: Fn() when thinking about this kind of thing.

There are four types of non-mutating multiple-call function pointers/closures:

  • dyn Fn(): you seem to have already discovered that this is the least efficient of the four, since there is the runtime cost of dynamic dispatch. This happens because dyn hides everything about the underlying type, exposing only the appropriate vtable (the list of associated functions for that type).
  • fn() is a function pointer. This obscures the underlying function from the optimizer, so the call sometimes can't be optimized away.
  • fn() { <name> } is the unique type of the specific function named <name>. This lets the compiler optimize as if you'd called the function directly, without any function-pointer indirection.
  • Closures with state. These are tricky, so I'll point you to @KrishnaSannasi's great article about closures.

Since these all implement Fn, each gets optimized differently depending on how it's used, meaning that using impl Fn doesn't really change the way the optimizer behaves, since it only ever observes these four cases.


Short answer: no worse performance.

I'm not sure there is a general case. If you intend to call a closure multiple times, it can even be quicker, for example if the argument captured once by the closure would otherwise have to pass through validation on every call in the plain-function version.


The constant-folded case is probably not so helpful to compare.

But with non-constant inputs it looks to be the same.

BTW, I had to invert the argument order in f, otherwise the compiler noticed that the functions are completely identical and optimized f out completely! :slight_smile:

The impl Fn version of foo() essentially becomes

/// `Fn(i32) -> i32` unsugared
trait MyFn {
    fn call (self: &'_ Self, _: i32) -> i32;
}

fn foo (x: i32) -> impl MyFn
{
    return Captured { x };
    // where
    struct Captured { x: i32 }
    impl MyFn for Captured {
        fn call (self: &'_ Self, y: i32) -> i32
        {
            self.x + y
        }
    }
}

fn main ()
{
    println!("{}", foo(3).call(4));
}

Except that with the real Fn traits the method is not invoked as .call(...) but simply as (...), since that's the sugar the Fn traits provide.

And now, the question is whether the code using MyFn and the code that computes the sum directly are "equivalent":

First, if we look at the foo() function, it takes an i32 and returns a struct whose single field contains that input, and nothing more. So foo() is just a "cast", effectively a no-op.

Similarly, if we look at the .call() function, it is really just the .add() function for an i32 (since Self ≈ i32):

use ::core::{convert::identity, ops::Add};

fn main ()
{
    //                    foo         call
    //             vvvvvvvvvvvvvvv    vvv
    println!("{}", identity::<i32>(3).add(4));
    // i.e., after inlining no-op functions:
    println!("{}", 3 + 4);
}

So with a minimum amount of optimization (an unoptimized build being the counterexample), the "no-op" functions can be inlined and thus skipped, making both pieces of code equivalent.


This is all really interesting, thank you for all the insight!