Conversely, Rust's (relatively) new impl Fn(...) return types seem to be implemented using static dispatch, so I'm interested in whether they incur the same cost (or indeed any cost at all, compared to regular function calls). As a simple but contrived example, let's say I use a function that returns a closure...
impl Fn() in return position is essentially the same as a generic T: Fn() bound, except that it can be used as a return type (with the concrete type chosen by the callee rather than the caller). So we can look at the general case of T: Fn() when reasoning about this kind of thing.
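For concreteness, here is a small sketch (the names are mine, not from the question) showing impl Fn in return position alongside the equivalent generic T: Fn() bound:

```rust
// Sketch with hypothetical names: `impl Fn` in return position hands the
// caller a concrete but unnameable closure type, dispatched statically.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// The equivalent generic bound, usable on parameters:
fn apply<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x) // monomorphized per concrete F: no dynamic dispatch
}

fn main() {
    let add3 = make_adder(3);
    println!("{}", apply(add3, 4)); // prints "7"
}
```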
There are four kinds of non-mutating, multiply-callable function pointers/closures:

dyn Fn(): you seem to have already discovered that this is the least efficient of them all, since there is the runtime cost of dynamic dispatch. This occurs because dyn hides everything about the underlying type, exposing only the appropriate vtable (the list of associated functions for that type).

fn() is a function pointer. This obscures the underlying function from the optimizer, so the call sometimes cannot be optimized away.

fn() {name} is the type of the specific function named name (a "function item" type). This allows the compiler to optimize as if you had called the function directly, without any kind of function pointer nonsense.

Closures each have their own unique, unnameable type. As with function item types, the compiler knows exactly which code is being called, so closure calls can be inlined and optimized just as if the body had been written inline.
Since these all implement Fn, they each get optimized differently depending on how they're used, meaning that using impl Fn doesn't really change the way the optimizer behaves: it only ever observes these four cases.
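A minimal sketch (my own example, not taken from the answer) of all four cases flowing through the same generic function:

```rust
// Sketch (hypothetical names): the same generic function, monomorphized
// over each of the four kinds of callables described above.
fn call_twice<F: Fn(i32) -> i32>(f: F) -> i32 {
    f(1) + f(2)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // 1. Function item type `fn(i32) -> i32 {double}`: zero-sized,
    //    the optimizer knows exactly which function is being called.
    let a = call_twice(double);

    // 2. Function pointer `fn(i32) -> i32`: an actual pointer value,
    //    which can hide the callee from the optimizer.
    let ptr: fn(i32) -> i32 = double;
    let b = call_twice(ptr);

    // 3. A closure: its own unique, unnameable type; statically
    //    dispatched just like a function item.
    let c = call_twice(|x| x * 2);

    // 4. dyn Fn: dispatched at runtime through a vtable.
    let dyn_f: &dyn Fn(i32) -> i32 = &|x| x * 2;
    let d = call_twice(dyn_f);

    println!("{} {} {} {}", a, b, c, d); // prints "6 6 6 6"
}
```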
I'm not sure there is a general answer. If you intend to call a closure multiple times, it can be quicker, e.g. when an argument that the single-function version takes on every call would have to pass through validation each time.
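One concrete (hypothetical) illustration of that point: validation can be hoisted into the closure's construction and paid once, whereas a plain function taking the same argument must re-validate on every call:

```rust
// Sketch (hypothetical names): pre-validating once when building a
// closure, vs. validating on every call of a plain function.
fn make_scaler(factor: i32) -> impl Fn(i32) -> i32 {
    assert!(factor != 0, "factor must be non-zero"); // validated once
    move |x| x * factor
}

fn scale(factor: i32, x: i32) -> i32 {
    assert!(factor != 0, "factor must be non-zero"); // validated every call
    x * factor
}

fn main() {
    // One validation, then four multiplications:
    let total: i32 = (1..=4).map(make_scaler(3)).sum();
    // Four validations, four multiplications:
    let total2: i32 = (1..=4).map(|x| scale(3, x)).sum();
    println!("{} {}", total, total2); // prints "30 30"
}
```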
BTW, I had to invert the argument order in f; otherwise the compiler noticed that the two functions were completely identical and optimized f away entirely!
And now, the question is whether the code using MyFn and the code that computes the sum directly are "equivalent":
First, if we look at the foo() function: it takes an i32 and returns a struct with a single field containing that input, and nothing more. So foo() is just a "cast" / a no-op.

Similarly, if we look at the .call() function, it is actually just the .add() function for an i32 (since Self ≈ i32):
use ::core::{convert::identity, ops::Add};

fn main ()
{
    //             foo call
    //             vvvvvvvvvvvvvvv vvv
    println!("{}", identity::<i32>(3).add(4));
    // i.e., after inlining no-op functions:
    println!("{}", 3 + 4);
}
So with a minimal amount of optimization (counterexample), the "no-op" functions can be inlined and thus skipped, making both pieces of code equivalent.
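The MyFn type itself isn't shown in this excerpt; a minimal reconstruction consistent with the description above (a struct with a single i32 field, a no-op foo() constructor, and a call() that is just addition) might look like:

```rust
// Hypothetical reconstruction of the MyFn wrapper being discussed:
// foo() just wraps an i32, and call() is just i32 addition.
struct MyFn(i32);

fn foo(x: i32) -> MyFn {
    MyFn(x) // a "cast" / no-op: only moves the value into the struct
}

impl MyFn {
    fn call(&self, y: i32) -> i32 {
        self.0 + y // exactly i32's add
    }
}

fn main() {
    println!("{}", foo(3).call(4)); // prints "7", same as 3 + 4
}
```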