I'm designing an API, with a 'map' function on a collection.
I'm considering:
(A) fn map2<U, F: Fn(&T) -> U>(&self, f: F)
vs
(B) fn map<U>(&self, f: Rc<dyn Fn(&T) -> U>)
My understanding of (A) vs (B) is that:
for (A), a new version of the function is constructed for every every the source code uses .map, thus possibly increasing compile time; for a given (T, U) pair, many pieces of code may be generated if lots are different functions are passed
for (B), for each (T, U) pair, only one version of the function is constructed, but there might be an extra indirection during runtime
===
To me, it seems that (B) is superior to (A), but the existing iterator design seems to use (A).
What are the main trade offs of (A) vs (B) .. and why does the std iterator seem to prefer (A) ?
It's true that this means you get more instances of map2 in your binary. But the benefit of this is that each instance can be optimized individually for the function being used!
This means that rather than having to use a function call per item, .map() can almost always inline the passed in function - it'll be as if you've written a custom map function performing the operation you pass in. Furthermore, since there's one per Iterator type as well, you can get full chains of operations optimizing together.
This lets Rust have things like x.iter().map(x).map(y).filter(z).collect() be as fast, or faster than an equivalent manual loop, like
for item in x {
let item = x(y(item));
if !z(item) { continue; }
result.push(item);
}
This is one of the biggest strengths of iterators in rust - that they can be compiled down to almost nothing.
When you use A, you do technically create a new version of the function for this particular argument. But if that function is inlined, and the argument is inlined, the benefit can outweight the disadvantage.
With that said, the main reason I'm personally OK with this is that map, filter, chain, etc. are all super small functions. The function bodies are small enough for them to almost always be inlined, and furthermore, for those extra instructions to not really matter for overall binary size. map is quite literally just constructing a Map struct with the argument and this body. It's very, very little cost to copy the function once per use. If it's inlined, then it could and should completely disappear, as in the example above.
But if you have to make this choice for an API with much bigger function implementations, then the balance could definitely be different. The reason the iterator API works with this is that A) all the functions are tiny, and B) iterators are called so often that the performance gains are worth sacrificing a little binary size (or none, if lucky). If that doesn't match with the API you're designing, then another choice might be better. If your functions aren't called often, or if they have huge (40+ line) implementations, then option B will be better.
This could be more elegantly resolved if Rc<dyn Fn(&T) -> U> implemented Fn(&T) -> U. In which case (A) could be used in exactly the same way as (B), while also enabling the flexibility and run-time performance benefits of monomorphization where needed.
I'm actually surprised that this does not seem to be the case. Did I get something wrong in that code, or is an std feature request in order?
Is this partly why in Rust, functions/closures that have the same types for Inputs and same type for Output are actually considered to be different types? So that we can inline them and optimize them away?
If your functions are true functions (not closures, which can also store captured variables), then you can coerce them to fn(...) -> ..., which is a function pointer, which has one concrete type for all functions with the same input/output. For example, fn(i32) -> i32.
But as long as you're using one of the traits (Fn, FnMut, FnOnce), you'll get a unique type which is statically-known, and functions using it can optimized with that knowledge (including inlining the function).