Hi,
I am new to Rust. I have a question about how a function can take ownership of a closure. My understanding is that there are two ways to do this: first, by using generics; second, by using Box with dynamic dispatch. Looking at std::thread, I notice that thread::spawn uses generics. Does this mean the compiler will emit new thread::spawn code for every closure we pass to it, so that we end up with many copies of thread::spawn? Why was the generic approach chosen over the dynamic dispatch one?
Yes, monomorphization will generate a new copy of thread::spawn for each and every closure type passed in. The reason is Rust's commitment to zero-cost abstractions. Dynamic dispatch would not be zero-cost, because every call pays for the indirection. The cost of generics may be larger binaries, but in return you get big speedups from all the optimizations monomorphization enables, and there is no runtime cost*.
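For concreteness, here is a minimal sketch of the two options being compared (the function names are made up for illustration):

// Static dispatch: the compiler generates a separate copy of this function
// for every distinct closure type it is called with.
fn run_generic<F: FnOnce()>(f: F) {
    f()
}

// Dynamic dispatch: a single copy of this function exists, and the boxed
// closure is called through a vtable at runtime.
fn run_boxed(f: Box<dyn FnOnce()>) {
    f()
}

fn main() {
    run_generic(|| println!("a"));          // instantiation #1
    run_generic(|| println!("b"));          // instantiation #2 (a different closure type)
    run_boxed(Box::new(|| println!("c")));  // always the same function
}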
You will see generics used a lot in Rust; for some extreme examples, just look at typenum and frunk.
* Technically there is a runtime cost to the larger binary, which can cause performance regressions due to worse cache behavior, but this concern is so niche that it is not worth considering unless you are chasing every last 0.001% of performance in your app.
Note that the generic approach does not always increase the binary size. Dynamically dispatched functions always* have to be present in the program binary, because we need to take their address to put in the vtable. Statically dispatched [generic] functions, on the other hand, especially small ones, are usually inlined and merged into the caller's code, and can produce even smaller (and faster) code as a result.
*: Let's ignore devirtualization for now, as it's far from common in the Rust world.
If you look at the implementation of thread::spawn, you will see that it eventually creates a Box before running the OS-specific code that actually creates the thread.
In general it's easy to write functions that are generic over closures for convenience, but that use dynamic polymorphism internally in order to help tame codegen.
pub fn do_a_thing<F: FnOnce() + 'static>(f: F) {
    // just a trampoline into the non-generic implementation
    // (the `'static` bound is needed so the closure can be boxed as `Box<dyn FnOnce()>`)
    _do_a_thing(Box::new(f))
}

fn _do_a_thing(f: Box<dyn FnOnce()>) {
    /* actual implementation */
}
It is an interesting pattern actually. Thanks for bringing that up.
If I understand correctly, do_a_thing (generic) will be optimized away, so we are essentially calling _do_a_thing (dynamic) directly, and it avoids the syntactic ugliness of having to call Box::new ourselves.
You are correct. Because _do_a_thing() is monomorphic (i.e. not generic), the code for it is generated when the crate it belongs to is built, and without #[inline] it will never be inlined across crates. Meanwhile, do_a_thing() will likely be generated anew at each call site, but it is cheap and almost certain to be inlined.
And this technique isn't just limited to dynamic polymorphism; you can do something similar with conversion traits:
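(a sketch with a made-up read_config function: a thin generic shim over AsRef<Path> that forwards to a single non-generic implementation)

use std::fs;
use std::io;
use std::path::Path;

// Thin generic shim: instantiated per caller type, but trivially inlinable.
pub fn read_config<P: AsRef<Path>>(path: P) -> io::Result<String> {
    _read_config(path.as_ref())
}

// Monomorphic inner function, compiled exactly once.
fn _read_config(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}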
Regarding the hidden boxing of the closure: when you know you need a boxed input, taking an unboxed parameter to improve call-site ergonomics is a bad idea. Imagine the caller already has a boxed closure; now thread::spawn will end up boxing an already-boxed closure.
(the reason ::std does it this way may be related to API stability, imagining that at some point the internal boxing could be avoided)
Nevertheless, I could imagine using something like:
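(a sketch of one possibility, with hypothetical names: a boxed entry point alongside a thin generic wrapper, so a caller's existing Box is not boxed again)

use std::thread::{self, JoinHandle};

// Convenience wrapper: boxes the closure for the caller (hypothetical API).
pub fn spawn_task<F: FnOnce() + Send + 'static>(f: F) -> JoinHandle<()> {
    spawn_boxed_task(Box::new(f))
}

// Callers that already hold a boxed closure can use this entry point
// directly and avoid ending up with a Box<Box<dyn FnOnce()>>.
pub fn spawn_boxed_task(f: Box<dyn FnOnce() + Send + 'static>) -> JoinHandle<()> {
    thread::spawn(move || f())
}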
Maybe? But more likely it just doesn't matter. I mean, I'm all for zero-cost abstractions when they make a difference, but if you're spawning an OS thread, you have much bigger things to worry about than a pointer allocation and an extra layer of indirection:
- An Arc must be created to send back the return value.
- A mutex must be locked to obtain a unique id for the thread.
- It has to... you know... create a thread. God help the user if they're on Windows.
- There is zero hope for any sort of compiler optimization across the thread boundary. (or is there?)
Aside: Mind that, as recently as four weeks ago, boxed closures didn't even impl the Fn traits!