What's the name of this technique for cutting down compile times from monomorphization?

Armin Ronacher gave a talk recently where he describes a technique for speeding up compile times of generic functions by moving the function boundary of the generic parts away from the concrete business logic of a function.

His discussion of this technique can be viewed at the 21:40 mark of the recording of the talk.

To summarize here briefly (an any errors in my explanation are mine, not Armin's), you have some function that you want to write so that it can take anything that implements a particular trait:

fn very_big_function<S: ToString>(some_data: S) {
    let s = some_data.to_string();
    // hundreds of lines of business logic

In your application, you call very_big_function on a hundred different concrete types that implement ToString. For every concrete type that the function is called, the compiler will potentially generate a complete copy of it with just the type-specific parts for that concrete type changed. Since very_big_function is very big, this generates a lot of machine code. Generating all that code takes time, and also results in a larger binary in the application that's finally compiled.

The technique he describes involves pushing the business logic of very_big_function downwards and making it only take a single concrete type, while pushing the generic part upwards and keeping it's implementation as minimal as possible.

// generic function only calls to_string, and the business-logic function
fn generic_part<S: ToString>(some_data: S) {

// business logic function now only takes strings
fn very_big_function(s: String) {
    // hundreds of lines of business logic

Now there will be only one copy of very_big_function, and each function per your hundred concrete types are limited to converting the concrete type to a string and calling the very_big_function. Playing with this locally, I can confirm that the resulting binaries are smaller when the concrete and generic parts are separated like this.

What is the name of this technique?


Non-generic inner functions? See e.g. on Possible Rust.


I think when this is done automatically, it's called polymorphization.


This technique is heavily used by std itself, by the way. (Usually the auxiliary non-generic fn is written nested inside the generic one.) One day, the compiler could be smart enough to do this on its own; techniques to automatically deduplicate generic code go under the umbrella term “polymorphization”.


In my mind it's the momo, which is the name of a crate which provides a proc macro that splits your functions like this for you.


I'm curios to know why they (and you) believe "immediately blow up your compile times" is the result.

I don't know that it has a common name. I learned it a long time ago for C++, but never with a specific appellation. Here's an example of using it in std:

Looks like rustc has something it calls "polymorphization", though

Here's a simple test case just to show how easy it is to observe the difference


use seq_macro::seq; // seq-macro = "0.3.2"
use std::fmt::Debug;

fn action<T: Debug>(value: T) {
    #[cfg(feature = "polymorphic")]
    #[cfg(not(feature = "polymorphic"))]
    seq!(x in 0..1_000 {
        println!("{}: {value:?}", x);

#[cfg(feature = "polymorphic")]
fn action_inner(value: &dyn Debug) {
    seq!(x in 0..1_000 {
        println!("{}: {value:?}", x);

fn usize() {

fn str() {

fn string() {

fn vec_usize() {
    action(vec![1usize, 2, 3]);

fn main() {

cargo clean && cargo build reports build times of ~2.7s
cargo clean && cargo build --features polymorphic reports build times of ~1.7s


Sweet! Thank you!

Old Windows 7 computer... 4.98s vs 2.83s.

4.19s vs 2.79s for AsRef<str>.

Just wanted to say, thank you to everyone who contributed to this post! I am surprised and delighted how quickly this idea went from something vague I've heard about to something so clearly and concisely demonstrated about the language with additional followup references. :heart:


Isn't this testcase simething different?
You are basically exchaning monomorphisation with dynamic dispatch. You could easily just do that on the top level. The example shown in the talk is focused on functions which have a small generic part in the beginning and the go on to do non generic stuff which is needlessly duplicated.

I was just demonstrating that monomorphization has an observable effect on compile times

1 Like

Ok :+1:

After a few quick tests, it seems indeed that it's not done automatically (yet) but when the non-generic part is embedded into an inner function, the outer function can still be made #[inline] without inlining the inner function which remains separate and unique, no need to add any #[inline(never)] attribute.

That may be interesting if the generic part is very short, as in the example above. Otherwise it means an extra call and potentially missed optimization opportunities, though the impact shouldn't be significant in comparison to a "very_big_function" (in my case the prologue was quite short, just increasing the stack pointer and pushing a couple of registers).

Do folks know if C++ template functions can benefit from the same construct / does C++ compiler have a flag we can toggle? Asking as it was also briefly mentioned by @scottmcm in the same thread.

I code in both C++ and Rust on a regular basis so am curious.


It's common for std::vector to have the normal implementation for void*, say, but then to specialize it for all other pointer types to work by reinterpret_casting to and from void*. That way all pointer types can end up using the same "grow it" logic and such.


I've both heard and referred to this technique as "trampolining" a function.

It's also used for scenarios other than optimizing monomorphization.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.