Minimizing compile times via concrete trait implementations

Suppose I have a trait implementation for a generic structure I want to export from my crate:

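pub trait MyTrait<T>
where
    T: Copy,
{
    // Does not mention T in its signature.
    fn concrete_func(&self) -> i32 {
        0 // placeholder body
    }

    // Depends on T.
    fn generic_func(&self, arg: T) -> T {
        arg // placeholder body
    }
}

pub struct MyType<T> {
    field: T,
}

impl<T: Copy> MyTrait<T> for MyType<T> {}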

Does the compilation of concrete_func to native code happen when the user specializes the type in their crate, or does it happen just once, when the dependency is compiled?

What about generic_func? Can I, in the interest of reducing compile times for people consuming my crate, export a bunch of concrete implementations instead of the generic implementation, to get the generic function to be compiled only once, when the dependency is compiled?

impl MyTrait<i32> for MyType<i32> { }
impl MyTrait<i64> for MyType<i64> { }
impl MyTrait<i128> for MyType<i128> { }

If this is the case, can using concrete implementations of generic traits and types in upstream crates minimize compile times for users when I, as the crate author, already know every type argument the type could have? Is it good practice?

I think the direct answer to this question is that it doesn't matter because the orphan rules make it so the only person able to write impl MyTrait<i32> for MyType<i32> is the author of MyType (or MyTrait, whichever crate is downstream of the other if the two are in different crates).

What you are probably thinking of is something slightly different. For example, let's look at the following code:

// crate A
pub fn generic_function<T: SomeTrait>(arg: T) { ... }

// crate B
fn main() {
  crate_a::generic_function("Hello, World!");
}

// crate C
fn main() {
  crate_a::generic_function("Hello, World!");
}

The way monomorphisation works[1] is that every time a crate[2] is compiled, it will generate machine code for a copy of the generic function for every unique combination of types that crate actually uses (e.g. in crate B one copy might be the crate_a::generic_function::<i32>, another is crate_a::generic_function::<&'static str>, and so on).

That means we won't reuse copies across crates/compilation units and adding instantiations to the upstream crate will only increase the compilation times for itself with no benefit downstream.
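To make the per-type copies concrete, here is a minimal single-crate sketch. `instance_size` is a hypothetical stand-in for `crate_a::generic_function`. It only shows that each distinct `T` gets its own instantiation with type-specific values baked in; it does not demonstrate the cross-crate behaviour described above.

```rust
use std::mem::size_of;

// Hypothetical stand-in for crate_a::generic_function.
// Each distinct T used in this crate gets its own instantiation,
// and size_of::<T>() is resolved separately for each one.
pub fn instance_size<T>(_arg: T) -> usize {
    size_of::<T>()
}

fn main() {
    // Two different type arguments, so two instantiations.
    println!("{}", instance_size(1_i32));
    println!("{}", instance_size(1_i64));
}
```

If a second crate also called `instance_size(1_i32)`, then under the model described above it would generate its own copy of that instantiation rather than reuse this one.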

  1. ... or at least my understanding of it. Feel free to correct me and point to docs/source code if I'm wrong! ↩︎

  2. technically a "compilation unit". Sometimes rustc decides to split your crate into multiple pieces so it can compile each piece in parallel, at the cost of fewer optimisation opportunities. ↩︎

So monomorphization happens only at the call site, not at implementation blocks? I thought otherwise, since the implementation block already contains all the information the compiler needs for specialization.

Very good question! It's hard to know what happens with trait impls alone, but unfortunately the most pessimistic hypothesis about these things is usually the right one at present: a given choice of parameters may very well end up monomorphized multiple times across codegen units / crates (as @Michael-F-Bryan already pointed out).

  • This can often be noticed with the pervasive serde crate, which is highly generic. I suspect there are many instances of serde_json::from_str::<serde_json::Value>, for instance, within a given dependency tree.

Luckily, in the future / nightly, one can experiment with the -Zshare-generics flag which aims to tackle this very problem.
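On stable, one pattern that may limit how much code gets duplicated is a thin generic shim over a non-generic function. The sketch below is only an illustration under that assumption: `checksum` and `sum_bytes` are hypothetical names, and as far as I understand only the small shim is instantiated per type argument, while the non-generic worker is compiled with the crate that defines it.

```rust
// Hypothetical upstream crate contents; all names are illustrative.

// Non-generic worker: it has no type parameters, so it does not need
// to be instantiated again for each caller-chosen type.
fn sum_bytes(bytes: &[u8]) -> u64 {
    bytes.iter().map(|&b| u64::from(b)).sum()
}

// Thin generic shim: only this small conversion layer is
// monomorphized for each T the callers use.
pub fn checksum<T: AsRef<[u8]>>(data: T) -> u64 {
    sum_bytes(data.as_ref())
}

fn main() {
    println!("{}", checksum("abc"));
    println!("{}", checksum(vec![1_u8, 2, 3]));
}
```

This does not stop the shim itself from being instantiated in several crates, but it keeps each duplicated copy small.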

That's good news, hopefully it lands on the stable channel soon.

This would usually be a waste of time. Different crates have different profiles, but if you actually profile the compiler itself, you will find that the monomorphization process is pretty cheap. What is usually more expensive is the processing of the generic version, when the compiler ensures that monomorphization can never fail.

That's the usual case, anyway. One can easily imagine a situation where you have one generic function and a million monomorphized versions, but I'm not sure I have ever seen such a situation outside of code specially written to stress-test the compiler.
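A small sketch of what "ensuring monomorphization can never fail" looks like in practice, with `double` as a hypothetical example function: the body is type-checked once, against the declared bounds only, before any concrete type is known.

```rust
use std::ops::Add;

// The body is checked against the bounds `Copy + Add<Output = T>`.
// Dropping the `Add` bound makes the definition itself fail to compile,
// even if no caller ever instantiates it.
pub fn double<T: Copy + Add<Output = T>>(x: T) -> T {
    x + x
}

fn main() {
    // Each call below only has to prove its type meets the bounds;
    // the body is not type-checked again per instantiation.
    println!("{}", double(21_i32));
    println!("{}", double(1.5_f64));
}
```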

libstdc++ does something like what you are asking about, not because they want to speed up compilation but because it allows them to move these versions into a shared library.

This is only possible because C++ has a stable ABI, so it would be pretty pointless to add such a facility to Rust before giving it a stable ABI.