Code gen speed: macros vs traits/generics

Suppose you have some code that can be generated either via traits/generics (provide something that impls A, B, C => can also impl D) or macro_rules! (the macro generates the code directly): is there any speed difference between the two?

Any pros / cons?


I assume that a macro is slower in case the function is not inlined (code gen once instead of for each instance). I don't think generics would have a big impact.

Other than that it probably depends on how fast macros are generated.

The main difference is where the actual instantiation / monomorphisation happens.

Let's consider a more concrete example:

trait Trait {
    fn foo (array: Self)…
    ;
}

/* Generic approach */
impl<const N: usize> Trait for [u8; N] {
    /// Imagine each takes long to compile
    fn foo (array: [u8; N])…
    {
        …
    }
}

// vs:

/* Macro-based approach */
generate_impls![
    0, 1, 2, 3, 4, 5, 6, 7,
];
// where
macro_rules! generate_impls {( $($N:literal),* $(,)? ) => (
    $(
        impl Trait for [u8; $N] {
            /// Imagine each takes long to compile
            fn foo (array: [u8; $N])…
            {
                …
            }
        }
    )*
)} use generate_impls;

Then, with the macro-based approach, each "instantiation" happens there and then, with code repeated for each of the instantiations (8 in my example). If a downstream user then goes and only uses a few of those impls, then at the end of the day that user (and the dependency!) has paid the compile-time cost of generating all the other impls for nothing.

On the other hand, thanks to incremental compilation, this "eager instantiation" (vs. the "lazy" one of generics), has the advantage of effectively caching that instantiation.

This means that the compilation time is spent once and guaranteed to be cached for all the future rebuilds of the dependent / downstream user.
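To make the eager / lazy contrast concrete, here is a minimal compilable sketch of both styles side by side. The trait names `SumLazy` / `SumEager` and their method bodies are made up for illustration (two traits are needed because the two kinds of impl would otherwise overlap); they are not taken from the example above.

```rust
/// Hypothetical trait, implemented through a single generic ("lazy") impl.
pub trait SumLazy {
    fn sum_lazy(&self) -> u32;
}

// One impl covers every length. Code for a given `N` is only generated
// once some crate actually uses that `N`.
impl<const N: usize> SumLazy for [u8; N] {
    fn sum_lazy(&self) -> u32 {
        self.iter().map(|&b| u32::from(b)).sum()
    }
}

/// Hypothetical trait, implemented through macro-generated ("eager") impls.
pub trait SumEager {
    fn sum_eager(&self) -> u32;
}

macro_rules! generate_impls {( $($N:literal),* $(,)? ) => (
    $(
        // One separate, non-generic impl per listed length; each one is
        // compiled in this crate whether or not anybody ends up using it.
        impl SumEager for [u8; $N] {
            fn sum_eager(&self) -> u32 {
                self.iter().map(|&b| u32::from(b)).sum()
            }
        }
    )*
)}

generate_impls![0, 1, 2, 3];
```

With this, `[1u8, 2, 3].sum_eager()` and `[0u8; 9].sum_lazy()` both compile, whereas `[0u8; 9].sum_eager()` does not, since length 9 was never listed in the macro call.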

Tangentially, an aside about generics and "delayed compilation"

An actual real-world example where this does not happen is with {De,}Serialize: the serde crates and the code emitted by its derives all make very heavy usage of generics, which leads to most of the serde-related code being "lazy". Then, when somebody actually calls a {de,}serialize-related function in a downstream crate, they suddenly trigger the actual instantiation of all that generic code for their concrete choices of {De,}Serializers, and so end up spending the compilation time of their dependency.

  • This gets to be slightly problematic when multiple compilation units end up doing this work (I think? I'm not sure of this point).

This, in practice, is not that big of an issue, but for big projects it's something to be mindful of. In such cases, defining a helper "dummy" intermediary crate which exports non-generic functions for the multiple downstream crates that will be using them can be beneficial.
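A rough sketch of what such a helper crate could contain, assuming a stand-in generic function instead of real serde code. `encode_generic`, `Config` and `encode_config` are all hypothetical names; the only point is that the wrapper has no generic parameters, so the generic code gets instantiated in the helper crate rather than in each downstream crate.

```rust
use std::fmt::{self, Debug, Write};

/// Stand-in for a heavily generic library function (hypothetical).
/// Nothing is generated for it until concrete `W` and `T` are chosen.
pub fn encode_generic<W: Write, T: Debug>(out: &mut W, value: &T) -> fmt::Result {
    write!(out, "{:?}", value)
}

/// Hypothetical concrete type shared by several downstream crates.
#[derive(Debug)]
pub struct Config {
    pub retries: u32,
}

/// Non-generic wrapper: `encode_generic::<String, Config>` is instantiated
/// here, in the helper crate, and downstream crates only call this function.
pub fn encode_config(config: &Config) -> String {
    let mut out = String::new();
    encode_generic(&mut out, config).expect("writing to a String cannot fail");
    out
}
```

Downstream crates would then call `encode_config(&Config { retries: 3 })` instead of the generic function. Whether this actually saves compile time depends on the project (and on what the compiler decides to inline across crates), so it is worth measuring.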


Back to the OP, and speaking from experience with the (degenerate) case of polyfilling const generics with macros when the former weren't stable: the macro / concrete-instantiations-of-all approach did slow down compile time in a noticeable way, to the point that I'd feature-gate the less common array lengths.
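For illustration, feature-gating the less common lengths could look roughly like this. The feature name `more-lengths`, the chosen lengths, and the `-> usize` signature with its body are all assumptions made up for the sketch; the feature would also have to be declared under `[features]` in the crate's `Cargo.toml`.

```rust
pub trait Trait {
    // Return type and body are placeholders for this sketch.
    fn foo(array: Self) -> usize;
}

macro_rules! generate_impls {( $($N:literal),* $(,)? ) => (
    $(
        impl Trait for [u8; $N] {
            fn foo(array: [u8; $N]) -> usize {
                array.len()
            }
        }
    )*
)}

// Common lengths: always compiled.
generate_impls![0, 1, 2, 3, 4, 8, 16, 32];

// Less common lengths: only compiled when the (hypothetical) Cargo
// feature is enabled, so users who don't need them don't pay for them.
#[cfg(feature = "more-lengths")]
generate_impls![5, 6, 7, 9, 10, 11, 12];
```

Without the feature, `<[u8; 4] as Trait>::foo([0; 4])` compiles but `<[u8; 5] as Trait>::foo([0; 5])` does not; enabling the feature brings the extra impls (and their compile-time cost) back.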
