We've had fast proc-macros all along, or have we?

Problem space

Hi. For some time now, whenever I've heard or read a discussion about proc-macros in Rust, I would often see complaints that they are slow. One of the cited reasons was that they are compiled in debug mode, and I remember seeing some discussions about how to solve this problem. I don't have a very good understanding of how the compiler works or what design choices would need to be made, and I left those discussions with the idea that solving this problem is hard (for some reasons) and that, while many people would like to see it improved, the work needed for it won't be done for some time.

But yesterday, by sheer coincidence, I found out that Cargo does in fact allow you to compile proc-macros with optimizations. It turns out that one can override the Cargo profile for specific packages. As the documentation puts it:

Profile settings can be overridden for specific packages and build-time crates.

This means that even in debug builds, one can compile specific packages with optimizations enabled. Taking an example from the aforementioned documentation:

# The `foo` package will use the -Copt-level=3 flag.
[profile.dev.package.foo]
opt-level = 3

What's more, there is a special table that controls optimizations for "build scripts, proc macros, and their dependencies":

# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
opt-level = 3

Experiment

After seeing this, I knew I had to test it myself to see if it really works. I decided to check how different profile settings influence the compilation of a bunch of structs that use serde's proc-macros. To do that, I auto-generated a file of random structs, each with random fields and random serde attributes applied to them. Below is a very ad-hoc script that generates such a file, in case anyone would like to reproduce my tests.

// dependencies: anyhow, rand and uuid

use std::fmt;
use std::fs::File;
use std::io::{BufWriter, Write};

use rand::seq::SliceRandom;
use rand::thread_rng;

fn random_from_list(list: &[&str]) -> String {
    let mut rng = thread_rng();
    (*list.choose(&mut rng).unwrap()).to_owned()
}

struct SerdeStructAttrib(String);
struct SerdeFieldAttrib(String);

struct Field {
    ident: String,
    type_: String,
    serde_attribs: Vec<SerdeFieldAttrib>,
}

impl Field {
    fn random() -> Self {
        let ident = Self::random_ident();
        let type_ = Self::random_type();
        let serde_attribs = vec![Self::random_attrib()];

        Self {
            ident,
            type_,
            serde_attribs,
        }
    }

    fn random_ident() -> String {
        let id = uuid::Uuid::new_v4();
        format!("f_{}", id.simple())
    }

    fn random_type() -> String {
        static TYPES: &[&str] = &[
            "i32",
            "u32",
            "bool",
            "std::string::String",
            "std::vec::Vec<std::string::String>",
        ];
        random_from_list(TYPES)
    }

    fn random_attrib() -> SerdeFieldAttrib {
        let attrib: &[&dyn Fn() -> String] = &[
            &|| "default".into(),
            &|| "skip".into(),
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("rename = \"f_{}\"", id.simple())
            },
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("alias = \"f_{}\"", id.simple())
            },
        ];

        let mut rng = thread_rng();
        let f = attrib.choose(&mut rng).unwrap();
        SerdeFieldAttrib(f())
    }
}

struct Struct {
    ident: String,
    serde_attribs: Vec<SerdeStructAttrib>,
    fields: Vec<Field>,
}

impl Struct {
    pub fn random(fields: usize) -> Self {
        let ident = Self::random_ident();
        let serde_attribs = vec![Self::random_attrib()];
        let fields = (0..fields).map(|_| Field::random()).collect();

        Self {
            ident,
            serde_attribs,
            fields,
        }
    }

    pub fn to_display(&self) -> StructDisplay<'_> {
        StructDisplay(self)
    }

    fn random_ident() -> String {
        let id = uuid::Uuid::new_v4();
        format!("S_{}", id.simple())
    }

    fn random_attrib() -> SerdeStructAttrib {
        let attrib: &[&dyn Fn() -> String] = &[
            &|| "deny_unknown_fields".into(),
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("rename = \"S_{}\"", id.simple())
            },
            &|| {
                static VALUES: &[&str] = &[
                    "lowercase",
                    "UPPERCASE",
                    "PascalCase",
                    "camelCase",
                    "snake_case",
                    "SCREAMING_SNAKE_CASE",
                    "kebab-case",
                    "SCREAMING-KEBAB-CASE",
                ];
                format!("rename_all = \"{}\"", random_from_list(VALUES))
            },
        ];

        let mut rng = thread_rng();
        let f = attrib.choose(&mut rng).unwrap();
        SerdeStructAttrib(f())
    }
}

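// Renders a Struct as Rust source: the derive line, struct-level serde
// attributes, and each field with its own serde attributes.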
struct StructDisplay<'a>(&'a Struct);

impl fmt::Display for StructDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "#[derive(serde::Serialize, serde::Deserialize)]")?;
        if let Some(first) = self.0.serde_attribs.first() {
            write!(f, "#[serde({}", first.0)?;
            for attrib in &self.0.serde_attribs[1..] {
                write!(f, ", {}", attrib.0)?;
            }
            writeln!(f, ")]")?;
        }
        writeln!(f, "pub struct {} {{", self.0.ident)?;
        for field in &self.0.fields {
            if let Some(first) = field.serde_attribs.first() {
                write!(f, "\t#[serde({}", first.0)?;
                for attrib in &field.serde_attribs[1..] {
                    write!(f, ", {}", attrib.0)?;
                }
                writeln!(f, ")]")?;
            }
            writeln!(f, "    pub {}: {},", field.ident, field.type_)?;
        }
        writeln!(f, "}}")?;

        Ok(())
    }
}

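// Generate STRUCT_N random structs with FIELD_N fields each and write them to PATH.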
fn main() -> anyhow::Result<()> {
    const STRUCT_N: usize = 1_000;
    const FIELD_N: usize = 10;
    const PATH: &str = "src/data.rs";

    let file = File::create(PATH)?;
    let mut file = BufWriter::new(file);

    writeln!(file, "#![allow(non_camel_case_types, unused)]\n")?;

    for _ in 0..STRUCT_N {
        let s = Struct::random(FIELD_N);
        write!(file, "{}\n", s.to_display())?;
    }

    file.flush()?;
    Ok(())
}
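
For reference, each generated struct looks roughly like this (the hex identifiers below are invented for illustration; the real ones are random UUIDs):

#[derive(serde::Serialize, serde::Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct S_6f0d9c2e41b74c1a8d3e5f7a9b0c1d2e {
    #[serde(default)]
    pub f_0a1b2c3d4e5f60718293a4b5c6d7e8f9: i32,
    #[serde(skip)]
    pub f_9f8e7d6c5b4a39281706f5e4d3c2b1a0: std::vec::Vec<std::string::String>,
    // ...eight more fields, each with one random serde attribute
}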

The generated file had 1000 structs, each with 10 fields; every struct and field had one random serde attribute. The file was 1.1 MB in size. I then timed the following scenarios:

  1. Build time without any serde attributes (and without deriving Serialize and Deserialize).
  2. Build time with serde attributes and a clean Cargo.toml (no profile customization).
  3. Build time with only the serde_derive package configured:
    [profile.{profile}.package.serde_derive]
    opt-level = 3
    
  4. Build time with the serde_derive package and its dependencies configured:
    [profile.{profile}.package.serde_derive]
    opt-level = 3
    
    [profile.{profile}.package.syn]
    opt-level = 3
    
    [profile.{profile}.package.proc-macro2]
    opt-level = 3
    
    [profile.{profile}.package.quote]
    opt-level = 3
    
    [profile.{profile}.package.unicode-ident]
    opt-level = 3
    
  5. Build time with the build-override table configured:
    [profile.{profile}.build-override]
    opt-level = 3
    

I ran all of these scenarios twice, once with the debug profile and once with the release profile. In each scenario I took 10 measurements and used the duration reported by cargo itself. I always measured incremental builds (touch src/generate_file.rs && cargo build).

Results

In the following results, whenever I calculate a difference I compare the average for a given test scenario against the average of the corresponding (debug/release) scenario without any profile customization.

| Scenario | Average duration [s] | Standard deviation [s] | Absolute difference [s] | Relative difference |
| --- | --- | --- | --- | --- |
| dev profile without serde | 0.21 | 0.003 | N/A | N/A |
| dev profile without customization | 9.55 | 0.17 | 0.0 | 1.0 |
| dev profile with optimized serde_derive | 8.79 | 0.08 | -0.76 | 0.92 |
| dev profile with optimized serde_derive and its dependencies | 8.43 | 0.09 | -1.12 | 0.88 |
| dev profile with build-override optimized | 7.53 | 0.11 | -2.02 | 0.79 |
| release profile without serde | 0.16 | 0.005 | N/A | N/A |
| release profile without customization | 32.40 | 0.25 | 0.0 | 1.0 |
| release profile with optimized serde_derive | 32.47 | 0.13 | 0.07 | 1.002 |
| release profile with optimized serde_derive and its dependencies | 31.11 | 0.14 | -1.29 | 0.96 |
| release profile with build-override optimized | 31.05 | 0.16 | -1.35 | 0.96 |

Interpretation and questions

From what I can see, turning on build-override optimization has a very noticeable effect in debug builds and does better than enabling optimizations for each individual crate used by the proc-macro. In release builds, build-override optimization gives only a minor improvement, and this time there is no meaningful difference between it and optimizing the individual crates.
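
For anyone who just wants the takeaway: the configuration that did best in my debug-build tests is the one from scenario 5 above. A minimal Cargo.toml sketch would be:

# Optimize build scripts, proc macros, and their dependencies
# even when the rest of the workspace is built in debug mode.
[profile.dev.build-override]
opt-level = 3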

I am by no means a statistician, so if you see any problems with my methodology, please let me know. I know that measuring performance is very hard and that this is a very primitive test.

Now, assuming that these results are at least somewhat representative, I would like to ask a couple of questions.

  1. Is this it? Is customizing the build-override profile the way to get fast proc-macros?
  2. Is this as good as it gets, or is there work being done or planned that would improve the speed of proc-macros even more?
  3. Is this the same situation as with the release profile containing debug symbols by default? One could always configure the release profile to strip them (see the sketch after this list), but since it wasn't on by default, many people didn't know they could fix it and complained about bloated binaries.
  4. How come this isn't popular knowledge? Maybe I am the only one who didn't know about this option, but I can't remember seeing this advice anywhere in discussions about proc-macros, or in any Rust optimization guide.
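
For context on question 3, the release-profile option I have in mind is, as far as I know, the strip setting; a minimal sketch:

# Assumption for illustration: strip debug info from release binaries;
# the strip option also accepts values such as "symbols" or a boolean.
[profile.release]
strip = "debuginfo"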

Cargo can give you finer-grained build-time info with cargo build --timings.

And another possible way to get fast proc-macros is to "just compile less code". E.g., run your tests again with both miniserde and nanoserde.

