Problem space
Hi. For some time now, whenever I've heard or read a discussion about proc-macros in Rust, I would often see complaints that they are slow. One of the cited reasons was that they are compiled in debug mode, and I remember seeing discussions about how to solve this problem. I don't have a very good understanding of how the compiler works or of what design choices would need to be made, so I left those discussions with the impression that solving this problem is hard (for some reason), and that while many people would like to see it improved, the necessary work won't be done for some time.
But yesterday, by sheer coincidence, I found out that cargo actually allows you to compile proc-macros in release mode, in a way. It turns out that one can override the cargo profile for specific packages. As the documentation puts it:

> Profile settings can be overridden for specific packages and build-time crates.

This means that even in debug builds, one can compile specific packages with optimizations enabled. Taking an example from the aforementioned documentation:
```toml
# The `foo` package will use the -Copt-level=3 flag.
[profile.dev.package.foo]
opt-level = 3
```
And what's more, there is a special table that controls optimizations for "build scripts, proc macros, and their dependencies":
```toml
# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
opt-level = 3
```
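If I understand the docs correctly, the same table exists for every profile, so a hypothetical `Cargo.toml` fragment could look like this (the `debug = false` line is my own guess at a further tweak, not something the docs recommend):

```toml
# Hypothetical fragment: optimize all build-time crates (build scripts,
# proc-macros and their dependencies) even in dev builds.
[profile.dev.build-override]
opt-level = 3
# Assumption on my part: dropping debug info for build-time crates
# might speed things up further; the docs don't prescribe this.
debug = false

# The same table also exists for the release profile.
[profile.release.build-override]
opt-level = 3
```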
Experiment
After seeing this, I knew I had to test it myself to see if it really works. I decided to check how different profile settings influence the compilation of a bunch of structs with serde's proc-macros. I auto-generated a file with random structs, which have random fields and random serde attributes applied to them. Below is a very ad-hoc script that generates such a file, in case anyone would like to reproduce my tests.
```rust
// dependencies: anyhow, rand and uuid
use std::fmt;
use std::fs::File;
use std::io::{BufWriter, Write};

use rand::seq::SliceRandom;
use rand::thread_rng;

fn random_from_list(list: &[&str]) -> String {
    let mut rng = thread_rng();
    (*list.choose(&mut rng).unwrap()).to_owned()
}

struct SerdeStructAttrib(String);
struct SerdeFieldAttrib(String);

struct Field {
    ident: String,
    type_: String,
    serde_attribs: Vec<SerdeFieldAttrib>,
}

impl Field {
    fn random() -> Self {
        let ident = Self::random_ident();
        let type_ = Self::random_type();
        let serde_attribs = vec![Self::random_attrib()];
        Self {
            ident,
            type_,
            serde_attribs,
        }
    }

    fn random_ident() -> String {
        let id = uuid::Uuid::new_v4();
        format!("f_{}", id.simple())
    }

    fn random_type() -> String {
        static TYPES: &[&str] = &[
            "i32",
            "u32",
            "bool",
            "std::string::String",
            "std::vec::Vec<std::string::String>",
        ];
        random_from_list(TYPES)
    }

    fn random_attrib() -> SerdeFieldAttrib {
        let attrib: &[&dyn Fn() -> String] = &[
            &|| "default".into(),
            &|| "skip".into(),
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("rename = \"f_{}\"", id.simple())
            },
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("alias = \"f_{}\"", id.simple())
            },
        ];
        let mut rng = thread_rng();
        let f = attrib.choose(&mut rng).unwrap();
        SerdeFieldAttrib(f())
    }
}

struct Struct {
    ident: String,
    serde_attribs: Vec<SerdeStructAttrib>,
    fields: Vec<Field>,
}

impl Struct {
    pub fn random(fields: usize) -> Self {
        let ident = Self::random_ident();
        let serde_attribs = vec![Self::random_attrib()];
        let fields = (0..fields).map(|_| Field::random()).collect();
        Self {
            ident,
            serde_attribs,
            fields,
        }
    }

    pub fn to_display(&self) -> StructDisplay<'_> {
        StructDisplay(self)
    }

    fn random_ident() -> String {
        let id = uuid::Uuid::new_v4();
        format!("S_{}", id.simple())
    }

    fn random_attrib() -> SerdeStructAttrib {
        let attrib: &[&dyn Fn() -> String] = &[
            &|| "deny_unknown_fields".into(),
            &|| {
                let id = uuid::Uuid::new_v4();
                format!("rename = \"S_{}\"", id.simple())
            },
            &|| {
                static VALUES: &[&str] = &[
                    "lowercase",
                    "UPPERCASE",
                    "PascalCase",
                    "camelCase",
                    "snake_case",
                    "SCREAMING_SNAKE_CASE",
                    "kebab-case",
                    "SCREAMING-KEBAB-CASE",
                ];
                format!("rename_all = \"{}\"", random_from_list(VALUES))
            },
        ];
        let mut rng = thread_rng();
        let f = attrib.choose(&mut rng).unwrap();
        SerdeStructAttrib(f())
    }
}

struct StructDisplay<'a>(&'a Struct);

impl fmt::Display for StructDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "#[derive(serde::Serialize, serde::Deserialize)]")?;
        if let Some(first) = self.0.serde_attribs.first() {
            write!(f, "#[serde({}", first.0)?;
            for attrib in &self.0.serde_attribs[1..] {
                write!(f, ", {}", attrib.0)?;
            }
            writeln!(f, ")]")?;
        }
        writeln!(f, "pub struct {} {{", self.0.ident)?;
        for field in &self.0.fields {
            if let Some(first) = field.serde_attribs.first() {
                write!(f, "\t#[serde({}", first.0)?;
                for attrib in &field.serde_attribs[1..] {
                    write!(f, ", {}", attrib.0)?;
                }
                writeln!(f, ")]")?;
            }
            writeln!(f, " pub {}: {},", field.ident, field.type_)?;
        }
        writeln!(f, "}}")?;
        Ok(())
    }
}

fn main() -> anyhow::Result<()> {
    const STRUCT_N: usize = 1_000;
    const FIELD_N: usize = 10;
    const PATH: &str = "src/data.rs";

    let file = File::create(PATH)?;
    let mut file = BufWriter::new(file);
    writeln!(file, "#![allow(non_camel_case_types, unused)]\n")?;
    for _ in 0..STRUCT_N {
        let s = Struct::random(FIELD_N);
        write!(file, "{}\n", s.to_display())?;
    }
    file.flush()?;
    Ok(())
}
```
The generated file had 1000 structs, each with 10 fields; every struct and field had a random attribute, and the file was 1.1 MB in size. I then timed the following scenarios:
- Build time without any serde attributes (and without deriving `Serialize` and `Deserialize`).
- Build time with serde attributes and a clean `Cargo.toml`.
- Build time with only the `serde_derive` package configured:
  ```toml
  [profile.{profile}.package.serde_derive]
  opt-level = 3
  ```
- Build time with the `serde_derive` package and its dependencies configured:
  ```toml
  [profile.{profile}.package.serde_derive]
  opt-level = 3
  [profile.{profile}.package.syn]
  opt-level = 3
  [profile.{profile}.package.proc-macro2]
  opt-level = 3
  [profile.{profile}.package.quote]
  opt-level = 3
  [profile.{profile}.package.unicode-ident]
  opt-level = 3
  ```
- Build time with:
  ```toml
  [profile.{profile}.build-override]
  opt-level = 3
  ```

I ran all of those scenarios with both the `debug` and the `release` profile. In each scenario I took 10 measurements and used the duration reported by cargo itself. I always measured incremental builds (`touch src/generate_file.rs && cargo build`).
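The measurement loop can be sketched roughly like this (a hypothetical Python harness — the touched file name, the profile flags, and the parsing of cargo's `Finished ... in Xs` line are my assumptions about the setup, not the exact script I used):

```python
import re
import statistics
import subprocess


def parse_cargo_duration(cargo_output: str) -> float:
    """Extract the duration in seconds from cargo's 'Finished ... in 9.55s' line."""
    match = re.search(r"in (\d+(?:\.\d+)?)s", cargo_output)
    if match is None:
        raise ValueError("no duration found in cargo output")
    return float(match.group(1))


def summarize(durations: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the measured durations."""
    return statistics.mean(durations), statistics.stdev(durations)


def measure(profile_flags: list[str], runs: int = 10) -> tuple[float, float]:
    durations = []
    for _ in range(runs):
        # Force an incremental rebuild of the generated module.
        subprocess.run(["touch", "src/generate_file.rs"], check=True)
        out = subprocess.run(
            ["cargo", "build", *profile_flags],
            check=True, capture_output=True, text=True,
        )
        # Cargo prints the "Finished" line to stderr.
        durations.append(parse_cargo_duration(out.stderr))
    return summarize(durations)


if __name__ == "__main__":
    print("dev:", measure([]))
    print("release:", measure(["--release"]))
```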
Results
In the following results, whenever I calculate a difference, I compare the average for the given test scenario with the average of the (corresponding `debug`/`release`) scenario without any profile customization.
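Concretely, the two difference columns are computed like this (a small illustrative snippet, using two of the dev-profile averages from the table):

```python
# Illustrative computation of the difference columns, using the
# dev-profile numbers from the results table.
baseline = 9.55  # average for "dev profile without customization"
scenario = 7.53  # average for "dev profile with build-override optimized"

absolute_difference = scenario - baseline  # negative means faster
relative_difference = scenario / baseline  # < 1.0 means faster

print(f"{absolute_difference:.2f} {relative_difference:.2f}")  # -2.02 0.79
```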
| Scenario | Average duration [s] | Standard deviation [s] | Absolute difference [s] | Relative difference |
|---|---|---|---|---|
| dev profile without serde | 0.21 | 0.003 | N/A | N/A |
| dev profile without customization | 9.55 | 0.17 | 0.0 | 1.0 |
| dev profile with optimized serde_derive | 8.79 | 0.08 | -0.76 | 0.92 |
| dev profile with optimized serde_derive and its dependencies | 8.43 | 0.09 | -1.12 | 0.88 |
| dev profile with build-override optimized | 7.53 | 0.11 | -2.02 | 0.79 |
| release profile without serde | 0.16 | 0.005 | N/A | N/A |
| release profile without customization | 32.40 | 0.25 | 0.0 | 1.0 |
| release profile with optimized serde_derive | 32.47 | 0.13 | 0.07 | 1.002 |
| release profile with optimized serde_derive and its dependencies | 31.11 | 0.14 | -1.29 | 0.96 |
| release profile with build-override optimized | 31.05 | 0.16 | -1.35 | 0.96 |
Interpretation and questions
From what I can see, turning on `build-override` optimization has a very noticeable effect in `debug` builds, and it works better than enabling optimization for each individual crate used by the proc-macro. In `release` builds, turning on `build-override` optimization brings only a very minor improvement, and this time there is no meaningful difference between it and optimizing the individual crates.
I am by no means a statistician, so if you see any problems with my methodology, please let me know. I know that measuring performance is very hard and that this is a very primitive test.
Now, assuming that these results are at least somewhat representative, I would like to ask a couple of questions.
- Is this it? Is customizing the `build-override` profile *the* way to get fast proc-macros?
- Is this as good as it gets, or is there work being done or planned that would improve the speed of proc-macros even more?
- Is this the same situation as the `release` profile containing debug symbols by default? One could always configure the `release` profile to `strip` them, but since it wasn't on by default, many people didn't know they could fix it and complained about bloated binaries.
- How come this isn't common knowledge? Maybe I am the only one who didn't know about this option, but I can't remember seeing this advice anywhere in discussions about proc-macros, or in any Rust optimization guide.