I suspect that not using generics may well speed up compilation. I also suspect that not using all manner of other Rust features may well speed up compilation. That is only to be expected. I have never done any timing comparisons. If anyone can contribute actual measurements that might be interesting.
So now we have the problem that Foo itself generates lots of code due to the procedural macros, and then all of that code is generated again for every Tag we instantiate.
This seems to blow up quickly and become really expensive.
It appears, from cargo build --timings, that the solution is to fall back to pub struct Foo {} and pub struct Bar {} and give up the checks.
One thing that is bothering me right now is that the cost of these cool generic / compile-time checks is not just the check itself but also all this extra code generation.
Can you present that as an actual example that compiles so that we can experience this for ourselves? Preferably something that does something realistic.
I think in these cases you should be able to solve the issue by having an internal tag-less struct which gets wrapped by the tagged one. All operations are defined on the tagless struct (hence compiled only once, no duplication due to generics) and the tagged ones can just forward to the tagless methods. You'll still pay for typechecking, but that's what you wanted in the first place, no?
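A minimal sketch of that wrapping pattern (the names Reading, ReadingInner, and the Celsius / Fahrenheit tags are all made up for illustration):

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized tag types.
struct Celsius;
struct Fahrenheit;

// All the real logic lives on this tag-less struct, so its methods are
// compiled exactly once no matter how many tags exist.
struct ReadingInner {
    value: f64,
}

impl ReadingInner {
    fn doubled(&self) -> f64 {
        self.value * 2.0 // stands in for "many lines of real code"
    }
}

// The tagged wrapper only adds the compile-time distinction; its methods
// are trivial forwarders the compiler can inline away.
struct Reading<Tag> {
    inner: ReadingInner,
    _tag: PhantomData<Tag>,
}

impl<Tag> Reading<Tag> {
    fn new(value: f64) -> Self {
        Reading { inner: ReadingInner { value }, _tag: PhantomData }
    }
    fn doubled(&self) -> f64 {
        self.inner.doubled()
    }
}

fn main() {
    let c: Reading<Celsius> = Reading::new(10.0);
    let f: Reading<Fahrenheit> = Reading::new(50.0);
    assert_eq!(c.doubled(), 20.0);
    assert_eq!(f.doubled(), 100.0);
    // Reading<Celsius> and Reading<Fahrenheit> are distinct types, so
    // mixing them up is still a compile error, but the body of
    // `ReadingInner::doubled` is only codegenned once.
}
```

You still get the type-level separation at every API boundary; only the forwarding shims are duplicated per tag.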
Sorry, I don't have a minimal example. The issues arose from starting with a giant ball-of-mud workspace, shattering the crates into smaller pieces to make them more parallelizable, and noticing that some tiny crates took a long time to codegen (which drastically improved after removing the tagless technique).
Ideally, the Rust compiler would be able to “polymorphize” — recognize when generic code is all or largely unaffected by the generic and compile it only once to a form that works for all versions. However, it currently doesn't.
Another situation this comes up in is in functions with simple barely-used generics:
fn foo<T: Into<String>>(input: T) {
    let input = input.into();
    // many more lines of code ....
}
In this case, you can improve compile time and code size by making the non-generic part a separate, non-generic function:
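A sketch of that split (foo_inner is a made-up name for the extracted part, and the body is simplified to return a length so there is something to observe):

```rust
// Generic shim: this tiny body is monomorphized once per call-site
// type, but it is cheap to duplicate.
fn foo<T: Into<String>>(input: T) -> usize {
    foo_inner(input.into())
}

// All the heavy code lives here and is compiled exactly once.
fn foo_inner(input: String) -> usize {
    // many more lines of code ....
    input.len()
}

fn main() {
    assert_eq!(foo("hello"), 5);            // T = &str
    assert_eq!(foo(String::from("ab")), 2); // T = String
}
```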
Another thing to do is to keep an eye out for opportunities to use &dyn Trait as a function parameter instead of generics. This of course has the added cost of run-time dispatch, but if the code is called relatively rarely then the cost can be unnoticeable.
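For example, the two styles side by side (the describe_* functions are hypothetical):

```rust
use std::fmt::Display;

// Generic version: the body is monomorphized once per concrete T.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// &dyn version: a single compiled body; callers pay one virtual call
// through the vtable instead of producing a fresh instantiation.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    assert_eq!(describe_generic(1), "value = 1");
    assert_eq!(describe_generic("two"), "value = two");
    assert_eq!(describe_dyn(&3), "value = 3");
    assert_eq!(describe_dyn(&"four"), "value = four");
}
```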
Besides compile time, multiple instantiations of generic code also increase the size of the resulting binary. (Though in the case where the machine code is byte-for-byte identical, LLVM or the linker (I forget which) will deduplicate them. This is why in stack traces you sometimes see a completely unrelated type's function listed — because it's identical to the one actually called, so there's only one copy of the code, and so only one address for both.)
You'll see this particularly often in the standard library for functions that take paths, such as the file-reading functions in std::fs.
You can call those with a &str or a String or a PathBuf, but it would be silly to instantiate all the file-reading parts again for each of those.
So instead the standard library has a concrete version that only takes &Path, and the monomorphized part is just the trivial inlinable wrapper that turns the &str into a &Path or similar.
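A sketch of that shape (read_to_upper is a made-up function; std::fs::read itself is written with the same generic-shim-plus-concrete-inner structure):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Generic over anything path-like, but the wrapper is trivial.
fn read_to_upper<P: AsRef<Path>>(path: P) -> io::Result<String> {
    // The concrete inner function holds all the real work and is
    // compiled exactly once, no matter how many P types callers use.
    fn inner(path: &Path) -> io::Result<String> {
        Ok(fs::read_to_string(path)?.to_uppercase())
    }
    inner(path.as_ref())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("generics_demo.txt");
    fs::write(&path, "hello")?;
    // Works with a PathBuf reference or a &str; only the shim is
    // monomorphized twice.
    assert_eq!(read_to_upper(&path)?, "HELLO");
    assert_eq!(read_to_upper(path.to_str().unwrap())?, "HELLO");
    Ok(())
}
```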
Note that generics are often a big win if what you care about is how much the macro has to generate. Because proc-macro parsing and emitting for one generic struct is much cheaper than making the proc-macro look at all the concrete types.
When it comes to codegen stuff, 3 different concrete types or 3 instantiations of one generic type behave about the same, other than where it happens in your crate graph. There's nothing about things like debuginfo generation that cares whether it's an originally-concrete type or an instantiation of a generic.
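A small illustration of the macro-side difference, using the built-in Debug derive as a stand-in for a heavier proc macro (the struct names are made up):

```rust
// One derive invocation: the macro parses and emits code for this
// single generic definition, no matter how many T's are used later.
#[derive(Debug)]
struct Tagged<T> {
    value: u32,
    tag: T,
}

// With concrete types, the derive runs once per struct, parsing and
// emitting fresh code each time, even though the bodies are identical.
#[derive(Debug)]
struct TaggedFoo { value: u32 }

#[derive(Debug)]
struct TaggedBar { value: u32 }

fn main() {
    assert_eq!(
        format!("{:?}", Tagged { value: 1, tag: "foo" }),
        "Tagged { value: 1, tag: \"foo\" }"
    );
    assert_eq!(format!("{:?}", TaggedFoo { value: 2 }), "TaggedFoo { value: 2 }");
    assert_eq!(format!("{:?}", TaggedBar { value: 3 }), "TaggedBar { value: 3 }");
}
```

The macro-expansion cost scales with the number of definitions the macro sees, while the codegen cost scales with the number of instantiations — which is the distinction the question below is getting at.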
I get the distinction between (1) generating code via a proc macro and (2) writing a generic piece of code, which when instantiated with a concrete type, generates code.
I do not understand this sentence:
Because proc-macro parsing and emitting for one generic struct is much cheaper than making the proc-macro look at all the concrete types.
How is "cost" being measured here?
I think there is something else going on here, with the benefit going to the proc-macro approach.
I believe World2 involves some type of "duplicate work" that World1 avoids. I'm not sure what the term/phrase for this is, but is it "by making stuff more concrete earlier, we avoid the duplicate work of instantiating with the same types in DIFFERENT crates"?