I suspect that not using generics may well speed up compilation. I also suspect that not using all manner of other Rust features may well speed up compilation. That is only to be expected. I have never done any timing comparisons. If anyone can contribute actual measurements that might be interesting.
So now we have the problem that Foo itself generates lots of code due to the procedural macros, and then all of that code is generated again for every Tag we instantiate.
This seems to blow up quickly and become really expensive.
It appears, from cargo build --timings, that the solution is to fall back to pub struct Foo {} and pub struct Bar {} and give up the checks.
One thing that is bothering me right now is that the cost of these cool generic / compile-time checks is not just the check itself but also all this extra code generation.
Can you present that as an actual example that compiles so that we can experience this for ourselves? Preferably something that does something realistic.
I think in these cases you should be able to solve the issue by having an internal tag-less struct which gets wrapped by the tagged one. All operations are defined on the tagless struct (hence compiled only once, no duplication due to generics) and the tagged ones can just forward to the tagless methods. You'll still pay for typechecking, but that's what you wanted in the first place, no?
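A minimal sketch of that wrapping pattern (the names Reading, ReadingInner, and the Celsius / Fahrenheit tags are all made up for illustration):

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized tag types.
struct Celsius;
struct Fahrenheit;

// All the real logic lives on this tag-less struct, so its methods are
// compiled exactly once no matter how many tags exist.
struct ReadingInner {
    value: f64,
}

impl ReadingInner {
    fn doubled(&self) -> f64 {
        self.value * 2.0 // stands in for "many lines of real code"
    }
}

// The tagged wrapper only adds the compile-time distinction; its methods
// are trivial forwarders the compiler can inline away.
struct Reading<Tag> {
    inner: ReadingInner,
    _tag: PhantomData<Tag>,
}

impl<Tag> Reading<Tag> {
    fn new(value: f64) -> Self {
        Reading { inner: ReadingInner { value }, _tag: PhantomData }
    }
    fn doubled(&self) -> f64 {
        self.inner.doubled()
    }
}

fn main() {
    let c: Reading<Celsius> = Reading::new(10.0);
    let f: Reading<Fahrenheit> = Reading::new(50.0);
    assert_eq!(c.doubled(), 20.0);
    assert_eq!(f.doubled(), 100.0);
    // Reading<Celsius> and Reading<Fahrenheit> are distinct types, so
    // mixing them up is still a compile error, but the body of
    // `ReadingInner::doubled` is only codegenned once.
}
```

You still get the type-level separation at every API boundary; only the forwarding shims are duplicated per tag.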
Sorry, I don't have a minimal example. The issues arose from starting with a giant ball-of-mud workspace, shattering the crates into smaller pieces to make them more parallelizable, and noticing that some tiny crates took a long time to codegen (which drastically improved after removing the tagless technique).
Ideally, the Rust compiler would be able to “polymorphize” — recognize when generic code is all or largely unaffected by the generic and compile it only once to a form that works for all versions. However, it currently doesn't.
Another situation this comes up in is in functions with simple barely-used generics:
fn foo<T: Into<String>>(input: T) {
    let input = input.into();
    // many more lines of code ....
}
In this case, you can improve compile time and code size by making the non-generic part a separate, non-generic function:
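A sketch of that split (foo_inner is a made-up name for the extracted part, and the body is simplified to return a length so there is something to observe):

```rust
// Generic shim: this tiny body is monomorphized once per call-site
// type, but it is cheap to duplicate.
fn foo<T: Into<String>>(input: T) -> usize {
    foo_inner(input.into())
}

// All the heavy code lives here and is compiled exactly once.
fn foo_inner(input: String) -> usize {
    // many more lines of code ....
    input.len()
}

fn main() {
    assert_eq!(foo("hello"), 5);            // T = &str
    assert_eq!(foo(String::from("ab")), 2); // T = String
}
```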
Another thing to do is to keep an eye out for opportunities to use &dyn Trait as a function parameter instead of generics. This of course has the added cost of run-time dispatch, but if the code is called relatively rarely then the cost can be unnoticeable.
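For example, the two styles side by side (the describe_* functions are hypothetical):

```rust
use std::fmt::Display;

// Generic version: the body is monomorphized once per concrete T.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// &dyn version: a single compiled body; callers pay one virtual call
// through the vtable instead of producing a fresh instantiation.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    assert_eq!(describe_generic(1), "value = 1");
    assert_eq!(describe_generic("two"), "value = two");
    assert_eq!(describe_dyn(&3), "value = 3");
    assert_eq!(describe_dyn(&"four"), "value = four");
}
```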
Besides compile time, multiple instantiations of generic code also increase the size of the resulting binary. (Though in the case where the machine code is byte-for-byte identical, LLVM or the linker (I forget which) will deduplicate them. This is why in stack traces you sometimes see a completely unrelated type's function listed — because it's identical to the one actually called, so there's only one copy of the code, and so only one address for both.)
You'll see this particularly often in the standard library for functions that take paths, such as the file-reading functions in std::fs.
You can call those with a &str or a String or a PathBuf, but it would be silly to instantiate all the file-reading parts again for each of those.
So instead the standard library has a concrete version that only takes &Path, and the monomorphized part is just the trivial inlinable wrapper that turns the &str into a &Path or similar.
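A sketch of that shape (read_to_upper is a made-up function; std::fs::read itself is written with the same generic-shim-plus-concrete-inner structure):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Generic over anything path-like, but the wrapper is trivial.
fn read_to_upper<P: AsRef<Path>>(path: P) -> io::Result<String> {
    // The concrete inner function holds all the real work and is
    // compiled exactly once, no matter how many P types callers use.
    fn inner(path: &Path) -> io::Result<String> {
        Ok(fs::read_to_string(path)?.to_uppercase())
    }
    inner(path.as_ref())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("generics_demo.txt");
    fs::write(&path, "hello")?;
    // Works with a PathBuf reference or a &str; only the shim is
    // monomorphized twice.
    assert_eq!(read_to_upper(&path)?, "HELLO");
    assert_eq!(read_to_upper(path.to_str().unwrap())?, "HELLO");
    Ok(())
}
```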
Note that generics are often a big win if what you care about is how much the macro has to generate. Because proc-macro parsing and emitting for one generic struct is much cheaper than making the proc-macro look at all the concrete types.
When it comes to codegen stuff, 3 different concrete types or 3 instantiations of one generic type behave about the same, other than where it happens in your crate graph. There's nothing about things like debuginfo generation that cares whether it's an originally-concrete type or an instantiation of a generic.
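A small illustration of the macro-side difference, using the built-in Debug derive as a stand-in for a heavier proc macro (the struct names are made up):

```rust
// One derive invocation: the macro parses and emits code for this
// single generic definition, no matter how many T's are used later.
#[derive(Debug)]
struct Tagged<T> {
    value: u32,
    tag: T,
}

// With concrete types, the derive runs once per struct, parsing and
// emitting fresh code each time, even though the bodies are identical.
#[derive(Debug)]
struct TaggedFoo { value: u32 }

#[derive(Debug)]
struct TaggedBar { value: u32 }

fn main() {
    assert_eq!(
        format!("{:?}", Tagged { value: 1, tag: "foo" }),
        "Tagged { value: 1, tag: \"foo\" }"
    );
    assert_eq!(format!("{:?}", TaggedFoo { value: 2 }), "TaggedFoo { value: 2 }");
    assert_eq!(format!("{:?}", TaggedBar { value: 3 }), "TaggedBar { value: 3 }");
}
```

The macro-expansion cost scales with the number of definitions the macro sees, while the codegen cost scales with the number of instantiations — which is the distinction the question below is getting at.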
I get the distinction between (1) generating code via a proc macro and (2) writing a generic piece of code, which when instantiated with a concrete type, generates code.
I do not understand this sentence:
Because proc-macro parsing and emitting for one generic struct is much cheaper than making the proc-macro look at all the concrete types.
How is "cost" being measured here?
I think there is something else going on here, with the benefit going to the proc-macro approach.
I believe World2 involves some type of "duplicate work" that World1 avoids. I'm not sure what the term/phrase for this is, but is it "by making stuff more concrete earlier, we avoid the duplicate work of instantiating with the same types in DIFFERENT crates"?