Why can't `rustc` only compile code that is actually used?

Apologies in advance if this is a dumb question, as I am by no means a compiler expert. I hear a lot of complaints about Rust compile times. I have to admit, I didn't really mind until a recent work project had me waiting around forever to compile a single crate that we use exactly one function from.

This got me thinking: why can't Rust simply be smart enough to realize I am only using a small fraction of said crate and just compile that small segment of code? This seems obvious enough that it would have already been done if there weren't some major technical difficulty blocking it, so what might that be?

In the meantime, I might just rip out the function from said crate and dump it in our codebase :sweat_smile:

Update

Got a really nice explanation on the community Matrix (the explanation starts at the link and continues across several posts).


Uhm… which rustc compiler are you talking about?
The crate is the compilation unit.
That means that one rustc invocation first compiles that slow-to-compile crate, and then another rustc invocation compiles your crate.
The first compiler can't do anything but compile everything, because it has no idea what you will use.
The second compiler can't do anything either because, well, it hasn't even started before the dependency crate's compilation is finished, so it's too late to do anything by the time it starts.

I see what you mean; perhaps I should relabel the title to `why can't "cargo|rust" conditionally compile only used code segments?`. In any case, I believe the intent of the question is obvious enough; if not, please let me know and I will try to clarify :slightly_smiling_face:.

We already do this. Only code reachable from the crate's entry points (`fn main` for a binary, all public functions for a library, more or less) is fully compiled to LLVM IR and handed off to LLVM to be optimized.

Even unused code has to be type-checked though, so it's not possible to skip all compilation steps for unused code.


(sorry for the above, I hit something like Alt followed by space (don't want to test now...) and that posted my message early)

I think this is a legitimate question, and am dissatisfied with some of the previous answers.

On Matrix, Nathan responds about Turing completeness, saying that the question of whether a function is used reduces to the halting problem. However, this makes the problem harder than it needs to be; the OP says they are only using "one function" from a crate.

fn foo() { }
fn bar() { }

fn main() { bar(); }

As alluded to by @jschievink, rustc can already determine that `foo()` is never called in the above program, and it even emits an "unused function" warning. It can tell this because (a) it's not in the call graph of `main`[^1], (b) it's not `pub`, and (c) it's not `extern`. Given these constraints, any program which did somehow execute the body of `foo` must certainly invoke Undefined Behavior.

If you look at the unoptimized LLVM IR of this program (click the "..." next to Run), you'll see that `foo` isn't even there. It does, however, appear in the MIR, as the body still needs to be type-checked and borrow-checked.

The Matrix discussion also mentioned trait objects. But even for trait objects, when code is statically linked, one can still, at least in theory, see which impls are capable of being used, based on which types are actually coerced into a trait object:

trait Trait {
    fn function(&self);
}

struct NotUsedAsDyn;

struct UsedAsDyn;

impl Trait for NotUsedAsDyn {
    fn function(&self) {}
}
impl Trait for UsedAsDyn {
    fn function(&self) {}
}

fn main() {
    UsedAsDyn.function();
    NotUsedAsDyn.function();
    (&UsedAsDyn as &dyn Trait).function();
}

If you look at the unoptimized LLVM IR of this, you'll see that the following vtable is created for `UsedAsDyn`:

@vtable.1 = private unnamed_addr constant { void (%UsedAsDyn*)*, i64, i64, void (%UsedAsDyn*)* } { void (%UsedAsDyn*)* @"_ZN4core3ptr42drop_in_place$LT$playground..UsedAsDyn$GT$17h9e746529c8de1b12E", i64 0, i64 1, void (%UsedAsDyn*)* @"_ZN59_$LT$playground..UsedAsDyn$u20$as$u20$playground..Trait$GT$8function17h3b8e7aef9c71deb4E" }, align 8, !dbg !14

But none is created for `NotUsedAsDyn`, because none of the code in the call graph of `main` coerces `NotUsedAsDyn` into a trait object. (Even if you add an unused `fn foo` which does this!)


So the issue isn't really that unused code can't be identified; it's that it can't be identified beyond the boundary of a crate. If a crate has a `pub` function which is not generic, code will be generated for that function, regardless of whether any downstream binary actually uses it.

This is where @VorfeedCanal's response comes in:

That means that one rustc invocation first compiles that slow-to-compile crate, and then another rustc invocation compiles your crate.

But this is a statement of how things are, not necessarily how they need to be. In theory, cargo could ask rustc which public items from a dependency are actually used by its dependents in the build graph, and provide this list as context when building those crates.


So why not do this? Well, I will offer a variety of counter arguments:

  • I've drastically understated the complexity of communicating information about "what is used" between crates, especially in the face of generics.
  • It's not as simple as looking at "the binary." A crate can have multiple binaries; test suites and examples; cargo features. Should we include all of the executables or just what's being built now?
  • Even if it were done, the benefits are limited. Some of the worst offenders in terms of build time tend to be crates with a very small number of public API entry points, whose call graphs cover nearly the entire crate. And it only helps for the first build of these dependencies for a given workspace/rustc version/target, after which dependencies are cached anyways.
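For what it's worth, cargo features are an existing, manual knob in this space: crate authors can gate optional parts of their API, and dependents can opt out of compiling them. A hypothetical `Cargo.toml` fragment (the crate and feature names here are made up):

```toml
[dependencies]
# Opt out of everything the crate gates behind its default features and
# enable only the part we actually use.
some_big_crate = { version = "1", default-features = false, features = ["one_function_we_need"] }
```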

[^1]: By "call-graph", I am (trying to) refer to the directed graph whose nodes are functions, and which has edges for any time that one function refers to another function by name. Note that there's some additional complexity regarding generics that I'm basically just hand-waving at this point because it's melting my brain to think about.


The intent may be obvious, but you can see why such an “obvious improvement” is close in scope to a full-blown research project.
You are basically saying: “hey, I know that's how things have been done for the last half-century, that's how thousands of books teach these things, but computers are different now, maybe we can do better”.
Well… possibly. But that's your “major technical difficulty”: changes of such magnitude, which go deep enough to completely reshuffle the base of the whole project (the cargo↔rustc↔linker interface), don't happen just because someone wants a minor improvement to a fringe use case.


You listed somewhat convincing counter-arguments but forgot the most important one: this would make the compilation of different crates interdependent.
Currently, crates are used like libraries in all languages since FORTRAN times: their relationships form a DAG, and they can be compiled entirely independently (even on different compile nodes in the cloud if the project is big), etc.
And, of course, there are thousands of tools built around that DAG assumption (essentially all build systems today depend on it).
But to make that optimization viable you would have to break all that and introduce some new way of doing everything.
That's… quite a tall order. The benefits would have to be immense to overcome the half-century inertia of, essentially, the whole industry.
I'm pretty sure the adoption of something like ThinLTO would be more interesting, but that, too, is a large-scale project. Also: ThinLTO works very well if you have something like a 128-core Threadripper… but not all Rust developers use such hardware for build nodes. I'm not even sure whether ThinLTO would be an optimization or a pessimization for the topic starter.


Rust already supports ThinLTO (and incremental release builds even use it by default across compilation units from the same crate).
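For reference, opting into this across crates is a one-line profile setting; a sketch of a `Cargo.toml` fragment enabling it for release builds:

```toml
# Sketch: run ThinLTO across the codegen units of the final binary and
# its Rust dependencies in release builds.
[profile.release]
lto = "thin"
```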

Having multi-crate compilation sessions is something that was discussed in the past. They would indeed allow you to precisely know which code the final binaries will need, and also make incremental compilation more efficient, but would also be a very complex long-term effort.


This may be a little off-topic, but does code that is unused across crate boundaries get removed from the resulting release binary? What about the debug binary?

Yes. ~~No.~~ Possibly also yes; I should check this...


Yes; on (I think) all platforms except for Windows, `--gc-sections` is used to let the linker remove functions that are known to be unreachable. On Windows this is not possible, because the COFF object format has a limit on the number of sections.


I never suggested that anyone actually go through all the trouble of doing this. The question was entirely hypothetical, for the sake of better understanding:
A. whether it could even be done at all,
B. if so, why it isn't currently done this way, and
C. whether doing it would even be as beneficial as I imagine.

I'd say all 3 points have been thoroughly addressed. Thanks everyone for such thorough explanations!


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.