I think this is a legitimate question, and am dissatisfied with some of the previous answers.
On Matrix, Nathan brings up Turing completeness and says that the question of whether a function is used reduces to the halting problem. However, this makes the problem harder than it needs to be; the OP says they are only using "one function" from a crate.
```rust
fn foo() { }
fn bar() { }
fn main() { bar(); }
```
As alluded to by @jschievink, rustc can already determine that `foo()` is never called in the above program, and it even emits an "unused function" warning. It can tell this because (a) it's not in the call graph of `main`[^1], (b) it's not `pub`, and (c) it's not `extern`. Given these constraints, any program that somehow executed the body of `foo` would necessarily have invoked Undefined Behavior.
If you look at the unoptimized LLVM IR of this program (on the Playground, click the "..." next to Run), you'll see that `foo` isn't even there. It does, however, appear in the MIR, because its body still needs to be type-checked and borrow-checked.
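The reasoning above amounts to a reachability check. Below is a minimal sketch of it on a toy model, assuming each function simply lists the functions it names (as in the footnote's definition of the call graph) and ignoring generics, trait objects, and everything else rustc actually has to deal with. `Func`, `dead_functions`, and `example_crate` are hypothetical names for illustration, not rustc internals.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy model of a function in one crate (hypothetical, not rustc's data
/// structures): the functions it refers to by name, and whether it is
/// `pub` or `extern` (i.e. possibly callable from outside the crate).
struct Func {
    calls: Vec<&'static str>,
    is_pub: bool,
    is_extern: bool,
}

/// Returns, sorted, the functions that no root can reach. Roots are `main`
/// plus every `pub` or `extern` function, since code outside this crate
/// might call those.
fn dead_functions(funcs: &HashMap<&'static str, Func>) -> Vec<&'static str> {
    let mut queue: VecDeque<&'static str> = funcs
        .iter()
        .filter(|(name, f)| **name == "main" || f.is_pub || f.is_extern)
        .map(|(name, _)| *name)
        .collect();
    let mut reachable: HashSet<&'static str> = HashSet::new();
    // Breadth-first walk of the call graph starting from the roots.
    while let Some(name) = queue.pop_front() {
        if !reachable.insert(name) {
            continue; // already visited
        }
        if let Some(f) = funcs.get(name) {
            queue.extend(f.calls.iter().copied());
        }
    }
    let mut dead: Vec<&'static str> = funcs
        .keys()
        .copied()
        .filter(|name| !reachable.contains(name))
        .collect();
    dead.sort();
    dead
}

/// The three-function program from the post.
fn example_crate() -> HashMap<&'static str, Func> {
    let private = |calls: Vec<&'static str>| Func { calls, is_pub: false, is_extern: false };
    HashMap::from([
        ("foo", private(vec![])),
        ("bar", private(vec![])),
        ("main", private(vec!["bar"])),
    ])
}

fn main() {
    println!("{:?}", dead_functions(&example_crate())); // prints ["foo"]
}
```

Marking `foo` as `pub` (or `extern`) in this model turns it into a root, so nothing is reported as dead, which mirrors why conditions (b) and (c) matter.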
The Matrix discussion also mentioned trait objects. But even for trait objects, when code is statically linked, one can still (at least in theory) see which impls can possibly be used, based on which types are actually coerced into a trait object:
```rust
trait Trait {
    fn function(&self);
}

struct NotUsedAsDyn;
struct UsedAsDyn;

impl Trait for NotUsedAsDyn {
    fn function(&self) {}
}

impl Trait for UsedAsDyn {
    fn function(&self) {}
}

fn main() {
    UsedAsDyn.function();
    NotUsedAsDyn.function();
    (&UsedAsDyn as &dyn Trait).function();
}
```
If you look at the unoptimized LLVM IR of this, you'll see that the following vtable is created for `UsedAsDyn`:
```llvm
@vtable.1 = private unnamed_addr constant { void (%UsedAsDyn*)*, i64, i64, void (%UsedAsDyn*)* } { void (%UsedAsDyn*)* @"_ZN4core3ptr42drop_in_place$LT$playground..UsedAsDyn$GT$17h9e746529c8de1b12E", i64 0, i64 1, void (%UsedAsDyn*)* @"_ZN59_$LT$playground..UsedAsDyn$u20$as$u20$playground..Trait$GT$8function17h3b8e7aef9c71deb4E" }, align 8, !dbg !14
```
But none is created for `NotUsedAsDyn`, because none of the code in the call graph of `main` coerces `NotUsedAsDyn` into a trait object. (This holds even if you add an unused `fn foo` that does perform that coercion!)
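The vtable observation follows the same pattern: only coercions that appear in reachable code matter. Here is a toy sketch, assuming each function records the functions it calls and the concrete types it coerces into `dyn Trait`; it models nothing else about codegen. The names `Func`, `types_needing_vtables`, and `example_program` are made up for illustration.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// Toy model (hypothetical): the functions a function calls, and the
/// concrete types it coerces into a `dyn Trait` object.
struct Func {
    calls: Vec<&'static str>,
    coerces_to_dyn: Vec<&'static str>,
}

/// Collects the types whose vtables are needed: only coercions that sit
/// in functions reachable from `root` count.
fn types_needing_vtables(
    funcs: &HashMap<&'static str, Func>,
    root: &'static str,
) -> BTreeSet<&'static str> {
    let mut visited: HashSet<&'static str> = HashSet::new();
    let mut queue = VecDeque::from([root]);
    let mut needed = BTreeSet::new();
    while let Some(name) = queue.pop_front() {
        if !visited.insert(name) {
            continue; // already visited
        }
        if let Some(f) = funcs.get(name) {
            needed.extend(f.coerces_to_dyn.iter().copied());
            queue.extend(f.calls.iter().copied());
        }
    }
    needed
}

/// The trait-object program from the post, plus an unused `foo` that
/// coerces `NotUsedAsDyn` into a trait object.
fn example_program() -> HashMap<&'static str, Func> {
    HashMap::from([
        ("main", Func { calls: vec![], coerces_to_dyn: vec!["UsedAsDyn"] }),
        ("foo", Func { calls: vec![], coerces_to_dyn: vec!["NotUsedAsDyn"] }),
    ])
}

fn main() {
    let needed = types_needing_vtables(&example_program(), "main");
    println!("{:?}", needed); // prints {"UsedAsDyn"}
}
```

In this model, adding a call from `main` to `foo` makes `NotUsedAsDyn` show up in the result, matching the intuition that the coercion in `foo` only matters once something reachable calls it.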
So the issue isn't really that unused code can't be identified; it's that it can't be identified across a crate boundary. If a crate has a `pub` function that is not generic, code will be generated for that function regardless of whether the final binary ends up using it.
This is where @VorfeedCanal's response comes in:
> That means that first `rustc` compiler does the compilation of that slow-compileable crate and then another `rustc` compiler compiles your crate.

But this is a statement of how things are, not necessarily how they need to be. In theory, Cargo could ask rustc which public items from a dependency are actually used by its dependents in the build graph, and provide this list as context when building those crates.
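To make the idea concrete, here is a rough sketch contrasting two choices of codegen roots for a library crate. It assumes a drastically simplified model in which each public function is either generic or not, generic functions are instantiated by the downstream crates that use them, and dependents can somehow report a set of used item names. None of this corresponds to an existing Cargo or rustc interface: `PubFn`, `roots_today`, and `roots_with_usage_info` are hypothetical, and only the roots are modeled, not the private code reachable from them.

```rust
use std::collections::BTreeSet;

/// A public function exported by a hypothetical dependency crate.
struct PubFn {
    name: &'static str,
    is_generic: bool,
}

/// Roughly today's situation: every public, non-generic function is a
/// codegen root in the dependency, whether or not anything downstream
/// calls it. Generic functions are left to the crates that instantiate them.
fn roots_today(exports: &[PubFn]) -> BTreeSet<&'static str> {
    exports.iter().filter(|f| !f.is_generic).map(|f| f.name).collect()
}

/// The hypothetical scheme: gather the items each dependent reports using,
/// and keep only the roots that appear in their union.
fn roots_with_usage_info(
    exports: &[PubFn],
    used_by_dependents: &[BTreeSet<&'static str>],
) -> BTreeSet<&'static str> {
    let used: BTreeSet<&'static str> = used_by_dependents.iter().flatten().copied().collect();
    let today = roots_today(exports);
    today.intersection(&used).copied().collect()
}

fn main() {
    let exports = [
        PubFn { name: "one_function", is_generic: false },
        PubFn { name: "rarely_used", is_generic: false },
        PubFn { name: "generic_helper", is_generic: true },
    ];
    // A single dependent that only uses one function, as in the question.
    let dependents = [BTreeSet::from(["one_function"])];
    println!("{:?}", roots_today(&exports)); // prints {"one_function", "rarely_used"}
    println!("{:?}", roots_with_usage_info(&exports, &dependents)); // prints {"one_function"}
}
```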
So why not do this? Well, I will offer several counterarguments:
- I've drastically understated the complexity of communicating information about "what is used" between crates, especially in the face of generics.
- It's not as simple as looking at "the binary." A package can have multiple binaries, test suites, and examples, as well as Cargo features that change what gets compiled. Should we include all of the executables, or just the ones being built right now?
- Even if it were done, the benefits are limited. Some of the worst offenders in terms of build time tend to be crates with very few public API entry points whose call graphs cover nearly the entire crate. And it only helps for the first build of these dependencies for a given workspace/rustc version/target, after which dependencies are cached anyway.
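The last two points interact: if a dependency's compiled output depended on which of its items are used, that set would effectively become part of its cache key. A small sketch with made-up usage data, assuming three hypothetical targets (`bin_a`, `bin_b`, and `tests`) that use dependency items named `parse`, `render`, and `debug_dump`; `usage_by_target` and `required_items` are illustrative names only.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical usage data: which public items of one dependency each
/// build target of a package refers to.
fn usage_by_target() -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    BTreeMap::from([
        ("bin_a", BTreeSet::from(["parse"])),
        ("bin_b", BTreeSet::from(["parse", "render"])),
        ("tests", BTreeSet::from(["parse", "render", "debug_dump"])),
    ])
}

/// The items the dependency must provide when only `targets` are built.
/// Under the hypothetical scheme, its compiled artifact would effectively
/// be keyed on this set.
fn required_items(
    usage: &BTreeMap<&'static str, BTreeSet<&'static str>>,
    targets: &[&str],
) -> BTreeSet<&'static str> {
    targets
        .iter()
        .filter_map(|t| usage.get(*t))
        .flatten()
        .copied()
        .collect()
}

fn main() {
    let usage = usage_by_target();
    let only_bin_a = required_items(&usage, &["bin_a"]);
    let all_targets = required_items(&usage, &["bin_a", "bin_b", "tests"]);
    println!("{:?}", only_bin_a); // prints {"parse"}
    println!("{:?}", all_targets); // prints {"debug_dump", "parse", "render"}
    // Different sets: an artifact built for one can't be reused for the other.
    assert_ne!(only_bin_a, all_targets);
}
```

So a build that only needs `bin_a` would produce an artifact that can't serve the test build later; one either always uses the union over every target (giving up much of the benefit) or rebuilds whenever the set changes (giving up caching).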
[^1]: By "call graph", I am (trying to) refer to the directed graph whose nodes are functions and which has an edge whenever one function refers to another function by name. Note that there's some additional complexity regarding generics that I'm basically just hand-waving at this point because it's melting my brain to think about.