When are generic instances in a dead branch of the code pruned?

If we have code that's structured like this:

fn foo<T>() {
    if !std::mem::needs_drop::<T>() { return; }
    
    // More code here, using T.
    let mut v = Vec::<T>::with_capacity(1024);
    // ...
}

For a type that exits early from this function, for example i32 (which does not need drop), at what stage of compilation are the generic items that follow in foo pruned?

Judging from debug-mode compilation, quite a lot of the code in the dead branch is still instantiated and code-generated, even though it is unreachable.

Does anyone know more about when in compilation this code is removed? I'd like to avoid emitting it to LLVM for optimization, so that the compiler doesn't have to do a lot of extra work, for example optimizing Vec::with_capacity for a type that's not even going to be used at that point.

I know that if we had a trait for needs_drop, it would be possible to statically avoid this extra code generation. Does anyone know other tricks for avoiding this problem in practice?
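To illustrate the trait idea, here is a minimal sketch. It assumes a hypothetical trait `DropStrategy` (not part of std) with hand-written impls per type, since stable Rust cannot select an impl based on needs_drop automatically. Because dispatch goes through trait selection, the body that mentions `Vec<T>` only exists in the impl that needs it.

```rust
/// Hypothetical trait standing in for a trait-level `needs_drop`.
/// Each type picks its own code path through its impl.
trait DropStrategy: Sized {
    /// Returns the capacity of the scratch buffer that was allocated,
    /// or 0 if the expensive path was skipped.
    fn scratch_capacity() -> usize;
}

// Type without drop glue: this body never mentions `Vec<i32>`,
// so nothing Vec-related is requested for `i32` from here.
impl DropStrategy for i32 {
    fn scratch_capacity() -> usize {
        0
    }
}

// Type with drop glue: only this impl uses `Vec<String>`.
impl DropStrategy for String {
    fn scratch_capacity() -> usize {
        let v = Vec::<String>::with_capacity(1024);
        v.capacity()
    }
}

fn foo<T: DropStrategy>() -> usize {
    // Resolved statically per `T`; there is no runtime branch here.
    T::scratch_capacity()
}
```

Calling `foo::<i32>()` goes straight to the cheap impl, while `foo::<String>()` takes the path that allocates. The obvious cost is that every element type needs an impl, which is why this does not work for a fully generic R.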

The question comes from the following (more complicated) code (original code is here), which also involves significant code generation behind the same needs_drop conditional.

/// Apply and collect the results into a new array, which has the same size as the
/// inputs.
///
/// If all inputs are c- or f-order respectively, that is preserved in the output.
pub fn apply_collect<R>(self, f: impl FnMut(P1::Item, P2::Item) -> R) -> Array<R, D>
    where P1: NdProducer<..>, P2: NdProducer<..>
{
    // Make uninit result
    let mut output = self.uninitalized_for_current_layout::<R>();
    if !std::mem::needs_drop::<R>() {
        // For elements with no drop glue, just overwrite into the array
        self.apply_assign_into(&mut output, f);
    } else {
        // For generic elements, use a proxy that counts the number of filled elements,
        // and can drop the right number of elements on unwinding
        unsafe {
            PartialArray::scope(output.view_mut(), move |partial| {
                debug_assert_eq!(partial.layout().tendency() >= 0, self.layout_tendency >= 0);
                self.apply_assign_into(partial, f);
            });
        }
    }

    unsafe {
        output.assume_init()
    }
}

It is removed by the same thing that would remove the call to bar here:

fn foo() {
    if true { return; }
    bar();
}

Can you be more specific? And your example does not seem to involve instantiation of generic items :slight_smile:

I don't know whether it is LLVM that removes it, but it probably is. After the generics have been instantiated, the result would be equivalent to my example.
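As a sketch of that equivalence: the function below mirrors the foo from the question, with a return value added purely for illustration so the two paths can be told apart. For `T = i32`, `needs_drop::<i32>()` is false, so the condition is effectively `if true { return ...; }` and everything after it is dead, but it is still part of the instantiated function body until something removes it.

```rust
use std::mem::needs_drop;

// Same shape as the `foo` in the question; the `bool` result is an
// addition for illustration only.
fn foo<T>() -> bool {
    if !needs_drop::<T>() {
        // Taken for types without drop glue, such as `i32`.
        return false;
    }

    // Only reachable for types that need drop, but still written
    // in terms of `T` for every instantiation of `foo`.
    let v = Vec::<T>::with_capacity(1024);
    v.capacity() >= 1024
}
```

At runtime `foo::<i32>()` returns early and `foo::<String>()` reaches the Vec code. The sketch says nothing about which compiler stage drops the dead half for i32.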

Yes, LLVM does all dead code elimination. To the best of my knowledge, rustc doesn't do any.

rustc can do basic dead code elimination in simple cases:


The only way I see to "improve" my code here would be either with specialization (trait selection is the only way to make sure only needed generic items are instantiated) or more aggressive constant propagation in rustc. needs_drop is a const fn, but that doesn't at the moment - if I understand correctly - mean anything special in terms of compile time evaluation in non-const context. If anyone has other ideas, I'm interested.
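For reference, a sketch of forcing the condition into a const context on stable Rust, using a hypothetical helper type `DropInfo` (not from std or ndarray) to carry an associated const. This only shows that needs_drop can be evaluated at compile time for each T. Whether rustc then skips instantiating the items in the dead branch depends on the compiler version, and the sketch does not guarantee it.

```rust
use std::marker::PhantomData;
use std::mem::needs_drop;

// Hypothetical helper, used only to hang an associated const on `T`.
#[allow(dead_code)]
struct DropInfo<T>(PhantomData<T>);

impl<T> DropInfo<T> {
    // A const item, so it is evaluated at compile time for each `T`.
    const NEEDS_DROP: bool = needs_drop::<T>();
}

fn foo<T>() -> usize {
    if !DropInfo::<T>::NEEDS_DROP {
        return 0;
    }

    // Path for types with drop glue.
    let v = Vec::<T>::with_capacity(1024);
    v.capacity()
}
```

The branch is still an ordinary `if` on a constant, so this is a way to experiment with constant propagation rather than a confirmed fix.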

It seems that there is a difference for debug vs release build in this regard:

This might not make a difference in release builds (where a loop that has no side-effects is easily detected and eliminated), but is often a big win for debug builds.

From: docs

@itemchenko I think @bluss is worried about code-size, not performance in debug builds.

@wesleywiser it looks like that pass only works on the literals true and false, so it doesn't evaluate needs_drop in a debug build, which is unfortunate.
