Optimization doesn't seem to optimize fully

use std::time::Instant;

fn main() {
    let mut a: Vec<u64> = Vec::with_capacity(1_000_000);
    let mut b: [u64; 1_000_000] = [0; 1_000_000];
    
    let mut now = Instant::now();
    for i in 1..1_000_001 {
        a.push(i * 8);
    }
    let x1 = now.elapsed();
    println!("{:?}", x1);
    
    now = Instant::now();
    for i in 1..1_000_001 {
        b[i as usize - 1] = i * 8;
    }
    let x2 = now.elapsed();
    println!("{:?}", x2);
    
    std::mem::drop(a);
    a = Vec::new();
    b = [0; 1_000_000];
    
    now = Instant::now();
    for i in 1..1_000_001 {
        a.push(i << 3);
    }
    let x3 = now.elapsed();
    println!("{:?}", x3);
    
    now = Instant::now();
    for i in 1..1_000_001 {
        b[i as usize - 1] = i << 3;
    }
    let x4 = now.elapsed();
    println!("{:?}", x4);
}

(Playground)

Output:

12.145943ms
965ns
13.38689ms
941ns

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished release [optimized] target(s) in 0.77s
     Running `target/release/playground`

here, the vector push seems to take a long time compared to stack array. However, the vector is only being pushed to for a specific constant amount. Is there any way to get the compiler to optimize this stuff so that it is at least in the order of microseconds?

It seems more about the cache than the optimization. Stack memories tends to live within the cache but the heap allocation, especially for the newly allocated one, need to be included to the cache first before write.

Note that this code will only work on main thread on linux. On windows default stack size is 1MB. On linux it's 8MB but the main thread's stack can grow dynamically.

IMO this has to do with the fact that the optimizer can clearly see the stack array won't ever be read, so it entirely removes it and the time you're measuring is just an error between one time measurement and the other. However this optimization can't happen for heap data because the compiler can't make the same assumptions.

For example if you look at the assembly on godbolt here you'll see that the array is entirely removed. In particular look at assembly lines 161-168 which correspond to lines 14 and 18 of your code, you'll see that there's no code corrisponding to lines 15-17 of your input.

6 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.