Indexed vs non-indexed access performance in debug vs release mode


#1

Hi,
In debug mode indexed vector access (e.g. “for i in 0…vector.len()”) vs non-indexed (e.g. “for item in &vector”) is 2 slower while in release mode there’s no difference. Any idea why no difference in release mode?

extern crate stopwatch;
use stopwatch::Stopwatch;

fn main() {
	let size = 10000000;
	
	let nums = vec![5; size];
	let mut sum: i64 = 0;
	let mut times: i64 = 0;
	
	let timer = Stopwatch::start_new();
	
	for num in &nums {
		sum += *num;
		times += 1;
	}
	
	println!("Non Indexed. Sum: {}, {} times, ms: {}", sum, times,
	 timer.elapsed_ms());
	
	let timer = Stopwatch::start_new();
	times = 0;
	sum = 0;
	
	for i in 0..nums.len() {
		sum += nums[i];
		times += 1;
	}
	
	println!("Indexed. Sum: {}, {} times, ms: {}", sum, times,
	 timer.elapsed_ms());
	
	
	let timer = Stopwatch::start_new();
	times = 0;
	sum = 0;
	
	for num in &nums {
		sum += *num;
		times += 1;
	}
	
	println!("Non Indexed. Sum: {}, {} times, ms: {}", sum, times,
	 timer.elapsed_ms());
}

In debug mode yields:
Non Indexed. Sum: 50000000, 10000000 times, ms: 472
Indexed. Sum: 50000000, 10000000 times, ms: 1044
Non Indexed. Sum: 50000000, 10000000 times, ms: 479

In release mode:
Non Indexed. Sum: 50000000, 10000000 times, ms: 8
Indexed. Sum: 50000000, 10000000 times, ms: 8
Non Indexed. Sum: 50000000, 10000000 times, ms: 8

Rust 1.20, Linux, amd64


#2

Wouldn’t that be bounds checking, which only occurs on indexed access and can often be optimized away by the compiler when optimizations are applied in release mode?


#3

I’m no expert, but if the bounds checking is optimized away in release mode, does that mean it doesn’t perform bounds checking at all?


#4

One easy optimization is that if you solely check that the first and last indices are in range, you can trivially infer that all of your remaining indices are in range. This gets bounds checking overhead from O(N) to O(1), which often makes it negligible. More sophisticated program analysis can then move these two remaining checks to compile time if it turns out that the first and last indices are known then, reducing the run-time overhead to zero.


#5

Thanks, makes sense to me, I hope this is what’s actually happening.


#6

That’s what is happening: LLVM is able to remove bounds tests in basic cases like this. You could replace your hope with certainty by taking a look at the asm produced by rustc (with --emit asm), if you know a bit of asm.


#7

In an optimized build, it would’ve been nice if LLVM were able to constant-fold the whole thing, given that the vec’s size and values are constants local to that function.