Or: I found a way to drop code performance without doing anything stupid.
So consider this snippet, which solves the maximum subarray problem (Kadane's algorithm) in linear time:
pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32) {
    // Kadane's algorithm: for each position, track the best subarray
    // ending there as (start index, running sum).
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > 0 {
                // Extending the current subarray is still profitable.
                s.1 = s.1 + *v;
            } else {
                // Otherwise restart the subarray at the current element.
                *s = (i, *v);
            }
            Some(*s)
        });
    // Pick the position with the largest running sum.
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    // Return the half-open range [left_idx, right_idx + 1) and its sum.
    (left_idx, right_idx + 1, sum)
}
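A quick sanity check (my addition, assuming the function is in scope): the returned indices form a half-open range [left, right):

fn main() {
    // Classic example: the maximum subarray is [4, -1, 2, 1] with sum 6.
    let arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4];
    let (left, right, sum) = max_subarray_bad(&arr);
    assert_eq!((left, right, sum), (3, 7, 6));
    assert_eq!(&arr[left..right], &[4, -1, 2, 1]);
    println!("arr[{}..{}] sums to {}", left, right, sum);
}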
Since its execution time doesn't depend on the actual data in the array, we can easily benchmark it by feeding it a vector of size N filled with zeros, like this:
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_linear(c: &mut Criterion) {
    const N: usize = 1000000;
    c.bench_function(&format!("max_subarray([..]) N = {}", N), |b| {
        // `black_box` keeps the compiler from optimizing the input away.
        b.iter(|| max_subarray::max_subarray_bad(black_box(&vec![0; N])))
    });
}

criterion_group!(benches, benchmark_linear);
criterion_main!(benches);
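One caveat worth noting (my observation, not part of the original setup): the vec![0; N] is constructed inside b.iter, so each sample also measures a ~4 MB zeroed allocation. If you only want to time the scan itself, the allocation can be hoisted out, along these lines (the name benchmark_linear_hoisted is mine):

fn benchmark_linear_hoisted(c: &mut Criterion) {
    const N: usize = 1000000;
    // Allocate once, outside the measured closure.
    let data = vec![0; N];
    c.bench_function(&format!("max_subarray([..]) N = {}", N), |b| {
        b.iter(|| max_subarray::max_subarray_bad(black_box(&data)))
    });
}

The numbers below are from the original benchmark_linear, allocation included.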
We use the criterion crate for the benchmark; the output of cargo bench would be:
Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.6s or reduce sample count to 30.
max_subarray([..]) N = 1000000
                        time:   [3.2324 ms 3.2700 ms 3.3141 ms]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
Running target\release\deps\scratch-b68a42551ab01289.exe
OK, now let's change our max_subarray_bad by binding 0 to a name, zro, and using it in the closure instead of the literal 0:
pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32) {
    let zro = 0;
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > zro { // the only change: `zro` instead of the literal `0`
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    (left_idx, right_idx + 1, sum)
}
Surprisingly, our benchmark improves by a large margin:
Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.9s or reduce sample count to 40.
max_subarray([..]) N = 1000000
                        time:   [2.5705 ms 2.5806 ms 2.5913 ms]
                        change: [-20.260% -19.668% -19.124%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Running target\release\deps\scratch-b68a42551ab01289.exe
Almost a 20% performance increase! And all we did was replace the literal 0 in s.1 > 0 with a name that binds to 0: s.1 > zro. You can flip the code back and forth between 0 and zro in that comparison and you always get a ~20% performance difference. I checked that this is reproducible on both stable and nightly. Is it a bug in the optimizer? Or am I missing something about how closures actually treat literal values?
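One concrete difference between the two versions can be observed directly: with the literal, the scan closure captures nothing and is zero-sized, while with zro it captures a reference to the binding. A minimal probe (my sketch, separate from the benchmark; whether this is what throws the optimizer off is exactly the open question):

use std::mem::size_of_val;

fn main() {
    let zro = 0;
    // With a literal, the closure's capture list is empty: the constant
    // is baked into its code and the closure is a zero-sized type.
    let with_literal = |x: i32| x > 0;
    // With a binding, the closure captures `zro` by reference (it only
    // reads it), so its environment is one pointer wide.
    let with_binding = |x: i32| x > zro;
    println!("literal closure: {} bytes", size_of_val(&with_literal)); // 0
    println!("binding closure: {} bytes", size_of_val(&with_binding)); // 8 on 64-bit
}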
By the way, if we change let zro = 0 into const zro: i32 = 0, performance drops again too.
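That lines up with the capture picture above: a const is inlined at every use site, so s.1 > zro with a const compiles just like s.1 > 0, and the closure is zero-sized again. A minimal illustration (same caveat: my sketch):

use std::mem::size_of_val;

fn main() {
    #[allow(non_upper_case_globals)] // keeping the post's name `zro`
    const zro: i32 = 0;
    // A const is inlined at the use site rather than captured, so this
    // closure's environment is empty, exactly as with the literal 0.
    let with_const = |x: i32| x > zro;
    println!("const closure: {} bytes", size_of_val(&with_const)); // 0
}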
Update: I have also created an issue on GitHub in the main rust-lang repo.