Hi everyone,
I'm experimenting with rewriting some C# code in Rust and comparing the performance. Some things are much faster in Rust, but this simple function seems quite a bit slower, and I'm wondering if anyone has suggestions as to why. All I'm doing is iterating through an array of f64s and summing up all the values at an index = 19 mod 500. I'm surprised because my Rust code is coming in over 3x slower than the C# code.
Here's the original C# function:
public class ArrayIter
{
    double[] smallArr;

    [GlobalSetup]
    public void Setup()
    {
        smallArr = new double[50000];
        for (int i = 0; i < 50000; i++)
        {
            smallArr[i] = 5.0;
        }
    }

    [Benchmark]
    public double GetSumSmaller()
    {
        var total = 0.0;
        for (int i = 0; i < 100; i++)
        {
            total += smallArr[i * 500 + 19];
        }
        return total;
    }
}
And here are the two attempts I made to write the Rust equivalent. The first mirrors the C# loop exactly, while the second splits the array into groups of 500 and iterates over the groups (note that I'm not black_boxing the index 19 because it's a fixed const in the actual code we run).
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn small_benchmarks(c: &mut Criterion) {
    c.bench_function("fold like c#", |b| {
        b.iter_batched(
            || vec![5.0; 50000],
            |array| {
                let mut total = 0.0;
                for i in 0..100 {
                    total += array[500 * i + 19]
                }
                total
            },
            BatchSize::LargeInput,
        )
    });
    c.bench_function("fold with iter", |b| {
        b.iter_batched(
            || vec![[5.0; 500]; 100],
            |array| array.iter().fold(0.0, |acc, group| acc + group[19]),
            BatchSize::LargeInput,
        )
    });
}

criterion_group!(benches, small_benchmarks);
criterion_main!(benches);
Here are the results of the C# benchmark:
// * Summary *
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.17763.1039 (1809/October2018Update/Redstone5)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
[Host] : .NET Framework 4.8 (4.8.4121.0), X86 LegacyJIT
DefaultJob : .NET Framework 4.8 (4.8.4121.0), X86 LegacyJIT
| Method | Mean | Error | StdDev |
|-------------- |---------:|---------:|---------:|
| GetSumSmaller | 81.93 ns | 1.927 ns | 1.802 ns |
And the Rust rewrites come in around 3-4 times slower:
fold like c#            time:   [375.82 ns 393.93 ns 424.07 ns]
                        change: [-7.1434% +3.0465% +14.984%] (p = 0.61 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
fold with iter          time:   [304.10 ns 310.30 ns 316.89 ns]
                        change: [-3.7387% +1.8703% +8.8750%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Any ideas why there's such a big difference and what I could try next?
This was built with rustc 1.40.0 for stable-x86_64-pc-windows-msvc.
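One difference between the two setups that may be worth ruling out: the C# benchmark builds the array once in GlobalSetup and only reads from it, while each Rust closure receives a fresh Vec by value, so that Vec is dropped at the end of the closure, inside the routine being measured. Below is a minimal sketch of the same loop over a borrowed slice, timed with only the standard library. The names sum_strided and time_borrowed are hypothetical helpers, not part of Criterion, and std::hint::black_box needs a newer toolchain than the 1.40.0 mentioned above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Sums the 100 values at indices 19, 519, 1019, ... (index = 19 mod 500).
/// Same loop as the "fold like c#" benchmark, but over a borrowed slice.
fn sum_strided(array: &[f64]) -> f64 {
    let mut total = 0.0;
    for i in 0..100 {
        total += array[500 * i + 19];
    }
    total
}

/// Hypothetical helper: average wall-clock time per call.
/// The caller builds the array once, outside the timed loop, so no
/// allocation or deallocation happens while the clock is running.
/// `iterations` must be non-zero.
fn time_borrowed(array: &[f64], iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from hoisting or removing the call.
        black_box(sum_strided(black_box(array)));
    }
    start.elapsed() / iterations
}
```

Calling time_borrowed(&vec![5.0; 50000], 1_000_000) from main gives a rough per-call figure. It is a crude measurement compared with Criterion's statistics, so I'd only treat it as a sanity check on whether the Vec handling accounts for part of the gap.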