I was just messing around to see which is optimized more and saw the following results.
Overall:
C++ with O3: 6037338
C++ with Os: 11049853
Rust with alias: 11361628
Rust with noalias: 11535222
Average:
C++ with O3: 6037.338
C++ with Os: 11049.853
Rust with alias: 11361.628
Rust with noalias: 11535.222
Standard deviation:
C++ with O3: 7285.095556803356
C++ with Os: 6220.168290439657
Rust with alias: 14224.509900436498
Rust with noalias: 7729.57509380147
Rust code
use std::{io, time::Instant};

fn main() {
    let mut s = String::new();
    io::stdin().read_line(&mut s).expect("Parsing error");
    let inp = s.trim().parse::<i32>().unwrap();
    let mut a: [i32; 100_000] = [0; 100_000];
    let mut b: [i32; 100_000] = [0; 100_000];
    let mut c: [i32; 100_000] = [0; 100_000];
    let n = Instant::now();
    for i in 0..inp {
        a[i as usize] = i;
    }
    for i in 0..inp {
        b[i as usize] = i;
    }
    for i in 0..inp {
        c[i as usize] = a[i as usize] + b[i as usize];
    }
    let now = n.elapsed();
    println!("{:?} {}", now.as_nanos(), c[inp as usize - 1]);
}
Is there some reason why the alias version runs at the same or a similar speed as the noalias version? And why is C++ much faster without any aliasing disabled? The program works on 100000-element integer arrays, and I used the same code in Rust and C++, so at least in Rust the noalias version should be somewhat faster?
inp may be bigger than 100_000, meaning those array accesses have to be bounds-checked. Bounds checks prevent auto-vectorization, which can speed things up significantly. Auto-vectorization may also be disabled at -Os, which would explain why C++ with -Os performs like Rust.
Performance issues like this can usually be solved in two ways:
By moving an assertion before the loop and relying on LLVM to remove the bounds check from the loop body. This approach, however, is not always the best and can often fail to trigger the optimization, so I wouldn't rely on it. In fact, I would expect LLVM to be able to optimize away the bounds checks in this code, but it doesn't.
let inp = inp as usize;
assert!(inp <= 100_000);
for i in 0..inp {
    a[i] = i as i32;
}
for i in 0..inp {
    b[i] = i as i32;
}
for i in 0..inp {
    c[i] = a[i] + b[i];
}
By using iterators. This is the most idiomatic way, and iterators often optimize better than plain indexing. Here I use for-loops because it doesn't make a difference, but in general internal iteration (e.g. for_each and fold) can often be optimized even better.
let inp = inp as usize;
for (i, ai) in a[0..inp].iter_mut().enumerate() {
    *ai = i as i32;
}
for (i, bi) in b[0..inp].iter_mut().enumerate() {
    *bi = i as i32;
}
for ((ci, ai), bi) in c[0..inp].iter_mut().zip(&a).zip(&b) {
    *ci = *ai + *bi;
}
This code with the same benchmarks was posted to /r/cpp before it was removed, if you want to compare. The reserves were recommended to speed it up.
#include <iostream>
#include <vector>
#include <chrono>
#include <unistd.h>
using namespace std;

int main() {
    vector<int> a, b, c;
    int n;
    cin >> n;
    //int n = 1000000000; //a.reserve(n); //b.reserve(n); //c.reserve(n);
    auto start = chrono::steady_clock::now();
    for (int i = 0; i < n; i++) {
        a.push_back(i);
    }
    for (int i = 0; i < n; i++) {
        b.push_back(i);
    }
    for (int i = 0; i < n; i++) {
        c.push_back(a[i] + b[i]);
    }
    auto end = chrono::steady_clock::now();
    cout << chrono::duration_cast<chrono::milliseconds>(end - start).count()
         << " ";
    cout << c[n - 1] << "\n";
    return 0;
}
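For comparison (my own sketch, not from the /r/cpp thread; `build_sum` and the small input are made up for the example), the Rust analogue of the recommended reserve() calls is Vec::with_capacity, which preallocates so push() never has to reallocate and copy the buffer:

```rust
// Rust equivalent of the C++ code with the recommended reserve() calls.
fn build_sum(n: i32) -> Vec<i32> {
    let n_us = n as usize;
    // Preallocate, like vector::reserve in C++, so push() never reallocates.
    let mut a = Vec::with_capacity(n_us);
    let mut b = Vec::with_capacity(n_us);
    let mut c = Vec::with_capacity(n_us);
    for i in 0..n {
        a.push(i);
    }
    for i in 0..n {
        b.push(i);
    }
    for i in 0..n_us {
        c.push(a[i] + b[i]);
    }
    c
}

fn main() {
    // Small stand-in for the hardcoded 1000000000 input.
    let c = build_sum(1_000);
    println!("{}", c[c.len() - 1]); // prints 1998 (999 + 999)
}
```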
This was a Rust version of the code that someone said should be fast; it is pretty standard code.
use std::time::Instant;

fn main() {
    let now = Instant::now();
    let n = 1000000000;
    let a: Vec<_> = (0..n).collect();
    let b: Vec<_> = (0..n).collect();
    let c: Vec<_> = a.iter().zip(b.iter()).map(|(a, b)| *a + *b).collect();
    println!("{}", now.elapsed().as_millis());
    println!("{:?}", c.len());
}
On my machine the Rust code takes 2.2 seconds and the C++ code (using -O2) takes 6.7 seconds, where both are hardcoded with an input of 1000000000 and the C++ one reserves the vectors.
I also tried switching on noalias with this, but it's slower:
RUSTFLAGS="-Z mutable-noalias=yes" cargo +nightly run --release