I have benchmark, sort of, coming from university problems (I'm no programmer, I do computational physics). The code solves equations for matrices with special structures where the structure is hard coded into functions to save memory. The code was ported from Matlab to Julia to C++ to Rust by hand so it is not very idiomatic anyway.
The benchmarks for Rust are rather disappointing probably because Rust cannot optimize away the bounds checks and therefore the the loops cannot aggressively be optimized for SIMD instructions:
So there is almost a factor of 3 to C++ (the comparison is unfair because C++ was also compiled with the fast math options but even then there is still a factor of 2). I also use target-cpu=native
for Rust but it did not change the runtimes.
I am aware that there are unsafe functions that do unchecked access on a field but it would be better for readability if I could still use the []
operator. In C++ the []
operator does not do bounds checks and in Julia bounds checks can be turned off globally or whole functions after the code has been developed and tested.
A full excerpt of the code is to follow. The code consists of for
loops over ranges of the following form:
for i in l.n-l.n_x+2..=l.n-1 {
r2 += square(-b[i-1] + x[i-1]*l.diag as f64 + x[i+1-1]*l.tri_diag as f64 + x[i-1-1]*l.tri_diag as f64 + x[i-l.n_x-1]*l.side_diag as f64);
}
By construction the code is always inbounds but the compiler does not seem to be aware of that. My question is how one would formulate this code in a safe way. I am aware of .iter().enumerate()
to iterate over elements as well as indices but it is not clear to me how this might be used for a more complex example.
The standard benchmark of Rust in Nightly benchmarks this function with 283 ns/iter (+/- 51)
. In Julia it is 108 ns
and in C++ 151 ns
although this comparison needs to be taken with a grain of salt since it was done with three different benchmark libraries but it reflects the general result from above.
pub fn square<T>(i: T) -> T
where
T:std::ops::Mul + std::ops::Mul<Output = T> + Copy {
i*i
}
#[derive(Debug)]
pub struct Laplace2dMatrix {
pub n_x : usize ,
pub n_y : usize ,
pub n : usize ,
pub diag : i64 ,
pub tri_diag : i64 ,
pub side_diag : i64 ,
}
impl Laplace2dMatrix {
pub fn rectangular(n_x: usize, n_y: usize) -> Laplace2dMatrix {
Laplace2dMatrix {
n_x: n_x ,
n_y: n_y ,
n: n_x*n_y ,
diag: -2*( square(n_x+1) + square(n_y+1) ) as i64 ,
tri_diag: square(n_x+1) as i64 ,
side_diag: square(n_y+1) as i64 ,
}
}
pub fn quadratic(n_xy: usize) -> Laplace2dMatrix {
Laplace2dMatrix::rectangular(n_xy, n_xy)
}
}
/// calculates the residual r^2 = ||l*x - b||_2^2
pub fn calculate_residual_squared(l: &Laplace2dMatrix, x: &[f64], b: &[f64]) -> f64 {
// Assumptions that must hold for l
// Laplace2dMatrix represents a block diagonal matrix where the block matrix
// is the same for all blocks and the block matrix is a band matrix
// n_x and n_y are the size of the block matrix and l is a n x n matrix
// where n = n_x*n_y
assert!(l.n == x.len());
assert!(l.n == b.len());
assert!(l.n == l.n_x*l.n_y);
let mut r2 = 0.0;
r2 += square(-b[1-1] + x[1-1]*l.diag as f64 + x[2-1] * l.tri_diag as f64 + x[1+l.n_x-1] * l.side_diag as f64);
for i in 2..=l.n_x-1 {
r2 += square(-b[i-1] + x[i-1]*l.diag as f64 + x[i+1-1]*l.tri_diag as f64 + x[i-1-1]*l.tri_diag as f64 + x[i+l.n_x-1]*l.side_diag as f64);
}
r2 += square(-b[l.n_x-1] + x[l.n_x-1]*l.diag as f64 + x[l.n_x-1-1]*l.tri_diag as f64 + x[l.n_x+l.n_x-1]*l.side_diag as f64);
for outer_base in (l.n_x..=l.n_x*(l.n_y-2)).step_by(l.n_x) {
r2 += square(-b[outer_base+1 -1] + x[outer_base+1 -1]*l.diag as f64 + x[outer_base+2 -1]*l.tri_diag as f64 + x[outer_base+1-l.n_x -1]*l.side_diag as f64 + x[outer_base+1+l.n_x-1]*l.side_diag as f64);
for i in 2..=l.n_x-1 {
r2 += square(-b[outer_base+i-1] + x[outer_base+i-1]*l.diag as f64 + x[outer_base+i+1-1]*l.tri_diag as f64 + x[outer_base+i-1-1]*l.tri_diag as f64 + x[outer_base+i+l.n_x-1]*l.side_diag as f64 + x[outer_base+i-l.n_x-1]*l.side_diag as f64);
}
r2 += square(-b[outer_base+l.n_x-1] + x[outer_base+l.n_x-1]*l.diag as f64 + x[outer_base+l.n_x-1-1]*l.tri_diag as f64 + x[outer_base+l.n_x+l.n_x-1]*l.side_diag as f64 + x[outer_base -1]*l.side_diag as f64);
}
r2 += square(-b[l.n-l.n_x+1-1] + x[l.n-l.n_x+1-1]*l.diag as f64 + x[l.n-l.n_x+2-1]*l.tri_diag as f64 + x[l.n-l.n_x-l.n_x+1-1]*l.side_diag as f64);
for i in l.n-l.n_x+2..=l.n-1 {
r2 += square(-b[i-1] + x[i-1]*l.diag as f64 + x[i+1-1]*l.tri_diag as f64 + x[i-1-1]*l.tri_diag as f64 + x[i-l.n_x-1]*l.side_diag as f64);
}
r2 += square(-b[l.n -1] + x[l.n -1]*l.diag as f64 + x[l.n-1 -1]*l.tri_diag as f64 + x[l.n-l.n_x -1]*l.side_diag as f64);
r2 /= l.n as f64;
return r2;
}
fn main() {
let l = Laplace2dMatrix::quadratic(10);
let x = vec![1.2; 100];
let b = vec![1.2; 100];
println!("squared residual: {:?}", calculate_residual_squared(&l, &x, &b));
}