# Is it possible to improve ndarray performance vs nalgebra?

Hi!

I had a simple projection algorithm written in both `ndarray` and `nalgebra`, and noticed the latter performs substantially faster. I then found that even simple indexing operations are much faster in `nalgebra`. Is there a trick I'm missing, or is `ndarray` simply not recommended for low-dimensional cases? What is the main source of this difference in performance? I thought it might be that `ndarray` has to perform one extra check to validate the index, but even then it should at worst be 2x the runtime of `nalgebra`, not over 5x... I also tried rewriting the code as e.g. `*array.uget_mut((0,0)) += 1.0` inside an unsafe wrapper, but even then `ndarray` is over 2x slower.

```rust
use nalgebra::Point2; // 0.32.2
use std::time::Instant;
use std::hint::black_box;
use ndarray::prelude::*;

fn index_vector(vector: &mut Vec<Point2<f64>>) {
    for _ in 0..10000 {
        vector[0].x += 1.0;
        vector[0].y += 1.0;
        vector[0].x += 0.5;
        vector[0].y += 0.5;
        vector[0].x -= 1.0;
        vector[0].y -= 1.0;
        vector[0].x -= 0.5;
        vector[0].y -= 0.5;
        vector[0].x += 1.0;
        vector[0].y += 1.0;
        vector[0].x -= 0.5;
        vector[0].y -= 0.5;
    }
}

fn index_array(array: &mut Array2<f64>) {
    for _ in 0..10000 {
        array[[0, 0]] += 1.0;
        array[[0, 1]] += 1.0;
        array[[0, 0]] += 0.5;
        array[[0, 1]] += 0.5;
        array[[0, 0]] -= 1.0;
        array[[0, 1]] -= 1.0;
        array[[0, 0]] -= 0.5;
        array[[0, 1]] -= 0.5;
        array[[0, 0]] += 1.0;
        array[[0, 1]] += 1.0;
        array[[0, 0]] -= 0.5;
        array[[0, 1]] -= 0.5;
    }
}

fn main() {
    let mut v = vec![Point2::new(0.0, 0.0)];
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(index_vector(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time: {:?}", elapsed);

    let mut v = array![[0.0, 0.0]];
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(index_array(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time: {:?}", elapsed);
}
```

Output:

```text
Elapsed time: 28.677584ms
Elapsed time: 155.103036ms
```

You are running in debug mode, without optimization. Comparing the performance of non-optimized builds is meaningless. If you run it in release mode, with optimizations turned on, there's no substantial difference between the two: both logged time deltas are on the order of ~550µs.
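For reference, these are the Cargo profile settings behind the two build modes (these are the implicit defaults; you don't need to write them out in a real project):

```toml
# Default Cargo profile settings (implicit; shown only for illustration)
[profile.dev]       # used by plain `cargo run` / `cargo build`
opt-level = 0       # no optimization: timings from this build are not meaningful

[profile.release]   # used by `cargo run --release`
opt-level = 3       # full optimization: always benchmark with this profile
```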


Oh awesome, thanks!

I think the example I gave was too simplistic, and the original question was unfortunately clouded by my mistake of timing a debug build.

Here's a slightly more complex example function which finds the longest edge in a list of consecutive points, and returns both the edge and its length:

```rust
use nalgebra::Vector2; // 0.32.2
use std::time::Instant;
use std::hint::black_box;
use ndarray::prelude::*;

fn longest_edge_nalgebra(vector: &mut Vec<Vector2<f64>>) -> (f64, Vector2<f64>) {
    let begin_idx = 0;
    let end_idx = vector.len() - 2;
    let mut longest_edge = vector[0];
    let mut longest_length = longest_edge.norm();
    for i in begin_idx..=end_idx {
        let edge = vector[i + 1] - vector[i];
        let edge_length = edge.norm();
        if edge_length > longest_length {
            longest_length = edge_length;
            longest_edge = edge;
        }
    }
    (longest_length, longest_edge)
}

fn longest_edge_ndarray(array: &mut Array2<f64>) -> (f64, Array1<f64>) {
    let begin_idx = 0;
    let end_idx = array.nrows() - 2;
    let mut longest_edge = array.row(0).to_owned();
    let mut longest_length = longest_edge.dot(&longest_edge).sqrt();
    for i in begin_idx..=end_idx {
        let edge = &array.row(i + 1) - &array.row(i);
        let edge_length = edge.dot(&edge).sqrt();
        if edge_length > longest_length {
            longest_length = edge_length;
            longest_edge = edge;
        }
    }
    (longest_length, longest_edge)
}

fn main() {
    let mut v = Array::from_shape_fn((5000, 2), |(i, j)| if j == 0 { i as f64 } else { 0. });
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(longest_edge_ndarray(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time ndarray: {:?}", elapsed);

    let mut v: Vec<Vector2<f64>> = vec![];
    for n in 0..5000 {
        v.push(Vector2::new(n as f64, 0.0));
    }
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(longest_edge_nalgebra(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time nalgebra: {:?}", elapsed);
}
```

Output in release mode:

```text
Elapsed time ndarray: 1.207955ms
Elapsed time nalgebra: 291.768µs
```

There is over a 4x difference. For even more complex algorithms such as projecting points, this further increases in my implementations. So my original question still stands (unless I'm still messing up the timing), what is the reason for this difference? Am I using the `ndarray` crate wrong somehow, can its performance be improved?

One thing I noticed is that if I return only the longest length, not the edge itself, the two are comparable in performance (Playground). Based on this, changing just the initialization in the original function to `let mut longest_edge = array![array[[0,0]], array[[0,1]]];` brings it down to ~750µs... (Playground). I would hope the more readable version could still get at least that improvement. Changing how the longest edge is stored yields results faster than nalgebra's (Playground), but again I would hope this is obtainable with the library functions; I guess exploiting knowledge of the array sizes matters here, and that's the fundamental source of the difference.

`ndarray`'s internal representation is much more complex, since it allows slicing/striding. Additionally, `nalgebra` uses const generics internally, which further reduces allocations and bounds checking.

In general, indexing into an `ndarray::Array` is much more costly. But if you find a way to use `ndarray::Zip` instead of indexing directly, the difference won't be large.

Here's a version of `longest_edge_ndarray` that's within 10% of the nalgebra one.

```rust
fn longest_edge_ndarray(array: &Array2<f64>) -> (f64, Array1<f64>) {
    let mut longest_length = array.row(0).dot(&array.row(0)).sqrt();
    let mut longest_edge_i = None;

    array
        .axis_windows(Axis(0), 2)
        .into_iter()
        .enumerate()
        .for_each(|(i, array)| {
            let edge_length = ndarray::Zip::from(array.row(1))
                .and(array.row(0))
                .fold(0., |sum, a, b| {
                    let diff = a - b;
                    sum + diff * diff
                })
                .sqrt();

            if edge_length > longest_length {
                longest_length = edge_length;
                longest_edge_i = Some(i);
            }
        });

    let longest_edge = if let Some(i) = longest_edge_i {
        &array.row(i + 1) - &array.row(i)
    } else {
        array.row(0).to_owned()
    };

    (longest_length, longest_edge)
}
```

One thing `nalgebra` does: a `Vector2<T>` is stored as `[T; 2]` because the length is known at compile time, whereas in `ndarray` an `Array1<T>` is similar to a `Vec<T>` of length 2, which requires a heap allocation. In your version of `longest_edge_ndarray`, `let edge = &array.row(i+1) - &array.row(i);` unfortunately creates such an `Array1<f64>`, which allocates on every iteration. Skipping that saves a lot of time.


Yeah, I guess it's about knowing the dimension along the various axes. The version I wrote, which allocates and saves the longest edge in every iteration but indexes the two components explicitly, still seems to be the fastest (Playground):

```text
Elapsed time ndarray: 226.857µs
Elapsed time ndarray allocating: 165.475µs
Elapsed time nalgebra: 230.937µs
```

Then again, I'm not sure about measuring time in the playground, because just reusing the input array `v` and commenting out the line I marked in the playground code yields:

```text
Elapsed time ndarray: 216.636µs
Elapsed time ndarray allocating: 1.817783ms
Elapsed time nalgebra: 226.927µs
```
