I had a simple projection algorithm written with both ndarray and nalgebra, and noticed the latter performs substantially faster. I then tested that even simple indexing operations are much faster in nalgebra. Is there a trick I'm missing, or is it simply not recommended to use ndarray for low-dimensional cases? What is the main source of this performance difference? I thought it might be that ndarray has to perform one extra check to validate the index, but even then it should at worst be 2x the runtime of nalgebra, not over 5x. I also tried rewriting the code to e.g. *array.uget_mut((0,0)) += 1.0 inside an unsafe block, but even then ndarray is over 2x slower.
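To clarify what I mean by the unsafe rewrite: it's the ndarray analogue of std's unchecked slice indexing, roughly this sketch (plain std types standing in for the array):

```rust
fn main() {
    let mut v = vec![0.0f64, 0.0];
    // Checked indexing: performs a bounds check and can panic.
    v[0] += 1.0;
    // Unchecked indexing: no bounds check, so the caller must
    // guarantee the index is valid (here it clearly is).
    unsafe {
        *v.get_unchecked_mut(0) += 1.0;
    }
    println!("{}", v[0]);
}
```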

use nalgebra::Point2; // 0.32.2
use std::time::Instant;
use std::hint::black_box;
use ndarray::prelude::*;

fn index_vector(vector: &mut Vec<Point2<f64>>) {
    for _ in 0..10000 {
        vector[0].x += 1.0;
        vector[0].y += 1.0;
        vector[0].x += 0.5;
        vector[0].y += 0.5;
        vector[0].x -= 1.0;
        vector[0].y -= 1.0;
        vector[0].x -= 0.5;
        vector[0].y -= 0.5;
        vector[0].x += 1.0;
        vector[0].y += 1.0;
        vector[0].x -= 0.5;
        vector[0].y -= 0.5;
    }
}

fn index_array(array: &mut Array2<f64>) {
    for _ in 0..10000 {
        array[[0, 0]] += 1.0;
        array[[0, 1]] += 1.0;
        array[[0, 0]] += 0.5;
        array[[0, 1]] += 0.5;
        array[[0, 0]] -= 1.0;
        array[[0, 1]] -= 1.0;
        array[[0, 0]] -= 0.5;
        array[[0, 1]] -= 0.5;
        array[[0, 0]] += 1.0;
        array[[0, 1]] += 1.0;
        array[[0, 0]] -= 0.5;
        array[[0, 1]] -= 0.5;
    }
}

fn main() {
    let mut v = vec![Point2::new(0.0, 0.0)];
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(index_vector(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time: {:?}", elapsed);

    let mut v = array![[0.0, 0.0]];
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(index_array(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time: {:?}", elapsed);
}

You are running in debug mode, without optimizations. Comparing the performance of non-optimized builds is meaningless. If you run it in release mode, with optimizations turned on, there is no substantial difference between the two: both logged time deltas are on the order of ~550µs.

I think the example I gave was too simplistic, and the original question was unfortunately clouded by my use of a debug build for timing purposes.

Here's a slightly more complex example function which finds the longest edge in a list of consecutive points, and returns both the edge and its length:

use nalgebra::Vector2; // 0.32.2
use std::time::Instant;
use std::hint::black_box;
use ndarray::prelude::*;

fn longest_edge_nalgebra(vector: &mut Vec<Vector2<f64>>) -> (f64, Vector2<f64>) {
    let begin_idx = 0;
    let end_idx = vector.len() - 2;
    let mut longest_edge = vector[0].clone();
    let mut longest_length = longest_edge.norm();
    for i in begin_idx..=end_idx {
        let edge = vector[i + 1] - vector[i];
        let edge_length = edge.norm();
        if edge_length > longest_length {
            longest_length = edge_length;
            longest_edge = edge;
        }
    }
    (longest_length, longest_edge)
}

fn longest_edge_ndarray(array: &mut Array2<f64>) -> (f64, Array1<f64>) {
    let begin_idx = 0;
    let end_idx = array.nrows() - 2;
    let mut longest_edge = array.row(0).to_owned();
    let mut longest_length = longest_edge.dot(&longest_edge).sqrt();
    for i in begin_idx..=end_idx {
        let edge = &array.row(i + 1) - &array.row(i);
        let edge_length = edge.dot(&edge).sqrt();
        if edge_length > longest_length {
            longest_length = edge_length;
            longest_edge = edge;
        }
    }
    (longest_length, longest_edge)
}

fn main() {
    let mut v = Array::from_shape_fn((5000, 2), |(i, j)| if j == 0 { i as f64 } else { 0. });
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(longest_edge_ndarray(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time ndarray: {:?}", elapsed);

    let mut v: Vec<Vector2<f64>> = vec![];
    for n in 0..5000 {
        v.push(Vector2::new(n as f64, 0.0));
    }
    let start = Instant::now();
    for _ in 0..10 {
        let _ = black_box(longest_edge_nalgebra(&mut v));
    }
    let elapsed = start.elapsed();
    println!("Elapsed time nalgebra: {:?}", elapsed);
}

Output in release mode:

Elapsed time ndarray: 1.207955ms
Elapsed time nalgebra: 291.768µs

There is over a 4x difference. For even more complex algorithms, such as projecting points, the gap widens further in my implementations. So my original question still stands (unless I'm still messing up the timing): what is the reason for this difference? Am I using the ndarray crate wrong somehow, and can its performance be improved?

One thing I noticed is that if I return just the longest length, not the edge itself, the two are comparable in performance (Playground). Based on this, changing just the initialization in the original function to let mut longest_edge = array![array[[0,0]], array[[0,1]]]; brings it down to ~750µs (Playground). I would hope the more readable library functions could still be used to get at least that much improvement. Changing how the longest edge is stored yields results faster than nalgebra's (Playground), but again I would hope this is obtainable with the library functions. I guess using compile-time knowledge of the array sizes matters here, and that is the fundamental source of the difference.

ndarray's internal representation is much more complex, since it allows slicing/striding. Additionally, nalgebra uses const generics internally, which further reduces allocation and bounds checking.
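You can see the representation difference with plain std types (here [f64; 2] stands in for nalgebra's Vector2&lt;f64&gt; and Vec&lt;f64&gt; for ndarray's owned 1-D array; this is a sketch of the idea, not either crate's exact layout):

```rust
use std::mem::size_of;

fn main() {
    // Stand-in for nalgebra's Vector2<f64>: length known at compile
    // time, stored inline on the stack, Copy, no allocation.
    let fixed: [f64; 2] = [1.0, 2.0];
    assert_eq!(size_of::<[f64; 2]>(), 16);

    // Stand-in for ndarray's owned 1-D array: length only known at
    // run time, so the handle holds pointer + length + capacity and
    // the data lives on the heap.
    let dynamic: Vec<f64> = vec![1.0, 2.0];
    assert_eq!(size_of::<Vec<f64>>(), 3 * size_of::<usize>());

    println!("{} {}", fixed[0] + fixed[1], dynamic[0] + dynamic[1]);
}
```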

In general, indexing into an ndarray::Array is much more costly. But if you find a way to use ndarray::Zip instead of indexing directly, the difference won't be large.

Here's a version of longest_edge_ndarray that's within 10% of the nalgebra one.

fn longest_edge_ndarray(array: &Array2<f64>) -> (f64, Array1<f64>) {
    let mut longest_length = array.row(0).dot(&array.row(0)).sqrt();
    let mut longest_edge_i = None;
    array
        .axis_windows(Axis(0), 2)
        .into_iter()
        .enumerate()
        .for_each(|(i, window)| {
            // Compute the squared edge length with Zip, without
            // materializing the edge vector (no allocation).
            let edge_length = ndarray::Zip::from(window.row(1))
                .and(window.row(0))
                .fold(0., |sum, a, b| {
                    let diff = a - b;
                    sum + diff * diff
                })
                .sqrt();
            if edge_length > longest_length {
                longest_length = edge_length;
                longest_edge_i = Some(i);
            }
        });
    // Allocate the returned edge only once, for the winner.
    let longest_edge = if let Some(i) = longest_edge_i {
        &array.row(i + 1) - &array.row(i)
    } else {
        array.row(0).to_owned()
    };
    (longest_length, longest_edge)
}

One thing nalgebra does is store a Vector2<T> as [T; 2], because the length is known at compile time; in ndarray, an Array1<T> of length 2 is similar to a Vec<T> of length 2, which requires a heap allocation. In your version of longest_edge_ndarray, let edge = &array.row(i+1) - &array.row(i); unfortunately creates a fresh Array1<f64>, which allocates on every iteration. Skipping that saves a lot of time.
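For comparison, here's a sketch of the same algorithm over plain [f64; 2] points (std only, no crates): the edge is a stack value, so the loop body never allocates — the same property nalgebra gets from its compile-time length.

```rust
// Each point is a stack [f64; 2]; subtracting two points yields
// another [f64; 2], so the loop body never touches the heap.
fn longest_edge(points: &[[f64; 2]]) -> (f64, [f64; 2]) {
    let mut longest_edge = points[0];
    let mut longest_length =
        (longest_edge[0] * longest_edge[0] + longest_edge[1] * longest_edge[1]).sqrt();
    for w in points.windows(2) {
        let edge = [w[1][0] - w[0][0], w[1][1] - w[0][1]];
        let len = (edge[0] * edge[0] + edge[1] * edge[1]).sqrt();
        if len > longest_length {
            longest_length = len;
            longest_edge = edge;
        }
    }
    (longest_length, longest_edge)
}

fn main() {
    let points = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]];
    let (len, edge) = longest_edge(&points);
    println!("{} {:?}", len, edge); // longest edge is [3.0, 0.0], length 3
}
```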

Yeah, I guess it's about knowing the dimension along the various axes. The version I wrote, which allocates and saves the longest edge in every iteration but explicitly indexes [0] and [1], still seems to be the fastest (Playground):

Elapsed time ndarray: 226.857µs
Elapsed time ndarray allocating: 165.475µs
Elapsed time nalgebra: 230.937µs

Then again, I'm not sure about measuring time in the playground, because just reusing the input array v and commenting out the line of code I marked in the playground code yields:

Elapsed time ndarray: 216.636µs
Elapsed time ndarray allocating: 1.817783ms
Elapsed time nalgebra: 226.927µs
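In case it helps, the harness pattern I've settled on for these micro-measurements is roughly this sketch (workload is a hypothetical stand-in for the longest_edge functions; a warm-up pass plus black_box on both input and output keeps the optimizer from eliding the work):

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical workload standing in for longest_edge_*.
fn workload(data: &[f64]) -> f64 {
    data.iter().map(|x| x * x).sum::<f64>().sqrt()
}

fn main() {
    let data: Vec<f64> = (0..5000).map(|i| i as f64).collect();
    // Warm-up pass so the first measured iteration isn't paying
    // one-time costs (page faults, cold caches).
    black_box(workload(black_box(&data)));

    let runs = 100u32;
    let start = Instant::now();
    for _ in 0..runs {
        black_box(workload(black_box(&data)));
    }
    let per_run = start.elapsed() / runs;
    println!("{:?} per run over {} runs", per_run, runs);
}
```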