Speeding up custom `minmax` in ndarray?

Hi everybody,

I'm trying to find the min and the max or a particular combination of two matrices. Let's say I have a matrix a with shape (10000, 800) and a matrix b with shape (150, 800). I now want to broadcast each row of a onto b, then perform component-wise multiplication (the Hadamard product) and subsequently find the overal min and max value.

The idiomatic code that I came up with is:

pub fn min_max_component_wise(
    a: &ndarray::Array2<f32>,
    b: &ndarray::Array2<f32>,
) -> Option<(f32, f32)> {
    let (mut min, mut max) = (f32::MAX, f32::MIN);
    for row in a.axis_iter(ndarray::Axis(0)) {
        ndarray::Zip::from(b)
            .and_broadcast(row)
            .apply(|w, s| {
                let val = w * s;
                min = val.min(min);
                max = val.max(max);
            });
        }
    Some((min, max))
}

This, however, is quite slow. To speed it up, I resorted to directly indexing into the second array. This already made it quite a bit faster:

pub fn min_max_component_wise(
    a: &ndarray::Array2<f32>,
    b: &ndarray::Array2<f32>,
) -> Option<(f32, f32)> {
    let (mut min, mut max) = (f32::MAX, f32::MIN);
    for ((_, c), &sample) in a.indexed_iter() {
        for w in 0..b.shape()[0] {
            let val = sample * b[[w, c]];
            if val < min {
                min = val;
            }
            if val > max {
                max = val;
            }
        }
    }
    Some((min, max))
}

Do you see any other opportunities to make this run faster? I'm using the following set of of benchmarks to test this:

use criterion::{criterion_group, criterion_main, Criterion};
use ndarray::Array;
use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::RandomExt;

fn custom_criterion() -> Criterion {
    Criterion::default().sample_size(20)
}

fn bench_component_wise_small(c: &mut Criterion) {
    let s = Array::random((1000, 64), Uniform::new(0., 10.));
    let w = Array::random((32, 64), Uniform::new(0., 10.));
    c.bench_function("min_max_component_wise_small", |b| {
        b.iter(|| min_max_component_wise(&s, &w))
    });
}

fn bench_component_wise_large(c: &mut Criterion) {
    let s = Array::random((1000, 784), Uniform::new(0., 10.));
    let w = Array::random((144, 784), Uniform::new(0., 10.));
    c.bench_function("min_max_component_wise_large", |b| {
        b.iter(|| min_max_component_wise(&s, &w))
    });
}

criterion_group! {
    name = benches;
    config = custom_criterion();
    targets = bench_component_wise_small, bench_component_wise_large,
}

criterion_main!(benches);

Thank you very much in advance!

2 Likes

Would it be correct to reformulate the problem like this?

For each column with same index from the two matrices (e.g. a 10000 and an 150 item list), find the maximal product of one value from each column. Then aggregate these maxes and minimums from each column?

You can perform the operation for one column with the following pseudocode:

fn min_max_column_pair(column_a, column_b) {
    let (min1, max1) = arr_min_max(column_a);
    let (min2, max2) = arr_min_max(column_b);
    return arr_min_max([
        min1 * min2,
        min1 * max2,
        max1 * min2,
        max1 * max2,
    ]);
}

With a being a u × t matrix and b being a v × t matrix, this change would improve the complexity from O(u*v*t) to O((u + v)*t).

5 Likes

Oh that’s very neat, I have totally overlooked this. Thank you so much!

I have updated my implementation to the following:

pub fn min_max_component_wise(
    a: &ndarray::Array2<f32>,
    b: &ndarray::Array2<f32>,
) -> Option<(f32, f32)> {
    let (mut min, mut max) = (f32::MAX, f32::MIN);
    let col_samples = a.axis_iter(ndarray::Axis(1));
    let col_weights = b.axis_iter(ndarray::Axis(1));
    for (cs, cw) in col_samples.zip(col_weights) {
        let (min_s, max_s) = cs.iter().minmax().into_option()?;
        let (min_w, max_w) = cw.iter().minmax().into_option()?;
        let (tmp_min, tmp_max) = [min_s * min_w, min_s * max_w, max_s * min_w, max_s * max_w]
            .iter().cloned()
            .minmax()
            .into_option()?;
        if tmp_min < min {
            min = tmp_min;
        }
        if tmp_max > max {
            max = tmp_max;
        }
    }
    if max < min {
        None
    } else {
        Some((min, max))
    }
}

It runs almost 100% faster than the previous version on my relevant test cases. I'll now look into squeezing more performance out of this by indexing directly into the columns.

1 Like