How to get the max value in an array as fast as possible?

I have a float array that contains no NaN or inf values, and I want to get the max element.

    #[inline(always)]
    pub fn max_f64_array(arr: &[f64]) -> f64 {
        unsafe {
            *(arr
                .iter()
                .max_by(|a, b| a.partial_cmp(b).unwrap_unchecked())
                .unwrap_unchecked())
        }
    }

And here's an example array:

    let array = [1.0, 4.0, 9.0, 4.32, 65.0, 95.0, 3.3, 4.56, 2.74, 9.3, 8.41]; 
    let max_element = max_f64_array(&array); 

Does any faster algorithm exist here?

Short answer: no.

Long answer: Simple iterators and equivalent simple loops like this can usually be (and are) auto-vectorized by the compiler. Thus, this is generally as fast as you get algorithmically (general min/max reduction can't be asymptotically faster on an arbitrary, unordered array, because you must always look at every element.)

However, the unsafety is entirely unwarranted here. For one, your code is unsound: on an empty slice, max_by returns None, so the outer unwrap_unchecked() is immediate undefined behavior.

If you know that your numbers are never NaN, then consider encoding this invariant in the type system using a floating-point wrapper type that implements Ord. That removes the need for the inner unwrap; then remove the outer unwrap as well, or at least convert it to the non-unsafe unwrap() instead.
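For example, a minimal safe sketch: since Rust 1.62 you can skip the wrapper entirely and pass `f64::total_cmp` (the IEEE 754 total order, which never needs an unwrap) to `max_by`, and return an `Option` so the empty-slice case is explicit instead of UB:

```rust
/// Safe alternative: `f64::total_cmp` is a total order over all floats,
/// so the comparison never fails, and the empty-slice case surfaces as
/// `None` instead of undefined behavior.
pub fn max_f64_array(arr: &[f64]) -> Option<f64> {
    arr.iter().copied().max_by(f64::total_cmp)
}

fn main() {
    let array = [1.0, 4.0, 9.0, 4.32, 65.0, 95.0, 3.3, 4.56, 2.74, 9.3, 8.41];
    assert_eq!(max_f64_array(&array), Some(95.0));
    assert_eq!(max_f64_array(&[]), None);
}
```

This compiles to essentially the same reduction loop, without any `unsafe`.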

If you find that you are calling this in a hot loop, then you are probably missing out on utilizing an appropriate data structure (eg. a heap, a B-tree, or a simple sorted array) for which min/max retrieval is trivial or at least algorithmically faster than naïve linear search.
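As one concrete sketch of that idea: with a hand-rolled NaN-free wrapper (`NotNan` below is my own hypothetical name; the `ordered-float` crate provides a battle-tested equivalent), a std `BinaryHeap` gives O(1) max retrieval and O(log n) inserts:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Minimal wrapper encoding the "never NaN" invariant; with NaN excluded,
// `total_cmp` gives a genuine total order, so implementing Ord is sound.
#[derive(PartialEq)]
struct NotNan(f64);

impl Eq for NotNan {}

impl Ord for NotNan {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

impl PartialOrd for NotNan {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    // BinaryHeap is a max-heap: peek() is O(1), push() is O(log n).
    let mut heap: BinaryHeap<NotNan> =
        [1.0, 9.0, 65.0, 95.0, 3.3].into_iter().map(NotNan).collect();
    assert_eq!(heap.peek().map(|m| m.0), Some(95.0));
    heap.push(NotNan(100.0));
    assert_eq!(heap.peek().map(|m| m.0), Some(100.0));
}
```

Whether this pays off depends entirely on how often the data is queried versus rebuilt, as the follow-up posts note.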


^ This! I was going to write something with the same conclusion:

If you only search for the maximum value in a slice once then creating that data took long enough that minor differences between the speed of the (already plenty fast) straightforward (and safe) implementation vs. potential slightly improved implementations are negligible. If you search for the maximum value in a slice many times, (presumably with small modifications to the slice each time), then there will most likely be better data structures to choose, with asymptotic improvements, instead of just constant factors.


The data only lives for a short time; building another data structure would be too expensive for me.

I don't think you can get it any faster without SIMD. Unfortunately, I don't think the optimizer is smart enough to autovectorize this, so you'd have to implement SIMD manually.


It is for integers: https://rust.godbolt.org/z/anr3PM8b7

  %12 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %vec.phi, <4 x i32> %wide.load), !dbg !94
  %13 = tail call <4 x i32> @llvm.smax.v4i32(<4 x i32> %vec.phi1, <4 x i32> %wide.load3), !dbg !94

So this is likely just the usual problem that SIMD-optimizing floating point is hard because it doesn't have the nice properties that integers do, like associativity.

(For instance, I don't remember how floating-point max behaves between differently-signed zeros.)
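One common workaround, sketched below under the thread's no-NaN assumption (the function name `max_f64_lanes` and lane count are my own choices): reassociate the reduction yourself with several independent accumulators, which makes the float reordering explicit in the source and gives the compiler vectorization-friendly inner loops.

```rust
/// Max reduction with LANES independent accumulators. Reordering max is
/// only valid because we assume no NaN (and don't care which signed zero
/// we get back). Returns NEG_INFINITY for an empty slice.
pub fn max_f64_lanes(arr: &[f64]) -> f64 {
    const LANES: usize = 4;
    let mut acc = [f64::NEG_INFINITY; LANES];
    let chunks = arr.chunks_exact(LANES);
    let tail = chunks.remainder(); // leftover elements, handled scalar below
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] = acc[i].max(chunk[i]);
        }
    }
    let mut m = acc[0];
    for &a in &acc[1..] {
        m = m.max(a);
    }
    for &x in tail {
        m = m.max(x);
    }
    m
}

fn main() {
    let array = [1.0, 4.0, 9.0, 4.32, 65.0, 95.0, 3.3, 4.56, 2.74, 9.3, 8.41];
    assert_eq!(max_f64_lanes(&array), 95.0);
}
```

Whether this actually vectorizes (and whether it beats the plain loop) depends on the target and optimization flags, so benchmark before committing to it.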
