How to calculate the median of &[i64] as fast as possible?

hey gays i want to calculate the median of array.
the array seems like this

let array = vec![-50000, -50000, -50000, -100000, -100000, -50000, -100000, -100000, -50000, -50000, 0, 0, -50000, -50000, -100000, -50000, -50000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -50000, -50000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -50000, 0, -100000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -50000, -100000, -50000, -50000, -100000, -150000, -50000, 0, -100000, -150000, -50000, 0, -100000, -50000, -50000, -100000, 0, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -100000, -100000, -50000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, 0, 0, -150000, -150000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -150000, -150000, -100000, -50000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -50000, -100000, -100000, -50000, -50000, -100000, -50000, -150000, -150000, -50000, -50000, -100000, -100000, -100000, -50000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, 0, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -50000, 0, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000]

and the length of array will always be 600.

welcome to discuss with me.

ps: can we use the simd ?

I guess these are i32?

I would count the occurrences of -50_000 and compare it to 300.

You can compare several numbers with the SIMD core::arch::x86_64::_mm256_cmpeq_epi32, but then you have to count the number of 1 in the result of this operation (e.g. summing the numbers of the vector into a u32.

Edit: Oops, I did not see the 0 and -15_000 in the vector. I thought -50_000 and -100_000 were the only possible values. Are [0, -15_000, -50_000, -100_000] the only possible values? Then you can also count the number of 0 and -15_000. But then maybe sorting is faster...

There does exist a linear time median finding algorithm (just search this term and you'll find what you need).
However, I don't know if it is faster or slower than simple counting sort of a 600 element array.

maybe -20000 is possible.

If you have a very small set of possible values, use counting sort - it'll be linear time and also very fast since all you're doing is scanning the array from left to right.
Then, if you have an array of counts, such as -20000 = 10, 0 = 100, 5000 = 200, ..., then you can find the median in a single scan of this array, scanning at most half of the elements.
Now, whether SIMD can be used in this or not, I cannot tell.

2 Likes

got it

One could also just browse the slice docs: https://doc.rust-lang.org/std/primitive.slice.html#method.select_nth_unstable

5 Likes

it works for me.

every tick arrived i will update the table.

Given there’s an even number of elements, depending on how you want to define the median, you may want to use Iterator::max on the lower half slice return by select_nth_unstable and average that with the nth value that it returns.

Because it's on integers, I'd generally just say "I'm willing to have a 1⁄n error in the percentile of the item I'm returning", because trying to return the "and a half" part sometimes is more trouble than it's worth.

Especially since good algorithms for doing this at scale (like t-Digests) also give you error bounds like "if you ask for the 50th percentile item, we'll give you an item somewhere between the 49.9 and 50.1 percentile".