How to calculate the median of &[i64] as fast as possible?

hey gays i want to calculate the median of array.
the array seems like this

let array = vec![-50000, -50000, -50000, -100000, -100000, -50000, -100000, -100000, -50000, -50000, 0, 0, -50000, -50000, -100000, -50000, -50000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -50000, -50000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -50000, 0, -100000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -50000, -100000, -50000, -50000, -100000, -150000, -50000, 0, -100000, -150000, -50000, 0, -100000, -50000, -50000, -100000, 0, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -100000, -100000, -50000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, 0, 0, -150000, -150000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -150000, -150000, -100000, -50000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -50000, -100000, -100000, -50000, -50000, -100000, -50000, -150000, -150000, -50000, -50000, -100000, -100000, -100000, -50000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -50000, -100000, -50000, -50000, 0, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -100000, -50000, -50000, 0, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -100000, -100000, -100000, -50000, -50000, -50000, -100000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -50000, -100000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -50000, -50000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -150000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000, -100000]

and the length of array will always be 600.

welcome to discuss with me.

ps: can we use the simd ?

1 Like

I guess these are i32?

I would count the occurrences of -50_000 and compare it to 300.

You can compare several numbers with the SIMD core::arch::x86_64::_mm256_cmpeq_epi32, but then you have to count the number of 1 in the result of this operation (e.g. summing the numbers of the vector into a u32.

Edit: Oops, I did not see the 0 and -15_000 in the vector. I thought -50_000 and -100_000 were the only possible values. Are [0, -15_000, -50_000, -100_000] the only possible values? Then you can also count the number of 0 and -15_000. But then maybe sorting is faster...

1 Like

There does exist a linear time median finding algorithm (just search this term and you'll find what you need).
However, I don't know if it is faster or slower than simple counting sort of a 600 element array.

maybe -20000 is possible.

If you have a very small set of possible values, use counting sort - it'll be linear time and also very fast since all you're doing is scanning the array from left to right.
Then, if you have an array of counts, such as -20000 = 10, 0 = 100, 5000 = 200, ..., then you can find the median in a single scan of this array, scanning at most half of the elements.
Now, whether SIMD can be used in this or not, I cannot tell.

3 Likes

got it

1 Like

One could also just browse the slice docs: https://doc.rust-lang.org/std/primitive.slice.html#method.select_nth_unstable

5 Likes

it works for me.

every tick arrived i will update the table.

Given there’s an even number of elements, depending on how you want to define the median, you may want to use Iterator::max on the lower half slice return by select_nth_unstable and average that with the nth value that it returns.

Because it's on integers, I'd generally just say "I'm willing to have a 1⁄n error in the percentile of the item I'm returning", because trying to return the "and a half" part sometimes is more trouble than it's worth.

Especially since good algorithms for doing this at scale (like t-Digests) also give you error bounds like "if you ask for the 50th percentile item, we'll give you an item somewhere between the 49.9 and 50.1 percentile".

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.