Implementation of hybrid partial sorting algorithm

rust-guru · July 17, 2023, 8:06am

// OPTIMIZE THIS FUNCTION TO RUN AS FAST AS POSSIBLE
// Result must work on play.rust-lang.org

/// Returns the 8 smallest numbers found in the supplied vector
/// in order smallest to largest.
fn least_8(l: &Vec<u32>) -> Vec<u32> {
    let mut ll = l.clone();
    ll.sort();
    ll[0..8].iter().cloned().collect()
}

// DO NOT CHANGE ANYTHING BELOW THIS LINE

fn make_list() -> Vec<u32> {
    const SIZE: usize = 1<<16;
    let mut out = Vec::with_capacity(SIZE);
    let mut num = 998_244_353_u32; // prime
    for i in 0..SIZE {
        out.push(num);
        // rotate and add to produce some pseudorandomness
        num = (((num << 1) | (num >> 31)) as u64 + (i as u64)) as u32;
    }
    return out;
}

fn main() {
    let l = make_list();
    let start = std::time::Instant::now();
    let l8 = least_8(&l);
    let end = std::time::Instant::now();
    assert_eq!(vec![4, 5, 15, 22, 28, 31, 37, 38], l8);
    println!("Took {:?}", end.duration_since(start));
}


H2CO3 · July 17, 2023, 8:25am

You don't need to allocate a vector if you always want 8 elements. You don't need to sort the whole thing either; that's O(n log n), whereas naïvely finding the k smallest elements by repeated looping is O(n * k), which can be faster if k is sufficiently small compared to log n. Furthermore, the constant factors of a linear scan are much smaller than those of sorting, because sorting does a lot of random access, whereas the sequential access pattern of a linear scan is easily predicted and prefetched by the CPU.

All in all, a possible improvement would be

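fn least_8(l: &[u32]) -> [u32; 8] {
    let mut arr = [0; 8];
    // Panics on an empty slice.
    arr[0] = l.iter().copied().min().unwrap();

    for i in 1..arr.len() {
        // Smallest value strictly greater than the previous one, so duplicates
        // are skipped: this yields the 8 smallest *distinct* values and panics
        // if the input has fewer than 8 of them.
        arr[i] = l.iter().copied().filter(|&x| x > arr[i - 1]).min().unwrap();
    }

    arr
}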

which is 400µs vs. 3ms in release mode, or around 8 times faster.
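
To put rough numbers on the O(n * k) versus O(n log n) comparison for this thread's input, here is a small sketch. The function names `scan_cost` and `sort_cost` are made up for illustration, and the values are idealized operation counts that ignore constant factors, not measurements.

```rust
/// Idealized cost of k full linear scans over n elements.
fn scan_cost(n: u64, k: u64) -> u64 {
    n * k
}

/// Idealized cost of a comparison sort, taken as n * floor(log2 n).
/// Panics for n == 0, because ilog2 is undefined there.
fn sort_cost(n: u64) -> u64 {
    n * u64::from(n.ilog2())
}
```

For n = 1 << 16 and k = 8 these give 524,288 and 1,048,576, only a factor of two apart, which suggests that much of the measured gap comes from constant factors rather than from the asymptotics alone.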

rust-guru · July 17, 2023, 8:40am

Can you check it?
This is my own playground.
Execution time: 200 µs.

hax10 · July 17, 2023, 9:50am

Building on the repeated linear scans from @H2CO3, I've modified the sorting function to do everything in one pass. The idea is to maintain an always-sorted list of the 8 smallest numbers we've seen so far, and insert into it only when we see a number small enough to replace an element in this list of 8.

Execution time of the algorithm is around 40µs on average, which is about 10 times faster than the repeated scans and 75 times faster than the original.

fn least_8(l: &[u32]) -> [u32; 8] {
    // Seed the result with the first 8 input elements and sort them,
    // so that `arr` stays sorted at all times.
    // (Assumes the input has at least 8 elements; panics otherwise.)
    let mut arr = [0; 8];
    arr.copy_from_slice(&l[0..8]);
    arr.sort_unstable();
    // Scan the remaining elements and insert any that are small enough.
    for &x in &l[8..] {
        // Only values below the current 8th-smallest belong in `arr`.
        if x < arr[7] {
            // Find the insertion point p: the first slot holding a larger value.
            for p in 0..=7 {
                if x < arr[p] {
                    // Elements before p stay where they are.
                    // Elements from p onwards move one place to the right,
                    // which drops the old largest element arr[7].
                    for c in (p + 1..=7).rev() {
                        arr[c] = arr[c - 1];
                    }
                    arr[p] = x; // store the new element at p
                    break;
                }
            }
        }
    }
    arr
}

rust-guru · July 17, 2023, 10:17am

Yes, this is essentially the bounded-heap (top-k) algorithm, with a small sorted array standing in for the heap.
It works well when k << n.
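
For comparison, a version built on an actual heap could look like the sketch below. It is not code from the thread: the name `least_8_heap` is made up here, and it uses `std::collections::BinaryHeap` as a max-heap holding the 8 smallest values seen so far, so the root is always the value to evict. Like the single-pass version above, it assumes the input has at least 8 elements.

```rust
use std::collections::BinaryHeap;

/// Hypothetical heap-based variant (not from the thread).
fn least_8_heap(l: &[u32]) -> [u32; 8] {
    // Max-heap seeded with the first 8 elements (panics if there are fewer).
    let mut heap: BinaryHeap<u32> = l[..8].iter().copied().collect();
    for &x in &l[8..] {
        // peek_mut gives mutable access to the largest kept value;
        // if it is overwritten, the heap is repaired when the guard drops.
        let mut top = heap.peek_mut().unwrap();
        if x < *top {
            *top = x;
        }
    }
    // into_sorted_vec returns the kept values in ascending order.
    let mut arr = [0; 8];
    arr.copy_from_slice(&heap.into_sorted_vec());
    arr
}
```

For k = 8 the sorted-array version is likely at least as fast, since shifting a handful of `u32` values is very cheap; a heap would be expected to pay off only for larger k.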

FedericoStra · July 17, 2023, 11:05am

Also, don't rely on the Rust Playground for benchmarks, as its timings are highly inconsistent. Running your code a few times, I got results ranging from 195 µs to 318 µs. Either run the code locally or, even better, use criterion.
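
If setting up criterion is too much for a quick local check, repeating the measurement and reporting the median already smooths out a good part of the noise. The helper below is a sketch using only the standard library; `median_time` and its parameters are names invented here, and it is no substitute for a real benchmarking harness (no warm-up, no outlier analysis).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `runs` times and returns the median wall-clock time.
fn median_time<T>(runs: usize, mut f: impl FnMut() -> T) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box keeps the optimizer from discarding the result.
            black_box(f());
            start.elapsed()
        })
        .collect();
    samples.sort_unstable();
    samples[runs / 2]
}
```

In a local copy of the program, something like `median_time(101, || least_8(&l))` could then stand in for the single pair of `Instant::now()` calls.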

jdahlstrom · July 17, 2023, 11:38am

Might also be useful to compare the performance to the std solution, select_nth_unstable, which is based on an introselect algorithm.

riking · July 18, 2023, 11:24pm

Quick modification of @hax10's code to use that.

// SPDX-License-Identifier: Apache-2.0 OR MIT

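/// Note: mutates the input. Panics if `list` has fewer than 8 elements.
fn least_8(list: &mut [u32]) -> [u32; 8] {
    // The parameter is an index, not a count: afterwards list[7] is in its
    // sorted position and `head` holds the 7 elements that are <= it, unordered.
    let (head, _, _) = list.select_nth_unstable(8 - 1);
    head.sort_unstable();

    let mut arr = [0; 8];
    arr.copy_from_slice(&list[0..8]);
    arr
}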

Have not yet timed it.
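
Since the harness in the original post calls `least_8(&l)` with a shared reference, the mutating version above does not fit it directly. One way to adapt it, sketched here under the made-up name `least_8_select`, is to copy the input into a scratch buffer first. The copy costs an allocation and an O(n) pass, so any timing should include it. As before, the input is assumed to hold at least 8 elements.

```rust
/// Hypothetical non-mutating wrapper (not from the thread).
fn least_8_select(l: &[u32]) -> [u32; 8] {
    // Work on a copy so the caller's data stays untouched.
    let mut scratch = l.to_vec();
    // After this call scratch[7] is in its final sorted position and
    // scratch[..7] holds the 7 values that are <= it, in arbitrary order.
    // Panics if scratch.len() < 8.
    let _ = scratch.select_nth_unstable(7);
    // Sorting the first 8 slots puts the 8 smallest values in ascending order.
    scratch[..8].sort_unstable();
    let mut arr = [0; 8];
    arr.copy_from_slice(&scratch[..8]);
    arr
}
```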

