Sorting in Rust

Hello, How to implement sort_by_key function for vector of integers as strings.
For example, v=["5","3","1","8"]
output must be: ["1","3","5","8"]

Your key extraction function will need to parse the numbers:

fn main() {
    let mut v = vec!["5", "3", "1", "8"];
    
    v.sort_by_key(|x| x.parse::<i32>().unwrap());
    
    dbg!(v);
}

It'd be a bit more complex if there were potentially invalid strings in v, though.

@17cupsofcoffee gives a totally valid and useful answer. I'd like to expand on their answer and give some more performance and concept insights.

If we look at the documentation of sort_by_key slice - Rust it suggests that if the sort function is expensive sort_by_cached_key is preferable. That said I would suggest yet a different approach. Generally it's preferable to convert you input into some representation friendly to the kind of operations you want to perform on it. In this case a Vec of i32.

In this case instead of mutating our original input:

fn sort_by_cached_key(vec: &mut Vec<String>) -> usize {
    vec.sort_by_cached_key(|e| e.parse::<i32>().unwrap());

    [...]
}

First convert it into a Vec<i32> and the sort that:

fn parse_then_sort(input: &[String]) -> i32 {
    let mut parsed_vec: Vec<i32> = input.iter()
        .map(|e| e.parse().unwrap())
        .collect();

    parsed_vec.sort();

    [...]
}

There are a couple of upsides to this approach, usually your program doesn't live in a vacuum and gets input from somewhere, let's say from a file or so, a source you usually can't mutate. So you'd have to copy the Vec<String> one time anyway, which you could avoid with an approach like this, because there is no need to mutate the original. Also the random access nature of sort plays much nicer with a continuous memory layout like Vec<i32> than Vec<String>, which should give a considerable speed improvement. There are downsides of course too, a potentially higher memory use if you didn't need to clone the original source of the Vec<String> and the output is not Vec<String>, so if you need pass it along to some other component as Vec<String>, you'd have to convert it again, potentially loosing the time you've gained again. That said if for example you want to do further computation or even print it to stdout Vec<i32> is probably a preferable format to work with.

Curious myself what the speed difference looks like I wrote a small benchmark. Find the source here: sort-string-vec-as-numbers-/main.rs at master · Voultapher/sort-string-vec-as-numbers- · GitHub

The Benchmark results on my laptop:

test sort_by_key_small         ... bench:       1,226 ns/iter (+/- 81)
test sort_by_cached_key_small  ... bench:         530 ns/iter (+/- 24)
test parse_then_sort_small     ... bench:         180 ns/iter (+/- 30)

test sort_by_key_medium        ... bench:     395,200 ns/iter (+/- 67,924)
test sort_by_cached_key_medium ... bench:     102,540 ns/iter (+/- 8,113)
test parse_then_sort_medium    ... bench:      44,679 ns/iter (+/- 3,944)

test sort_by_key_large         ... bench:  71,265,202 ns/iter (+/- 3,781,899)
test sort_by_cached_key_large  ... bench:  18,372,865 ns/iter (+/- 2,124,469)
test parse_then_sort_large     ... bench:   7,318,258 ns/iter (+/- 2,011,615)

To remain fair let's also measure only cloning:

test only_clone_small          ... bench:         278 ns/iter (+/- 28)
test only_clone_medium         ... bench:      43,300 ns/iter (+/- 10,390)
test only_clone_large          ... bench:   4,432,882 ns/iter (+/- 645,157)
3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.