Why does rust use mergesort as slice.sort() in standard library?

I didn't measure the performance. My friend who is java developer just told me about Timsort which is improved merge sort and shown me source code of it in JDK. I opened Rust src and found Rust uses merge sort?

Is it because nobody implemented more advanced algorithm yet?

Then I suggest you to measure it, first.

1 Like

@mholub I believe it's because nobody has cared to optimize it further yet. Rust actually once had a timsort implementation, but it was a fairly complex piece of code that hadn't been carefully vetted and nobody felt confident about. So at some point during the pre-1.0 library cleanup @huon created the current merge sort and nobody has touched it since.

If a clean implementation of timsort showed up in a PR I imagine reviewers would be interested.

1 Like

The motivation for merge sort as compared to something like timsort is it is relatively simple to implement compared to timsort, and is compares well since that simplicity lends itself to unsafe code. You can see a comparison with the old timsort in https://github.com/rust-lang/rust/pull/11064#issuecomment-30979819, the current sort is likely to be closest to the trait column.

2 Likes

How does mergesort compare to qsort?

Quick sort has a O(n²) worst case performance but can be implemented to avoid allocations (there's a variant that only needs O(log(n)) space so it can be done on the stack). See the Wikipedia article for a comparison of just about every sort algorithm:

1 Like

The biggest problem I have with mergesort being the default is it allocates.

The C++ std::sort() algorithm usually introsort. Mergesort is commonly used for C++ std::stable_sort().

I wrote a Rust implementation of introsort, at the time I benchmarked it, it was always faster than the default Rust implementation. It's on github at GitHub - bitshifter/sortrs: Introspection sort implementation in Rust. There are 3 introsort implementations on crates.io https://crates.io/search?q=introsort including mine.

If you don't need a stable sort, introsort seems faster and is in-place.

Also there's an RFC about adding a non-allocating sort to core Rust here. A comment on there mentions that pattern defeating quicksort is faster than introsort, I'm not aware of any Rust implementation of that though.

2 Likes

Have you tried types with much more complex comparison? Mergesort tends to optimize the number of comparisons, so if you have benchmarked against a vector of primitive values quicksort would be almost unanimously faster (which optimizes the number of movements). Probably specialization may be used to switch to the three-way quicksort or similar on such cases in the future.

1 Like

Have you tried types with much more complex comparison?

Not terribly complex, one is sorting a tuple of 4 u64, so it's a large key. My Introsort is faster for 10000 of those elements, I'm not currently benchmarking more than that.

timsort was "the best" in Python, where we can loosely characterize the situation when sorting a python list as: 1) Element comparisons are expensive (python function call), 2) Element swaps/moves are very cheap (always pointer sized).

Neither of (1) nor (2) need to be the case in Rust.

4 Likes

If default type parameters are added to functions, we could pass in the sorting algorithm using a type, similar to the hashing type passed to HashMap.

4 Likes

Just as a FYI and in case others end here when searching for Rust sort algorithms: The sort implementation has been rewritten in the newly released Rust 1.15. There are benchmarks and a full description here:

https://github.com/rust-lang/rust/pull/38192

The release notes describe the new algorithm as a hybrid merge sort which takes inspiration from the venerable Timsort from Python.

4 Likes