How to reduce usage of unsafe in partial_sort crate?

I found there is no std::partial_sort function in Rust, but I really need that. So I create one from standard c++ implementation.

partial_sort - crates.io: Rust Package Registry

But I need some advice to reduce the usage of unsafe blocks. I'm new to rust, thanks.

I'm not super familiar with C++ STL, but isn't partial_sort just nth_element + sort on the initial subslice?

(Admittedly select_nth_unstable was only stabilized in 1.49, so it's understandable if it escaped your notice.)

1 Like

But it did not make sure T[0,n) is sorted.

let mut v = [-5i32, 4, 1, -3, 2];

// Find the median
v.select_nth_unstable(2);

// We are only guaranteed the slice will be one of the following, based on the way we sort
// about the specified index.
assert!(v == [-3, -5, 1, 2, 4] ||
        v == [-5, -3, 1, 2, 4] ||
        v == [-3, -5, 1, 4, 2] ||
        v == [-5, -3, 1, 4, 2]);

That's why @trentj said + sort.

pub fn partial_sort<T, F: FnMut(&T, &T) -> Ordering>(target: &mut [T], idx: usize, mut cmp: F) {
    let (before, _mid, _after) = target.select_nth_unstable_by(idx, &mut cmp);
    before.sort_unstable_by(cmp);
}
2 Likes

Regarding your code, I feel a little bad about just shooting down your implementation out of the gate, so here are some (hopefully constructive) comments.

I do think there is some unsafe that doesn't look necessary. For instance, sort_heap is an unsafe fn, which seems unnecessary (maybe a holdover from earlier versions?). Removing unsafe from sort_heap lets you shrink the scope of unsafe in partial_sort. Speaking of partial_sort, I think the effort you go to

unsafe {
    for i in last..v.len() {
        if is_less(v.get_unchecked(i), v.get_unchecked(0)) {
            v.swap(0, i);
            ...

to avoid bounds checks is probably not pulling its weight here, since v.swap will check them anyway (there's no swap_unchecked) and additional checks against the same bounds are very likely to be optimized away. I'd suspect the following is no slower:

for x in &v[last..] {
    if is_less(x, &v[0]) {
        ...

In adjust_heap almost the entire function body is wrapped in a single unsafe block that isn't marked with any comments about safety, despite the fact that adjust_heap itself is a safe function, so it's hard to really audit effectively because it's all tightly coupled.

There is some ambiguity about the "proper" scope for unsafe. Some people feel that you should put an unsafe block around all the closely related code that can easily lead to unsafety. I tend to think that unsafe should normally be about as small as the compiler will allow, which lets you more easily audit one block at a time. Four one-line unsafe blocks is better, in my opinion, than one big one, since those four little ones almost certainly have slightly different reasons why they are really safe. But people can reasonably differ on this subject.

One more thing about adjust_heap is that it actually should be marked unsafe because it's possible to pass a len that's larger than v.len() and cause unsafety with it. Yeah, it's an internal API, but it should be possible to make adjust_heap safe to call, so I recommend doing that.

I'm dozing off, so I have to go to sleep, but hopefully this helps some!

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.