Binary search implementation vs documentation


#1

I am using the slice binary search in a small project, but I require that it returns the last index of the matching elements. Checking the stdlib src I found that this is actually the implemented behaviour. But the docs state that it could return any of the matching positions.

I also browsed trough this PR https://github.com/rust-lang/rust/pull/45333 and noticed there’s even this test that checks this specific functionality:
https://github.com/rust-lang/rust/blob/master/src/libcore/tests/slice.rs#L72

#[test]
// Test implementation specific behavior when finding equivalent elements.
// It is ok to break this test but when you do a crater run is highly advisable.
fn test_binary_search_implementation_details() {
    let b = [1, 1, 2, 2, 3, 3, 3];
    assert_eq!(b.binary_search(&1), Ok(1));
    assert_eq!(b.binary_search(&2), Ok(3));
    assert_eq!(b.binary_search(&3), Ok(6));
    let b = [1, 1, 1, 1, 1, 3, 3, 3, 3];
    assert_eq!(b.binary_search(&1), Ok(4));
    assert_eq!(b.binary_search(&3), Ok(8));
    let b = [1, 1, 1, 1, 3, 3, 3, 3, 3];
    assert_eq!(b.binary_search(&1), Ok(3));
    assert_eq!(b.binary_search(&3), Ok(8));
}

The comment basically indicates that if someone relies on this functionality (assuming it’s tests would catch it) it would be considered required and not change it?

Why not change the documentation instead and commit to it?

Knowing this, should I just add a test that checks this behaviour and perhaps fail on later rust versions, copy the current implementation into my src (unsafe calls), or open an issue and try to force the stdlib to pick one of the behaviours and adjust docs accordingly.


#2

Not involved, but I assume that the motivation is to avoid precluding optimizations. If you make no guarantee about which element is returned, you can exit the search earlier and more easily parallelize it (including via vectorization).


#3

Returning early requires an additional equality test in the inner loop. It would only benefit some scenarios. Also vectorization would only be possible for some primitive types, but not for the generic Slice<T>?

I copied the test into my project. We’ll see how long it will last :wink:


#4

https://crates.io/crates/superslice from the person that made the code in std


#5

The crater run mentioned in the comment would be a proxy for how big the breakage of a proposed change would be - that, in turn, would influence how much scrutiny the change will receive. Even though the behavior may not be guaranteed, there has to be strong motivation to break existing code. But it’s not a guarantee either :slight_smile:. Basically, it implies that the behavior may change but (presumably) there will be a very good reason for why that happened.

And to be clear: that’s my interpretation, I in no way represent std maintainers’ opinion.