I have compared binary_search before this PR and after, and my own logics.
The PR says that it can process &[usize; 1000] in 15ns and three times faster than old logic. However when i tried same logic, it tooks about 40ns and improvement from the old logic was negligible.
I also cloned the master branch and tried x.py bench, and it tooks 60ns!
Why this happen?
Something is wrong with me? Updates on compiler breaked some criteria?
Environment:
i7-7700K/nightly-x86_64-pc-windows-msvc
or
i7-3770/nightly-x86_64-unknown-linux-gnu
Just to double-check, are you building using --release? Also perhaps the PR author used different optimisation level (or architecture specific ones).
When benchmarking algorithms like sort the input data and starting order is a significant factor. Perhaps instead of comparing your numbers with the PR, compare your numbers using the code before the PR and then after the PR to get something more realistic.
This is issue #53823: Upgrading to LLVM 6 caused a regression in code generation that made the fast binary_search implementation much slower. According to the LLVM issue, this is still not fixed in the latest LLVM.