So I thought about this some time and I implemented a more cache friendly search (in sorted slices).
The code can be found here.
The basic idea is that during a binary search in (big) sorted slices the cpu is mainly waiting for the next value to compare to and therefore wasting time.
This can be improved by asking the cpu to prefetch the data and during that prefetching start searching for the next values.
I implemented this concept in three ways:
- a handcrafted state machien
- using futures
- using generators
I personally prefer the two versions using generators as they stay extremely close to the original (the binary search from the
std) while greatly improving performance (on big slices). Sadly generators cannot be used yet in stable...
I am not looking for a code review but mainly wanted to share it and ask for your opinions or try to answer questions you have.
I'm curious how it compares to Eytzinger Binary Search, which takes a completely different approach to defining a cache-friendly algorithm.
As far as I understand it, the text you linked does not prefetch the data for the immediatly following comparison, but for a comparison down the line.
Either way, here are my thoughts after flying over the link:
- inserting or removing values might require rebuilding the whole array in a pattern that is far more complicated than a
memcpy (what a array would use for insertion or deletion), thus making the Eytzinger representation not suitable for many situations
- The text is only trying to speed up a single search, so the setting is a bit different. If I tried to speed up a single search I would have to prefetch multiple locations in the array beforhand without knowing which of them will be required, which would waste a lot of memory bandwith.
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.