How can I optimize this search routine?


I’m making a data structure (I called it SparseArraySparseModeData) to efficiently store a sparse vector. It stores both the address and the value of each element. To get the value at an index, it runs a hybrid binary-interpolation search on the vector of addresses. It’s slow, though - the “hot” function is find_address, which does the searching. The address vector is assumed to have 4 sections, each of which is piecewise linear. Any ideas on how to optimize it (just “for fun”)? The code is here.
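For context, the rough shape of the hot path - this is a hedged sketch, not the actual playground code: the signature, the `u64` addresses, and blending the binary midpoint with the interpolated guess via a percentage `loading` are all assumptions on my part.

```rust
// Hypothetical sketch of a hybrid binary/interpolation search over a
// sorted slice of addresses. `loading` (0..=100) blends the binary
// midpoint with the interpolated guess: 0 is pure interpolation,
// 100 is pure binary search.
fn find_address(addrs: &[u64], target: u64, loading: u64) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, addrs.len());
    while lo < hi {
        let (a, b) = (addrs[lo], addrs[hi - 1]);
        if target < a || target > b {
            return None;
        }
        // Interpolated guess within [lo, hi), assuming roughly linear data.
        let interp = if b == a {
            lo
        } else {
            lo + (((target - a) as u128 * (hi - lo - 1) as u128) / (b - a) as u128) as usize
        };
        let mid = lo + (hi - lo) / 2;
        // Blend the two guesses according to `loading`.
        let probe = (interp * (100 - loading) as usize + mid * loading as usize) / 100;
        match addrs[probe].cmp(&target) {
            std::cmp::Ordering::Equal => return Some(probe),
            std::cmp::Ordering::Less => lo = probe + 1,
            std::cmp::Ordering::Greater => hi = probe,
        }
    }
    None
}
```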




Your playground is very unfriendly for testing.
Let’s please take a few steps back and do it properly.

  • First, clean up the playground and remove everything unrelated (I managed to strip it down to ~140 LOC).
  • Next, add test cases that assert nothing changes unintentionally (we don’t know the domain as well as you do, so we might break things).
  • Last, add benchmarks so we can properly tell whether a change actually helped.

Only then can we really help you. It will also help you write maintainable, reliable code.

Also, please use rustfmt to properly format the code, add empty lines between functions, and don’t sprinkle everything with #[inline]. The compiler will decide on its own whether inlining is worth it - it’s probably smarter than you (and me :wink: ).



There is something wrong with the order of operations in the interpolation. After setting loading = 0 and a few fixes, I reduced the ratio to 4.



Yeah, loading = 0 is just pure interpolation search.



What’s your estimate for the maximum size of an array such that binary search is faster than interpolation search (if the array is small enough, the fewer comparisons and less expensive arithmetic would make binary search faster, even though it only removes half of the array per iteration)?
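One way to get a feel for that crossover without a full benchmark is to count probes per strategy - keeping in mind that probe counts ignore the heavier arithmetic of interpolation, so real numbers need timing. A self-contained sketch (names made up):

```rust
/// Count probes a plain binary search needs to find `target`.
fn binary_probes(addrs: &[u64], target: u64) -> u32 {
    let (mut lo, mut hi, mut probes) = (0usize, addrs.len(), 0u32);
    while lo < hi {
        probes += 1;
        let mid = lo + (hi - lo) / 2;
        match addrs[mid].cmp(&target) {
            std::cmp::Ordering::Equal => return probes,
            std::cmp::Ordering::Less => lo = mid + 1,
            std::cmp::Ordering::Greater => hi = mid,
        }
    }
    probes
}

/// Count probes a pure interpolation search needs to find `target`.
fn interp_probes(addrs: &[u64], target: u64) -> u32 {
    let (mut lo, mut hi, mut probes) = (0usize, addrs.len(), 0u32);
    while lo < hi {
        let (a, b) = (addrs[lo], addrs[hi - 1]);
        if target < a || target > b {
            return probes;
        }
        probes += 1;
        let pos = if b == a {
            lo
        } else {
            lo + (((target - a) as u128 * (hi - lo - 1) as u128) / (b - a) as u128) as usize
        };
        match addrs[pos].cmp(&target) {
            std::cmp::Ordering::Equal => return probes,
            std::cmp::Ordering::Less => lo = pos + 1,
            std::cmp::Ordering::Greater => hi = pos,
        }
    }
    probes
}
```

On a constant-stride list the gap is stark: interpolation hits in one probe where binary needs ~log₂(n).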



I made a program to empirically determine loading. Looks like 0 is really the best!
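Not my actual program, but a self-contained sketch of the idea: for each candidate loading value, total the probes the blended search needs over many lookups, and pick the minimum. Probe counts stand in for wall-clock time here; real tuning should use a proper benchmark. All names are assumptions.

```rust
/// Probes the blended binary/interpolation search needs for one lookup.
/// `loading` is in 0..=100: 0 = pure interpolation, 100 = pure binary.
fn blended_probes(addrs: &[u64], target: u64, loading: usize) -> u32 {
    let (mut lo, mut hi, mut probes) = (0usize, addrs.len(), 0u32);
    while lo < hi {
        let (a, b) = (addrs[lo], addrs[hi - 1]);
        if target < a || target > b {
            return probes;
        }
        probes += 1;
        let interp = if b == a {
            lo
        } else {
            lo + (((target - a) as u128 * (hi - lo - 1) as u128) / (b - a) as u128) as usize
        };
        let mid = lo + (hi - lo) / 2;
        let probe = (interp * (100 - loading) + mid * loading) / 100;
        match addrs[probe].cmp(&target) {
            std::cmp::Ordering::Equal => return probes,
            std::cmp::Ordering::Less => lo = probe + 1,
            std::cmp::Ordering::Greater => hi = probe,
        }
    }
    probes
}

/// Total probes over a lookup of every element, for one loading value.
/// Sweeping this over loading in 0..=100 finds the cheapest blend.
fn total_probes(addrs: &[u64], loading: usize) -> u32 {
    addrs.iter().map(|&t| blended_probes(addrs, t, loading)).sum()
}
```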



That seems right, at least for the average case. There must be some lists for which, at some indices, binary search beats interpolation. Depending on the goal, it could be worth trying to improve those worst cases.

I was thinking that maybe for small slices it could be better to just check all locations. Have you looked into this? I have run a few tests and it is not clear.

EDIT: Of course interpolation is best here, and there is no difference with a smaller threshold. If the example list has a constant stride, interpolation needs just one iteration to reach the solution. The situation is different if we use a random list as input, where the strides vary:
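To see why the constant-stride case is a single iteration: with addrs[i] = start + i * stride, the interpolation formula solves for i exactly. A minimal sketch of just the guess computation (hypothetical name, assumes addrs is sorted with distinct endpoints):

```rust
/// First interpolation guess over the whole slice.
/// Assumes addrs.len() >= 2 and addrs[last] > addrs[0].
fn interp_guess(addrs: &[u64], target: u64) -> usize {
    let (a, b) = (addrs[0], addrs[addrs.len() - 1]);
    // (target - a) / (b - a) scaled to the index range; exact for
    // constant stride, possibly far off otherwise.
    ((target - a) as u128 * (addrs.len() - 1) as u128 / (b - a) as u128) as usize
}
```

On a constant-stride list the guess lands on the answer immediately; on a list with one huge stride at the end it can miss badly, which is exactly where extra iterations pile up.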



No follow-up to your last thread? I and several others asked many questions about your problem size and use case, and a number of us shared the sentiment that splitting your array was unlikely to solve anything regardless of the circumstances. I feel like we were ignored, and that makes it discouraging to try to provide further help.



Sorry about that! I’ll follow up as soon as possible (on mobile so typing’s slow)



I replaced the final binary search with a linear search, and with loading = 50 I got a ratio of about 2 - code is here.
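For reference, the shape of that variant - a sketch under assumed names and an assumed threshold, not the linked code: run the interpolation probe until the window shrinks below a cutoff, then finish with a linear scan, which is branch-predictable and often faster than more probes on short slices.

```rust
// Below this window size, stop probing and scan. The value 16 is a
// guess; tune it with a benchmark.
const LINEAR_THRESHOLD: usize = 16;

fn find_address_linear_tail(addrs: &[u64], target: u64) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, addrs.len());
    while hi - lo > LINEAR_THRESHOLD {
        let (a, b) = (addrs[lo], addrs[hi - 1]);
        if target < a || target > b {
            return None;
        }
        let probe = if b == a {
            lo
        } else {
            lo + (((target - a) as u128 * (hi - lo - 1) as u128) / (b - a) as u128) as usize
        };
        match addrs[probe].cmp(&target) {
            std::cmp::Ordering::Equal => return Some(probe),
            std::cmp::Ordering::Less => lo = probe + 1,
            std::cmp::Ordering::Greater => hi = probe,
        }
    }
    // Linear scan over the remaining small window.
    (lo..hi).find(|&i| addrs[i] == target)
}
```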



This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.