I have a sorted immutable list of u32 (address like a.b.c.d, variety in a is smaller), usually it is < 500 elements, but can be up to 5000. I need to query the index of a given u32 extremely often on a cortex-m4 machine with at most 200KB RAM to spare. The good part is, before submitting to that small machine, I can do any precomputation on a x86_64 machine, so basically unlimited amount of memory and time. What can I use? Something like binary search may be a good initial step, but I fear the latency would be high due to lots of random memory accesses.
A vague feeling does not sound like a solid basis for a decision...
The worst case access time for a binary search could be easily calculated or measured, especially on an embedded system.
In many cases, the access pattern and/or distribution of the values can be somewhat exploited. Sometimes a plain dead simple LRU cache works well.
You can use a B-tree to exchange random access into sequential access. But you have to calculate/measure anyway to choose the optimal node size for your data and hardware.
If you really need the index into the sorted list, then yes binary search is absolutely the way to go. It'll probably be fine without you doing any extra work on it; especially with a static table it'll just unroll the whole search.
If you want slightly better cache usage, you could try Eytzinger Binary Search - Algorithmica instead, which lays out the elements differently. But it probably won't much matter.
Do you have any other expectations for the lookups? Will you get a ton of misses, for example?
I have not written about, but its obvious that if you store some static, pre-computed B-Tree, you don't store any pointers/offsets et all, because the positions of every node is known.
It's like a plain sorted array: Every position is known, no need for pointers/offsets.