Macro to do binary search

I just tried this ( I hadn't thought of it ). It doesn't appear to do a binary search ( which doesn't necessarily mean it's sub-optimal ). Code:

fn get_off_code( x:u16 ) -> usize
{
  // Index of the range containing x, for x in 3..=258
  // ( the DEFLATE length-code ranges from RFC 1951, i.e. length code minus 257 ).
  match x
  {
    3..=10 => x as usize - 3,
    11..=12 => 8,
    13..=14 => 9,
    15..=16 => 10,
    17..=18 => 11,
    19..=22 => 12,
    23..=26 => 13,
    27..=30 => 14,
    31..=34 => 15,
    35..=42 => 16,
    43..=50 => 17,
    51..=58 => 18,
    59..=66 => 19,
    67..=82 => 20,
    83..=98 => 21,
    99..=114 => 22,
    115..=130 => 23,
    131..=162 => 24,
    163..=194 => 25,
    195..=226 => 26,
    227..=257 => 27,
    _ => 28
  }
}

It’s not exactly a binary search, but on 1.46 (current beta), you can use const fns to calculate a lookup table at compile time. I haven’t profiled it, but I expect it to be reasonably competitive performance-wise: (Playground)

const fn find_index(x: u8) -> u8 {
    const VALS: &[u8] = &[11, 13, 15, 17, 19, 23, 27, 31];
    const MAX: u8 = VALS[VALS.len() - 1];
    const LUT: [u8; MAX as usize] = {
        let mut lut: [u8; MAX as usize] = [0; MAX as usize];
        let mut i_val: usize = 0;
        let mut i_lut: usize = 0;
        while i_lut < lut.len() {
            while VALS[i_val + 1] <= i_lut as u8 {
                i_val += 1;
            }
            lut[i_lut] = i_val as u8;
            i_lut += 1;
        }
        lut
    };
    if x as usize >= LUT.len() {
        (VALS.len() - 1) as u8
    } else {
        LUT[x as usize]
    }
}

I did a quick test, comparing match to binary search. Binary search was marginally faster ( 29 msec for 10 million iterations versus 31 msec ).

Playground.
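
For reference, a sketch of the kind of timing loop behind numbers like these (illustrative only, not the actual playground code; the bench helper and its input pattern are assumptions):

use std::time::Instant;

// Hypothetical harness: time 10 million lookups through one implementation.
// The running checksum keeps the optimizer from deleting the loop.
fn bench(name: &str, f: impl Fn(u16) -> usize) {
    let start = Instant::now();
    let mut sum = 0usize;
    for i in 0..10_000_000u32 {
        sum += f((i % 255 + 3) as u16); // inputs stay in 3..=257
    }
    println!("{}: {} ms (checksum {})", name, start.elapsed().as_millis(), sum);
}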

I had no idea you could do that, cool!

I updated your playground link to include the const LUT, and it appears significantly faster (Playground):

time elapsed=28 milli sec.
time elapsed=26 milli sec.
time elapsed=4 milli sec.
t1=176250000 t2=176250000 t3=176250000

This is a pretty ideal scenario for it, though: the entire table fits in 5 cache lines and is unlikely to be evicted in such a tight loop.


Correct me if I'm wrong, but shouldn't the compiler inline the result of the const fn, eliminating the lookup table completely? At least when the parameter supplied is also const.

Yes, if the argument is const it will simply inline the result. In this case, I don’t know if the optimizer unrolled the loop enough to do that.
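
One way to guarantee it, rather than relying on the optimizer, is to assign the result to a const item, which must be evaluated during compilation (a small sketch using the find_index above; the constant name is mine):

// A const item is computed at compile time, so no table lookup survives
// into the binary for this particular call.
const IX_23: u8 = find_index(23);

fn main() {
    assert_eq!(IX_23, 5); // 23 is the sixth boundary, index 5
}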

I think this will be hard to beat. I do have another list, where the lookup table might hog too much memory:

1,2,3,4, 5,7,9,13, 17,25,33,49, 65,97,129,193, 257,385,513,769, 1025,1537,2049,3073, 4097,6145,8193,12289, 16385,24577

[ These numbers relate to RFC 1951 - DEFLATE Compressed Data Format ]

A plain lookup table for that one could be cumbersome for memory footprint, but you might be able to come up with a hybrid approach that works well: something like a lookup table for 0-255 and a search for the rest.
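
For illustration, a minimal sketch of that hybrid over the RFC 1951 distance boundaries (the names BOUNDS, build_small_lut and get_dist_code_hybrid are mine, and it isn't benchmarked here):

// Boundary list from RFC 1951 ( DEFLATE distance codes ).
const BOUNDS: [u16; 30] = [
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193,
    12289, 16385, 24577,
];

// 257-entry table covering distances 1..=256, built once at startup.
fn build_small_lut() -> [u8; 257] {
    let mut lut = [0u8; 257];
    let mut code = 0usize;
    for x in 1..=256u16 {
        while code + 1 < BOUNDS.len() && BOUNDS[code + 1] <= x {
            code += 1;
        }
        lut[x as usize] = code as u8;
    }
    lut
}

fn get_dist_code_hybrid(x: u16, lut: &[u8; 257]) -> usize {
    if x <= 256 {
        lut[x as usize] as usize // one cheap table load
    } else {
        // Only the 14 boundaries from 257 up remain; a short linear
        // scan ( or binary search ) over that tail is enough.
        let mut code = 16; // BOUNDS[16] == 257
        while code + 1 < BOUNDS.len() && BOUNDS[code + 1] <= x {
            code += 1;
        }
        code
    }
}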

With such short lists, a sequential scan might be faster than a binary search: it has less potential to mess up the processor’s branch predictor.

It may also be possible to structure the problem so that SIMD can be applied, working on multiple comparison values in parallel.
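
As a concrete example of both points, here is a branchless "count the boundaries <= x" scan over the short first list; with a fixed-length array the compiler can often unroll and auto-vectorize the comparisons (this sketch is mine, not benchmarked here):

// Counts how many boundaries are <= x; the count minus one is the index.
// No data-dependent branches, so nothing for the branch predictor to miss.
fn find_index_scan(x: u8) -> usize {
    const VALS: [u8; 8] = [11, 13, 15, 17, 19, 23, 27, 31];
    VALS.iter().filter(|&&v| v <= x).count().saturating_sub(1)
}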


For the second list ( 1,2,3,4, 5,7,9,13, 17,25,33,49, 65,97,129,193, 257,385,513,769, 1025,1537,2049,3073, 4097,6145,8193,12289, 16385,24577 ), based on regularities in the numbers, I came up with this:

fn get_dist_code( mut x0:u16 ) -> usize
{
  x0 -= 1; // distances start at 1, so shift to zero-based
  let x = x0 as usize;
  // Number of bits needed to represent x0 ( 0 when x0 == 0 ).
  let mut bits = 16 - x0.leading_zeros() as usize;
  if bits < 3 {
    x // distances 1..=4 map directly to codes 0..=3
  } else {
    // Each remaining bit length covers two codes, each spanning
    // 2^(bits-1) distances; rescale x accordingly.
    bits -= 1;
    ( x + ( ( bits << 1 ) << ( bits - 1 ) ) - ( 1 << bits ) ) >> ( bits - 1 )
  }
}

Rust macros can do this fairly easily, in fact as easily as a function can, by emulating that function's body.

However, the First Rule of macrology states: "Never use a macro when a function will do". The reasons for that are:

  1. Macros don't compose as well as functions.
  2. Sometimes people reach for macros to avoid function-call overhead. That is usually premature optimization; and even when it isn't, function calls are cheap, and if a call is still too expensive, an inlined function is almost always the superior choice.
  3. Macros aren't typed like functions are: macro inputs and outputs are generally not typechecked as such (though the code generated by the macro is typechecked, of course). Functions provide safety through their signatures.
  4. Macros are also more difficult to write, comprehend and debug than functions. For example, debugging a macro with a debugger is like debugging a web client with a bare WebView (a browser widget without dev tools): a lot of information is lost during expansion, and the debugger can't recover it.

The better solution here seems to be using an iterative function or method that does the binary search.
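
For concreteness, a sketch of such an iterative function (the name find_index_iter is mine; it returns the index of the last boundary <= x):

fn find_index_iter(vals: &[u16], x: u16) -> usize {
    // Classic iterative binary search for the last entry <= x;
    // returns 0 if x is below the first entry.
    let (mut lo, mut hi) = (0, vals.len());
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if vals[mid] <= x { lo = mid } else { hi = mid }
    }
    lo
}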

So what are syntactic macros useful for?

  1. Cutting boilerplate code. If you see a code pattern repeating over and over, and it can't be captured by a function, then a macro is a good fit. This is especially the case when the code pattern involves control flow, which cannot be captured by functions (see the sketch after this list).
  2. Compile time computation. Basically you compute a bunch of stuff while the macro is executing (i.e. at macro-expansion time, during compilation), and then use the results in some form in the generated binary.
  3. Code generation. For example, if you need to generate some types (not values of those types, but the types themselves), impl blocks (which happens often with derive macros), functions, methods etc., then you'll almost immediately end up in macro land.
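
A sketch of item 1: early-return control flow can't be wrapped in a helper function, but a macro captures it fine (get_or_return is a made-up example name):

// Expands to a match that can `return` from the *calling* function,
// which no helper function could do.
macro_rules! get_or_return {
    ($opt:expr) => {
        match $opt {
            Some(v) => v,
            None => return,
        }
    };
}

fn describe(x: Option<u32>) {
    let v = get_or_return!(x);
    println!("got {}", v);
}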

I haven’t tested the performance, but you could also try using a BTreeMap:

use lazy_static::lazy_static;
use std::borrow::Borrow;
use std::collections::BTreeMap;
use std::convert::TryInto;

/// Builds a map from each value to its index in the input sequence.
fn make_table<T: Ord, Ix, E>(v: impl IntoIterator<Item = T>) -> Result<BTreeMap<T, Ix>, E>
where
    usize: TryInto<Ix, Error = E>,
{
    v.into_iter()
        .enumerate()
        .map(|(i, x)| Ok((x, i.try_into()?)))
        .collect()
}

/// Returns the index stored for the largest key <= k, if any.
fn find_index<'a, T, Ix, Q>(k: &Q, table: &'a BTreeMap<T, Ix>) -> Option<&'a Ix>
where
    T: Borrow<Q> + Ord,
    Q: Ord,
{
    // range(..=k) covers every key <= k; next_back() picks the largest.
    table.range(..=k).next_back().map(|(_, ix)| ix)
}

lazy_static! {
    static ref TABLE1: BTreeMap<u8, u8> =
        make_table([11, 13, 15, 17, 19, 23, 27, 31].iter().copied()).unwrap();
}

fn find_ix1(x: u8) -> u8 {
    *find_index(&x, &TABLE1).unwrap()
}

lazy_static! {
    static ref TABLE2: BTreeMap<u16, u8> = make_table(
        [
            1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513, 769, 1025,
            1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577
        ]
        .iter()
        .copied()
    )
    .unwrap();
}

fn find_ix2(x: u16) -> u8 {
    *find_index(&x, &TABLE2).unwrap()
}

fn main() {
    for &x in &[11, 12, 13, 14, 15, 29, 30, 31] {
        println!("{} has ix {}", x, find_ix1(x));
    }
    println!("--------------");
    for &x in &[1, 2, 1000, 5000, 10000, 24576, 24577] {
        println!("{} has ix {}", x, find_ix2(x));
    }
    
    /* OUTPUT:
    
    11 has ix 0
    12 has ix 0
    13 has ix 1
    14 has ix 1
    15 has ix 2
    29 has ix 6
    30 has ix 6
    31 has ix 7
    --------------
    1 has ix 0
    2 has ix 1
    1000 has ix 19
    5000 has ix 24
    10000 has ix 26
    24576 has ix 28
    24577 has ix 29
    
    */
}

I am a bit surprised that my arithmetic method on the second list seems to come out slower than binary search ( by about 50% ). I haven't figured out why yet; the assembly instructions look fairly OK to me. It just goes to show it's very hard to predict what will be fast. Can anyone explain it?

Playground

Edit: I got it to go a bit faster, but still about 20% slower than standard binary search:

fn get_dist_code( mut x0:u16 ) -> usize
{
  x0 -= 1; // zero-based distance
  let x = x0 as usize;
  if x < 5 { x } // distances 1..=5 map directly to codes 0..=4
  else
  {
    let bits = 15 - x0.leading_zeros() as usize;
    ( x + ( ( ( bits << 1 ) - 2 ) << ( bits - 1 ) ) ) >> ( bits - 1 )
  }
}

Revised playground

Here’s a macro:

macro_rules! find_index {
    // Entry point: pair each value with its index, then build the tree.
    ([$($n:expr),* $(,)?], $x:ident $(,)?) => {
        find_index!(@1[][$(($n))*]0$x)
    };
    // Phase 1: tag each value with its ( incrementally built ) index expression.
    (@1[$($t:tt)*][$n:tt $($ns:tt)*]$e:tt$x:tt) => {
        find_index!(@1[$($t)* [$n $e]][$($ns)*]($e+1)$x)
    };
    (@1[$($t:tt)*][]$n:tt$x:tt) => {
        find_index!(@2[][$($t)*]$x)
    };
    // Phase 2, base case: one [value expr] pair left, emit the expression.
    (@2[][[$n1:tt $e1:tt]]$x:tt) => {
        $e1
    };
    // Phase 2: merge adjacent pairs into `if $x < $n2 { $e1 } else { $e2 }`
    // nodes; each full pass halves the list, building the binary-search tree.
    (@2[$($t:tt)*][[$n1:tt $e1:tt][$n2:tt $e2:tt]$($ne:tt)*]$x:tt) => {
        find_index!(@2[$($t)*[$n1 (if $x < $n2 {$e1} else {$e2})]][$($ne)*]$x)
    };
    // Odd pair left over at the end of a pass: carry it through unchanged.
    (@2[$($t:tt)*][[$n1:tt $e1:tt]]$x:tt) => {
        find_index!(@2[$($t)*[$n1 $e1]][]$x)
    };
    // Pass finished: start the next pass over the merged list.
    (@2[$($t:tt)*][]$x:tt) => {
        find_index!(@2[][$($t)*]$x)
    };
}

fn find_ix(x: u8) -> u8 {
    find_index!([11, 13, 15, 17, 19, 23, 27, 31], x)
}
/* EXPANDED:
fn find_ix(x: u8) -> u8 {
    (if x < (19) {
        (if x < (15) {
            (if x < (13) { 0 } else { (0 + 1) })
        } else {
            (if x < (17) {
                ((0 + 1) + 1)
            } else {
                (((0 + 1) + 1) + 1)
            })
        })
    } else {
        (if x < (27) {
            (if x < (23) {
                ((((0 + 1) + 1) + 1) + 1)
            } else {
                (((((0 + 1) + 1) + 1) + 1) + 1)
            })
        } else {
            (if x < (31) {
                ((((((0 + 1) + 1) + 1) + 1) + 1) + 1)
            } else {
                (((((((0 + 1) + 1) + 1) + 1) + 1) + 1) + 1)
            })
        })
    })
}
*/

fn main() {
    for &x in &[11, 12, 13, 14, 15, 29, 30, 31] {
        println!("{} has ix {}", x, find_ix(x));
    }
    /* OUTPUT:
    
    11 has ix 0
    12 has ix 0
    13 has ix 1
    14 has ix 1
    15 has ix 2
    29 has ix 6
    30 has ix 6
    31 has ix 7
    
    */
}

Edit: Fixed a bug and improved a few variable names in the macro.


Perfect!

Amazing job! I just added it to the test for my second list, and it does appear to be significantly faster ( 26 msec vs. 43 msec for the standard binary search ).

Playground link


Here's one that includes an implementation with match:

Not that I would expect match to generate absolutely optimal code, but I'm somewhat disappointed by how slow it is compared to all the other options.

Can const generics be arrays? I wonder if a "template" style #[inline] generic function taking a const array would be easier to read while producing identical specialized code.

Something like

fn find_index<const N: usize, const A: [u8; N]>
  (_: const A)-> impl const Fn(u8)->u8 {
    if let [] = A {
        return |_| 0;
    };
    const mid = N/2;
    const find_head = find_index(A[..mid]);
    const find_tail = find_index(A[mid..]);
    |n: u8| {
        if n < A[mid] { find_head(n) }
        else { mid + find_tail(n) }
    }
}

(Possibly with another "current index" const parameter if the inliner can't deal with the "non-tail recursion" 4 + if{}else{2+…} and you want macro-like 0 + 1 + 1 + 1 leaf nodes)
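
For what it's worth, const generic parameters can't be arrays today (only integer types, bool and char), but here's a sketch in that spirit, assuming a compiler with min_const_generics and slice::partition_point available; passing the table as &[u16; N] still lets the optimizer specialize per length:

// Approximates the "template" idea without const array parameters:
// the length N is a const generic, the table a borrowed fixed-size array.
#[inline]
fn find_index<const N: usize>(table: &[u16; N], x: u16) -> usize {
    // Index of the last boundary <= x ( 0 if x is below the first ).
    table.partition_point(|&v| v <= x).saturating_sub(1)
}

fn main() {
    const VALS: [u16; 8] = [11, 13, 15, 17, 19, 23, 27, 31];
    assert_eq!(find_index(&VALS, 12), 0);
    assert_eq!(find_index(&VALS, 31), 7);
}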
