Bit twiddling hack

anon80458984 · June 15, 2023, 10:33am

Let x:u64 be viewed as a vector of length 64, where each elem is a bit 0 or 1.

Let k be the number of 1 bits in x.

Let the indices of the 1's be i_0, i_1, ... i_{k-1}.

What is the fastest way, given (c, x) to compute i_c ? Notation here a bit bad, as i is depend on x.

Basically I want to be able to rapidly compute: where does the 1st 1 appear? Where does the 2nd 1 appear? Where does the 3rd 1 appear? ...

SkiFire13 · June 15, 2023, 10:37am

For the first and last you could use leading_zeros and trailing_zeros and do some math to get the actual index. To calculate e.g. the second you could mask off the first 1, and then the second would become the first (which you can efficiently get the index of) and so on.

anon80458984 · June 15, 2023, 10:46am

Do you know how many ops this is? I'm looking for something similar to Bit Twiddling Hacks , but instead of counting bits, calculates the index.

alice · June 15, 2023, 10:51am

Regarding the linked loop, that loop would work if you stop when reaching the count you want. The iteration counter of the loop is your index.

A solution using leading_zeros or trailing_zeros would let you skip several zeros in one iteration (and these are usually single assembly instructions, so they are cheap). Still the running time depends on which bit you want; the 10th bit takes 10 times as many ops as the first bit.

samuelpilz · June 15, 2023, 10:55am

The linked "hack" might be suboptimal. The count_ones function is actually implemented by a single instruction on some architectures (x86). Same with leading_zeros and trailing_zeros.

SkiFire13 · June 15, 2023, 11:03am

The operation of getting the index of the first bit set is just a single instruction (at least on x64), so that's quite fast. However this only works if there is a bit set (i.e. the number is not 0). If it's 0 then the result of the instruction is undefined, so those two methods need to check for that case too, which bloats the assembly a bit. This could however be optimized out if the optimizer sees it is an impossible case, but this depends on your code.

alice · June 15, 2023, 11:06am

Using the count_ones method, it is possible to determine if a given answer is too small or large. This makes it possible to use binary search:

fn find_pos(x: u64, c: u32) -> u64 {
    let mut pos = 0;
    
    let mut pos_bit = 1 << 5;
    while pos_bit != 0 {
        let pos_with_bit = pos | pos_bit;
        let mask = (1 << (pos_with_bit - 1)) - 1;
        let count = (x & mask).count_ones();
        if count <= c {
            pos = pos_with_bit;
        }
        
        pos_bit = pos_bit >> 1;
    }
    pos - 1
}

The loop will iterate six times so you can unroll it, and the if count <= c can probably be made branchless.

user16251 · June 16, 2023, 1:15pm

The bsf and bsr have bad behavior when the input is zero, but tzcnt and lzcnt return the width of the type in bits.

user16251 · June 16, 2023, 1:19pm

Can you provide more context? For problems like this there is no one "fastest" solution. It really depends on how many problems you have to do at once and how the output is formatted. Are you willing to put a big table in L1? Can you use vector instructions? Is just using trailz in a loop fast enough for your application?

scottmcm · June 16, 2023, 9:31pm

Note that this is avoidable by working in NonZeroU64 instead, as that tells LLVM it doesn't need to worry about those cases: https://rust.godbolt.org/z/8Wbh9hdxr

khimru · June 17, 2023, 12:19am

Alternatively you may decide that you don't want to worry about 10+ years old CPUs (Haswell or Piledrier and everything after that support lzcnt/tzcnt and LLVM knows how to use them).

user16251 · June 17, 2023, 1:50am

Rereading your original post you might want to use pdep and pext. If you deposit 1 << c in the mask x it will go to the position of the cth set bit in x which you can then read with tzcnt. If you want to decode bit sets you may find Daniel Lemire's series of articles interesting; here's one post. Of course you need a CPU with pdep or you can implement it by hand with the methods Hank Warren writes about in Hacker's Delight (buy this book!).

system · September 15, 2023, 1:50am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Bit twiddling: n: i32, 2^(i+1) > n >= 2^i => compute i	8	573	October 14, 2022
Get last k bits of a uN	9	1103	June 14, 2022
SIMD-ify the logic help	9	233	March 8, 2025
Effective way of removing the rightmost 1 bit in a u64 value taking care of zero help	15	2338	January 12, 2023
Find first 1 or 0 bit in an unsigned value help	3	191	February 24, 2025

Bit twiddling hack

Related topics