SIMD-ify the logic

I have the following logic, given an input in binary 0111_1001 and mask 0001_0001, I want to select all consecutive 1 in the input that are higher than or equal to 1 in the mask.

input:   0111 1001
mask:    0001 0001
out:     0111 0001

My questions are as following:

  1. Is there a name for this kind of function that would make it easier to search and ask further questions?
  2. Is there a SIMD-like function and algorithm to speed this function?

I believe you can implement it like this:

type Int = u8;

fn foo(input: Int, mask: Int) -> Int {
    let mut output = input & mask;
    for _ in 0..Int::BITS-1 {
        output |= (output << 1) & input;
    }
    output
}

not sure if you can do better.

1 Like

Broadly? “Bitwise operation”, “bitwise function”, “bit twiddling”… Bit manipulation - Wikipedia may have some useful names and further links.

fn foo(input: u8, mask: u8) -> u8 {
    input & (mask | !input.wrapping_add(input & mask))
}
1 Like

I'd suggest looking at everything mentioned in x86 Bit manipulation instruction set - Wikipedia

2 Likes

Not sure SIMD would help here - "Hacker's Delight" is one classic book recommendation for bit twiddling tricks.

3 Likes

That's a very elegant solution, even if not simdified.

Wish I can mark more as solution.

Btw, do you know of doing it the other way around? I.e. selecting lower bits

input:   0111 1101
mask:    0011 0001
out:     0011 1101

You reverse the bits and use the other method.

Bitwise or bit twiddling as others have pointed out. A absolute classic resource on the subject is Bit Twiddling Hacks (aimed at C, but the principles carry over obviously).

1 Like

On further inspection, this seems to fail for some very obscure cases. Needs more investigation.

It would help if you provided example inputs, outputs and expected outputs of where this fails.

1 Like

To be honest, can't seem to reproduce it right now, but there was a timeout in tests while using the string searcher with function provided by uv2.

I have two functions that I switch between uv2 and my own branchless version. After replacing my own implementation with uv2, a unknown timeout appeared. I wanted to dig deeper, but upon rerunning the tests a few days later it disappeared.